[PING][PATCH] arm: Remove unsigned variant of vcaddq_m
(Pinging since I realised that this is required for my later Low Overhead Loop patch series to work.)

Ok for trunk with the updated changelog that Christophe mentioned?

Thanks,
Stamatis/Stam Markianos-Wright

From: Stam Markianos-Wright
Sent: Tuesday, August 1, 2023 6:21 PM
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw ; Kyrylo Tkachov
Subject: arm: Remove unsigned variant of vcaddq_m

Hi all,

The unsigned variants of the vcaddq_m operation are not needed within the compiler, as the assembly output of the signed and unsigned versions of the ops is identical: both use a `.i` suffix (as opposed to separate `.s` and `.u` suffixes).

Tested with baremetal arm-none-eabi on Arm's fastmodels.

Ok for trunk?

Thanks,
Stamatis Markianos-Wright

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270): Use common insn for signed and unsigned front-end definitions.
	* config/arm/arm_mve_builtins.def (vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
	(vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
	* config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
	(isu): Likewise.
	(rot): Likewise.
	(mve_rot): Likewise.
	(supf): Likewise.
	(VxCADDQ_M): Likewise.
	* config/arm/unspecs.md (unspec): Likewise.
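To illustrate why a single `.i` pattern can serve both signed and unsigned element types, here is a minimal host-side C sketch (not GCC code) of an integer complex add with a 90-degree rotation on eight 16-bit lanes. It assumes the usual VCADD layout of even lanes as real parts and odd lanes as imaginary parts, and that results wrap modulo 2^16. The helpers `vcadd_rot90_s16`, `vcadd_rot90_u16` and `same_bits` are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of VCADD (rotate by 90) on 8 x 16-bit lanes.
   Each (even, odd) lane pair is one complex number (re, im):
     out.re = a.re - b.im
     out.im = a.im + b.re
   Results wrap modulo 2^16 (no saturation). */

/* Signed view: arithmetic in int32_t, then reduced modulo 2^16
   by the well-defined conversion to uint16_t. */
void vcadd_rot90_s16 (const int16_t a[8], const int16_t b[8],
                      uint16_t out[8])
{
  for (int i = 0; i < 8; i += 2)
    {
      out[i]     = (uint16_t) ((int32_t) a[i] - (int32_t) b[i + 1]);
      out[i + 1] = (uint16_t) ((int32_t) a[i + 1] + (int32_t) b[i]);
    }
}

/* Unsigned view of the same lanes. */
void vcadd_rot90_u16 (const uint16_t a[8], const uint16_t b[8],
                      uint16_t out[8])
{
  for (int i = 0; i < 8; i += 2)
    {
      out[i]     = (uint16_t) (a[i] - b[i + 1]);
      out[i + 1] = (uint16_t) (a[i + 1] + b[i]);
    }
}

/* Returns 1 if both views produce bit-identical lanes for the same
   input bit patterns. */
int same_bits (const int16_t a[8], const int16_t b[8])
{
  uint16_t ua[8], ub[8], rs[8], ru[8];
  memcpy (ua, a, sizeof ua);   /* reinterpret bits, no value conversion */
  memcpy (ub, b, sizeof ub);
  vcadd_rot90_s16 (a, b, rs);
  vcadd_rot90_u16 (ua, ub, ru);
  return memcmp (rs, ru, sizeof rs) == 0;
}
```

For example, with `a[0] = 1` and `b[1] = 7`, lane 0 holds 0xFFFA in both views (-6 read as signed, 65530 read as unsigned), so only the interpretation differs, not the bits.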
--- gcc/config/arm/arm-mve-builtins-base.cc | 4 ++-- gcc/config/arm/arm_mve_builtins.def | 6 ++--- gcc/config/arm/iterators.md | 30 +++-- gcc/config/arm/mve.md | 4 ++-- gcc/config/arm/unspecs.md | 6 ++--- 5 files changed, 21 insertions(+), 29 deletions(-) diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc index e31095ae112..426a87e9852 100644 --- a/gcc/config/arm/arm-mve-builtins-base.cc +++ b/gcc/config/arm/arm-mve-builtins-base.cc @@ -260,8 +260,8 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ) FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ) FUNCTION_WITH_RTX_M (vandq, AND, VANDQ) FUNCTION_ONLY_N (vbrsrq, VBRSRQ) -FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F)) -FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F)) +FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M, VCADDQ_ROT90_M, VCADDQ_ROT90_M_F)) +FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M, VCADDQ_ROT270_M, VCADDQ_ROT270_M_F)) FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F)) FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F)) FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F)) diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def index 43dacc3dda1..6ac1812c697 100644 --- a/gcc/config/arm/arm_mve_builtins.def +++ b/gcc/config/arm/arm_mve_builtins.def @@ -523,8 +523,8 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhsubq_m_n_u, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, 
vhaddq_m_u, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si) -VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi, v8hi, v4si) -VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi, v8hi, v4si) +VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_, v16qi, v8hi, v4si) +VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si) VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si) @@ -587,8 +587,6 @@ VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhcaddq_rot270_m_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si) -VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi, v4si) -VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_s, v16qi, v8hi, v4si) VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_s, v16qi, v8hi, v4si) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index b13ff53d36f..2edd0b06370 100644 --- a/gcc/config/arm
[committed trunk 7/9] arm testsuite: Remove redundant tests
Following Andrea's overhaul of the MVE testsuite, these tests are now redundant, as equivalent checks have been added to each intrinsic's .c test.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: Removed.
	* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: Removed.
	*
[committed trunk 9/9] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes
These newly updated tests were rewritten by Andrea. Some of them needed further manual fixing as follows:

* The check-function-bodies patterns contained the literal placeholder `#shift` instead of the expected immediate value.
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In this case the test rewritten from the ACLE had the lsr+and pattern, but the compiler was able to optimise it to ubfx. Hence I've changed the test to now match on ubfx.
* Added a separate test to check that shifts of constants are optimised to movs.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
	* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
	* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
	* gcc.target/arm/mve/mve_const_shifts.c: New test.
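The lsr+and versus ubfx point can be checked with a small host-side C sketch: `(x >> 29) & 1` (the lsr+and form used for the carry read-back) and a one-bit unsigned bitfield extract at bit 29 (what UBFX computes) give the same value for every input. `carry_lsr_and` and `ubfx_model` are hypothetical helpers; `ubfx_model` uses a shift-left-then-shift-right formulation purely as a model of the instruction's result, not as how the hardware implements it.

```c
#include <stdint.h>

/* lsr + and: shift the FPSCR value right by 29, keep the lowest bit. */
uint32_t carry_lsr_and (uint32_t fpscr)
{
  return (fpscr >> 29) & 0x1u;
}

/* Hypothetical model of UBFX Rd, Rn, #lsb, #width: zero-extended
   extract of `width` bits starting at bit `lsb`. Written as
   shift-left then shift-right so it does not reuse the mask form.
   Requires 1 <= width and lsb + width <= 32 (avoids a shift by 32). */
uint32_t ubfx_model (uint32_t x, unsigned lsb, unsigned width)
{
  return (x << (32u - lsb - width)) >> (32u - width);
}
```

Since both forms compute the same value, a test that matches the single `ubfx` the compiler actually emits is stricter than one matching the two-instruction sequence written in the ACLE.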
--- .../gcc.target/arm/mve/intrinsics/srshr.c | 2 +- .../gcc.target/arm/mve/intrinsics/srshrl.c| 2 +- .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +-- .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +-- .../gcc.target/arm/mve/intrinsics/urshr.c | 4 +- .../gcc.target/arm/mve/intrinsics/urshrl.c| 4 +- .../arm/mve/intrinsics/vadciq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vadciq_m_u32.c | 8 +--- .../arm/mve/intrinsics/vadciq_s32.c | 8 +--- .../arm/mve/intrinsics/vadciq_u32.c | 8 +--- .../arm/mve/intrinsics/vadcq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vadcq_m_u32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vadcq_s32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vadcq_u32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_m_u32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_s32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_u32.c | 8 +--- .../arm/mve/intrinsics/vsbcq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vsbcq_m_u32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c | 8 +--- .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++ 23 files changed, 81 insertions(+), 128 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c index 94e3f42fd33..734375d58c0 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... -** srshr (?:ip|fp|r[0-9]+), #shift(?:@.*|) +** srshr (?:ip|fp|r[0-9]+), #1(?:@.*|) ** ... */ int32_t diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c index 65f28ccbfde..a91943c38a0 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... 
-** srshrl (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|) +** srshrl (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|) ** ... */ int64_t diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c index b23c9d97ba6..462531cad54 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... -** uqshl (?:ip|fp|r[0-9]+), #shift(?:@.*|) +** uqshl (?:ip|fp|r[0-9]+), #1(?:@.*|) ** ...
[committed trunk 2/9] arm: Fix vstrwq* backend + testsuite
From: Andrea Corallo

Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the correct sequence of instructions due to a missing predicate. Also, the immediate range is fixed to be multiples of 2 in the range [-252, 252].

Best Regards,
Andrea

gcc/ChangeLog:

	* config/arm/constraints.md (mve_vldrd_immediate): Move it to predicates.md.
	(Ri): Move constraint definition from predicates.md.
	(Rl): Define new constraint.
	* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add missing constraint.
	(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint for op 1, use mve_vstrw_immediate predicate and Rl constraint for op 2. Fix asm output spacing.
	(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
	* config/arm/predicates.md (Ri): Move constraint to constraints.md.
	(mve_vldrd_immediate): Move it from constraints.md.
	(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use check-function-bodies instead of scan-assembler checks. Use extern "C" for C++ testing.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
--- gcc/config/arm/constraints.md | 20 -- gcc/config/arm/mve.md | 10 ++--- gcc/config/arm/predicates.md | 14 +++ .../arm/mve/intrinsics/vstrwq_f32.c | 32 --- .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 --- .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 --- .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 --- .../arm/mve/intrinsics/vstrwq_s32.c | 32 --- .../mve/intrinsics/vstrwq_scatter_base_f32.c | 28 +++-- .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++-- .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++-- .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++-- .../mve/intrinsics/vstrwq_scatter_base_s32.c | 28 +++-- .../mve/intrinsics/vstrwq_scatter_base_u32.c | 28 +++-- .../intrinsics/vstrwq_scatter_base_wb_f32.c | 32 --- .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_s32.c | 32 --- .../intrinsics/vstrwq_scatter_base_wb_u32.c | 32 --- .../intrinsics/vstrwq_scatter_offset_f32.c| 32 --- .../intrinsics/vstrwq_scatter_offset_p_f32.c | 40 ---
[committed trunk 8/9] arm testsuite: XFAIL or relax registers in some tests [PR109697]
Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests were failing due to the compiler's preference for vector-vector comparisons over vector-scalar comparisons. This is due to the lack of a cost model for MVE and the compiler not knowing that the RTL vec_duplicate is free in those instructions. For now, we simply XFAIL these checks.
* The tests for pr108177 had strict usage of q0 and r0 registers, meaning that they would FAIL with -mfloat-abi=softfp. The register checks have now been relaxed. A couple of these run-tests also had inconsistent use of integer MVE with floating point vectors, so I've now changed these to use FP MVE.

gcc/testsuite/ChangeLog:

	PR target/109697
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check.
	* gcc.target/arm/mve/pr108177-1.c: Relax registers.
	* gcc.target/arm/mve/pr108177-10.c: Relax registers.
	* gcc.target/arm/mve/pr108177-11.c: Relax registers.
	* gcc.target/arm/mve/pr108177-12.c: Relax registers.
	* gcc.target/arm/mve/pr108177-13.c: Relax registers.
	* gcc.target/arm/mve/pr108177-13-run.c: Use mve_fp.
	* gcc.target/arm/mve/pr108177-14.c: Relax registers.
	* gcc.target/arm/mve/pr108177-14-run.c: Use mve_fp.
	* gcc.target/arm/mve/pr108177-2.c: Relax registers.
	* gcc.target/arm/mve/pr108177-3.c: Relax registers.
	* gcc.target/arm/mve/pr108177-4.c: Relax registers.
	* gcc.target/arm/mve/pr108177-5.c: Relax registers.
	* gcc.target/arm/mve/pr108177-6.c: Relax registers.
	* gcc.target/arm/mve/pr108177-7.c: Relax registers.
	* gcc.target/arm/mve/pr108177-8.c: Relax registers.
	* gcc.target/arm/mve/pr108177-9.c: Relax registers.
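As a rough model of why the vector-scalar and vector-vector forms are interchangeable for the compiler, the C sketch below compares eight 16-bit lanes and builds a 16-bit predicate with one bit per byte, so each matching 16-bit lane sets two adjacent bits. Comparing against a scalar gives the same predicate as comparing against a vector with that scalar duplicated into every lane, which is the vec_duplicate the compiler currently does not know is free. The helpers `cmpeq_u16_model`, `cmpeq_n_u16_model` and `dup_u16` are hypothetical and are not the real intrinsics.

```c
#include <stdint.h>

/* Hypothetical models (not the real intrinsics) of a 16-bit equality
   compare producing an MVE-style 16-bit predicate: one bit per byte,
   so each 16-bit lane owns two adjacent predicate bits. */

/* Vector-vector form. */
uint16_t cmpeq_u16_model (const uint16_t a[8], const uint16_t b[8])
{
  uint16_t pred = 0;
  for (int i = 0; i < 8; i++)
    if (a[i] == b[i])
      pred |= (uint16_t) (0x3u << (2 * i));
  return pred;
}

/* Vector-scalar form: every lane is compared with the same scalar. */
uint16_t cmpeq_n_u16_model (const uint16_t a[8], uint16_t s)
{
  uint16_t pred = 0;
  for (int i = 0; i < 8; i++)
    if (a[i] == s)
      pred |= (uint16_t) (0x3u << (2 * i));
  return pred;
}

/* Duplicate a scalar into every lane (the vec_duplicate step). */
void dup_u16 (uint16_t s, uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = s;
}
```

Because the two forms agree, choosing between them is purely a cost decision, which is why the scan-assembler checks for the `_n` form are XFAILed rather than treated as wrong code.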
--- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
[committed trunk 4/9] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs r3, FPSCR_nzcvqc
bic  r3, r3, #536870912
orr  r3, r3, r2, lsl #29
vmsr FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:

```
< Rt is the *carry input >
VMRS Rs, FPSCR_nzcvqc
BFI  Rs, Rt, #29, #1
VMSR FPSCR_nzcvqc, Rs
```

The bic + orr pair is slower, and it's also wrong: if the *carry input is greater than 1, then we risk overwriting the top two bits of the FPSCR register (the N and Z flags).

This turned out to be a problem in the header file, and the solution was to simply add a `& 0x1u` to the `*carry` input: then the compiler knows that we only care about the lowest bit and can optimise to a BFI.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
	(__arm_vadcq_u32): Likewise.
	(__arm_vadcq_m_s32): Likewise.
	(__arm_vadcq_m_u32): Likewise.
	(__arm_vsbcq_s32): Likewise.
	(__arm_vsbcq_u32): Likewise.
	(__arm_vsbcq_m_s32): Likewise.
	(__arm_vsbcq_m_u32): Likewise.
	* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
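The overwrite can be reproduced with plain integer arithmetic on the host. This C sketch contrasts the unmasked and masked carry insertion and models a one-bit BFI at bit 29 for comparison. It assumes the architectural FPSCR flag positions N = bit 31, Z = bit 30, C = bit 29, and clears bit 29 before inserting, as the `bic r3, r3, #536870912` (0x20000000) in the generated code does. The names `set_carry_old`, `set_carry_new` and `bfi_model` are hypothetical.

```c
#include <stdint.h>

#define FPSCR_N (1u << 31)
#define FPSCR_Z (1u << 30)
#define FPSCR_C (1u << 29)

/* Before the fix: *carry is shifted in unmasked, so bit 1 of the
   carry lands in Z (bit 30) and bit 2 lands in N (bit 31). */
uint32_t set_carry_old (uint32_t fpscr, unsigned carry)
{
  return (fpscr & ~FPSCR_C) | ((uint32_t) carry << 29);
}

/* After the fix: only bit 0 of *carry can reach the FPSCR. */
uint32_t set_carry_new (uint32_t fpscr, unsigned carry)
{
  return (fpscr & ~FPSCR_C) | (((uint32_t) carry & 0x1u) << 29);
}

/* Hypothetical model of BFI Rd, Rn, #lsb, #width: copy the low `width`
   bits of rn into rd at bit `lsb`, leaving all other bits of rd alone.
   Requires 1 <= width <= 31 and lsb + width <= 32. */
uint32_t bfi_model (uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
  uint32_t mask = ((1u << width) - 1u) << lsb;
  return (rd & ~mask) | ((rn << lsb) & mask);
}
```

With the old expression, a carry value of 2 sets Z and a value of 4 sets N; with the mask applied, the upper flags are left untouched, which matches what a single `BFI Rs, Rt, #29, #1` does.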
--- gcc/config/arm/arm_mve.h | 16 ++--- gcc/config/arm/mve.md | 2 +- .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++ 3 files changed, 76 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 1774e6eca2b..4ad1c99c288 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -4098,7 +4098,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -4108,7 +4108,7 @@ __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -4118,7 +4118,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); int32x4_t __res = 
__builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -4128,7 +4128,7 @@ __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); uint32x4_t __res = __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -4174,7 +4174,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
[committed trunk 5/9] arm: Fix overloading of MVE scalar constant parameters on vbicq
We found this as part of the wider testsuite updates. The applicable tests were authored by Andrea earlier in this patch series.

Ok for trunk?

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vbicq): Change coerce on scalar constant.

--- gcc/config/arm/arm_mve.h | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 4ad1c99c288..30cec519791 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -10847,10 +10847,10 @@ extern void *__ARM_undef; #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \ int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, 
int8x16_t)), \ int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \ int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \ @@ -11699,10 +11699,10 @@ extern void *__ARM_undef; #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \ int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \ int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, 
int16x8_t)), \ int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \ -- 2.25.1
[committed gcc12 backport] arm: Fix overloading of MVE scalar constant parameters on vbicq, vmvnq_m
We found this as part of the wider testsuite updates. The applicable tests were authored by Andrea earlier in this patch series.

Ok for trunk?

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vbicq): Change coerce on scalar constant.
	(__arm_vmvnq_m): Likewise.

--- gcc/config/arm/arm_mve.h | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 39b3446617d..0b35bd0eedd 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35906,10 +35906,10 @@ extern void *__ARM_undef; #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \ int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 (__ARM_mve_coerce(__p0, 
int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \ int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \ int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \ @@ -38825,10 +38825,10 @@ extern void *__ARM_undef; #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \ __typeof(p1) __p1 = (p1); \ _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ - int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \ - int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \ int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \ int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 (__ARM_mve_coerce(__p0, 
int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \ int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \ @@ -40962,10 +40962,10 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmvnq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmvnq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmvnq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ - int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmvnq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1(__p1,
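The `__arm_vbicq` macro above dispatches on a pair of operand types. It turns two type IDs into the dimensions of a pointer-to-array type and matches that type with C11 `_Generic`. The sketch below shows only that dispatch trick, not the MVE header itself. `TYPEID`, `PICK` and the `pick_*` functions are hypothetical names, and plain `int`/`double` scalars stand in for the MVE vector types.

```c
#include <assert.h>

/* Hypothetical stand-in for __ARM_mve_typeid: map an operand's type to a
   small positive integer (array bounds of 0 are not allowed). */
#define TYPEID(x) _Generic((x), int: 1, double: 2, default: 3)

/* Hypothetical per-signature implementations; each returns a code that
   identifies which overload was chosen. */
static int pick_ii(int a, int b)       { (void)a; (void)b; return 11; }
static int pick_id(int a, double b)    { (void)a; (void)b; return 12; }
static int pick_di(double a, int b)    { (void)a; (void)b; return 21; }
static int pick_dd(double a, double b) { (void)a; (void)b; return 22; }

/* Encode both type IDs as array bounds and select the matching function.
   The controlling expression of _Generic is not evaluated, so `a` and `b`
   are evaluated exactly once, in the final call. A type pair with no
   matching association (e.g. a float operand) fails to compile. */
#define PICK(a, b)                                        \
    _Generic((int (*)[TYPEID(a)][TYPEID(b)])0,            \
             int (*)[1][1]: pick_ii,                      \
             int (*)[1][2]: pick_id,                      \
             int (*)[2][1]: pick_di,                      \
             int (*)[2][2]: pick_dd)((a), (b))
```

For example, `PICK(1, 2.0)` selects `pick_id`. In the header, an `int16x8_t` vector paired with an `int` scalar selects `__arm_vbicq_n_s16` in the same way.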
[committed gcc12 backport] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes
These newly updated tests were rewritten by Andrea. Some of them needed further manual fixing as follows: * The check-function-bodies patterns contained the literal `#shift` placeholder instead of the expected immediate value. * The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In this case the test rewritten from the ACLE had the lsr+and pattern, but the compiler was able to optimise it to ubfx, so I've changed the test to match on ubfx instead. * Added a separate test to check that shifts on constants are optimised to movs. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/srshr.c: Update shift value. * gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value. * gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value. * gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value. * gcc.target/arm/mve/intrinsics/urshr.c: Update shift value. * gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value. * gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx. * gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx. * gcc.target/arm/mve/mve_const_shifts.c: New test.
--- .../gcc.target/arm/mve/intrinsics/srshr.c | 2 +- .../gcc.target/arm/mve/intrinsics/srshrl.c| 2 +- .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +-- .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +-- .../gcc.target/arm/mve/intrinsics/urshr.c | 4 +- .../gcc.target/arm/mve/intrinsics/urshrl.c| 4 +- .../arm/mve/intrinsics/vadciq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vadciq_m_u32.c | 8 +--- .../arm/mve/intrinsics/vadciq_s32.c | 8 +--- .../arm/mve/intrinsics/vadciq_u32.c | 8 +--- .../arm/mve/intrinsics/vadcq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vadcq_m_u32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vadcq_s32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vadcq_u32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_m_u32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_s32.c | 8 +--- .../arm/mve/intrinsics/vsbciq_u32.c | 8 +--- .../arm/mve/intrinsics/vsbcq_m_s32.c | 8 +--- .../arm/mve/intrinsics/vsbcq_m_u32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c | 8 +--- .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c | 8 +--- .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++ 23 files changed, 81 insertions(+), 128 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c index 94e3f42fd33..734375d58c0 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... -** srshr (?:ip|fp|r[0-9]+), #shift(?:@.*|) +** srshr (?:ip|fp|r[0-9]+), #1(?:@.*|) ** ... */ int32_t diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c index 65f28ccbfde..a91943c38a0 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... 
-** srshrl (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|) +** srshrl (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|) ** ... */ int64_t diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c index b23c9d97ba6..462531cad54 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c @@ -12,7 +12,7 @@ extern "C" { /* **foo: ** ... -** uqshl (?:ip|fp|r[0-9]+), #shift(?:@.*|) +** uqshl (?:ip|fp|r[0-9]+), #1(?:@.*|) ** ...
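The ubfx update works because an LSR followed by an AND with a low-bit mask extracts exactly the same bits as a single UBFX. The sketch below is a small C model of that equivalence for the carry flag. The carry is bit 29 of FPSCR, which the vadcq/vsbcq intrinsics read back with `(... >> 29) & 0x1u`. `ubfx` and `carry_lsr_and` are hypothetical helper names, not compiler or ACLE APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of UBFX Rd, Rn, #lsb, #width: extract `width`
   bits starting at bit `lsb` and zero-extend them.
   Assumes 1 <= width < 32 and lsb + width <= 32. */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width)
{
    return (rn >> lsb) & ((1u << width) - 1u);
}

/* The two-instruction form: LSR by 29, then AND with 1. */
static uint32_t carry_lsr_and(uint32_t fpscr)
{
    return (fpscr >> 29) & 0x1u;
}
```

Since both functions compute the same value for every input, a compiler is free to emit the single UBFX. This is why the tests now match on ubfx rather than the lsr+and pair.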
[committed gcc12 backport] arm testsuite: Remove redundant tests
Following Andrea's overhaul of the MVE testsuite, these tests are now redundant, as equivalent checks have been added to each intrinsic's .c test. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed. 
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: Removed. * gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: Removed. *
[committed gcc12 backport] arm testsuite: XFAIL or relax registers in some tests [PR109697]
Hi all, This is a simple testsuite tidy-up patch, addressing two types of errors: * The vcmp vector-scalar tests were failing because the compiler prefers vector-vector comparisons over vector-scalar comparisons. This is due to the lack of a cost model for MVE and the compiler not knowing that the RTL vec_duplicate is free in those instructions. For now, we simply XFAIL these checks. * The tests for pr108177 had strict usage of q0 and r0 registers, meaning that they would FAIL with -mfloat-abi=softfp. The register checks have now been relaxed. A couple of these run-tests also had inconsistent use of integer MVE with floating-point vectors, so I've now changed these to use FP MVE. gcc/testsuite/ChangeLog: PR target/109697 * gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check. * gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check. * gcc.target/arm/mve/pr108177-1.c: Relax registers. * gcc.target/arm/mve/pr108177-10.c: Relax registers. * gcc.target/arm/mve/pr108177-11.c: Relax registers. * gcc.target/arm/mve/pr108177-12.c: Relax registers. * gcc.target/arm/mve/pr108177-13.c: Relax registers. * gcc.target/arm/mve/pr108177-13-run.c: Use mve_fp. * gcc.target/arm/mve/pr108177-14.c: Relax registers. * gcc.target/arm/mve/pr108177-14-run.c: Use mve_fp. * gcc.target/arm/mve/pr108177-2.c: Relax registers. * gcc.target/arm/mve/pr108177-3.c: Relax registers. * gcc.target/arm/mve/pr108177-4.c: Relax registers. * gcc.target/arm/mve/pr108177-5.c: Relax registers. * gcc.target/arm/mve/pr108177-6.c: Relax registers. * gcc.target/arm/mve/pr108177-7.c: Relax registers. * gcc.target/arm/mve/pr108177-8.c: Relax registers. * gcc.target/arm/mve/pr108177-9.c: Relax registers.
--- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +- gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
[committed gcc12 backport] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs r3, FPSCR_nzcvqc
bic r3, r3, #536870912
orr r3, r3, r2, lsl #29
vmsr FPSCR_nzcvqc, r3
```

whereas the MVE ACLE specifies this instruction sequence:

```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower, and it is also wrong: if the *carry input is greater than 1, we risk overwriting the top two bits of the FPSCR register (the N and Z flags). This turned out to be a problem in the header file, and the solution was simply to add a `& 0x1u` to the `*carry` input. The compiler then knows that only the lowest bit matters and can optimise to a BFI.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic. (__arm_vadcq_u32): Likewise. (__arm_vadcq_m_s32): Likewise. (__arm_vadcq_m_u32): Likewise. (__arm_vsbcq_s32): Likewise. (__arm_vsbcq_u32): Likewise. (__arm_vsbcq_m_s32): Likewise. (__arm_vsbcq_m_u32): Likewise. * config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
(cherry picked from commit f1417d051be094ffbce228e11951f3e12e8fca1c) --- gcc/config/arm/arm_mve.h | 16 ++--- gcc/config/arm/mve.md | 2 +- .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++ 3 files changed, 76 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 82ceec2bbfc..6bf1794d2ff 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -16055,7 +16055,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -16065,7 +16065,7 @@ __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -16075,7 +16075,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29)); int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -16085,7 +16085,7 @@ __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry & 0x1u) << 29)); uint32x4_t __res = __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p); *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u; return __res; @@ -16131,7 +16131,7 @@ __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry) { - __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | (*__carry << 29)); + __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x2000u) | ((*__carry &a
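The difference between the unmasked and masked carry updates can be reproduced with plain integer arithmetic. The sketch below uses a 32-bit value to stand in for FPSCR, where N, Z and C are bits 31, 30 and 29. The helper names are hypothetical, and `bfi` is only a C model of the BFI instruction, not a real API. Without the mask, a carry of 2 or 3 leaks into Z and N. The masked form touches only C and matches a one-bit BFI at bit 29.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR condition flag bits. */
#define FPSCR_N (1u << 31)
#define FPSCR_Z (1u << 30)
#define FPSCR_C (1u << 29)

/* Hypothetical C model of BFI Rd, Rn, #lsb, #width: insert the low `width`
   bits of rn into rd at bit `lsb`, leaving all other bits of rd unchanged.
   Assumes 1 <= width < 32 and lsb + width <= 32. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;
    return (rd & ~(mask << lsb)) | ((rn & mask) << lsb);
}

/* Unmasked update: any carry value above 1 spills into bits 30 and 31. */
static uint32_t set_carry_unmasked(uint32_t fpscr, uint32_t carry)
{
    return (fpscr & ~FPSCR_C) | (carry << 29);
}

/* Masked update: only bit 0 of the carry is inserted, so only bit 29
   (the C flag) can change. */
static uint32_t set_carry_masked(uint32_t fpscr, uint32_t carry)
{
    return (fpscr & ~FPSCR_C) | ((carry & 0x1u) << 29);
}
```

For instance, passing a carry of 2 to `set_carry_unmasked` with all flags clear sets Z. That is the kind of corruption the added `& 0x1u` prevents.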
[committed gcc12 backport] [arm] complete vmsr/vmrs blank and case adjustments
From: Alexandre Oliva Back in September last year, some of the vmsr and vmrs patterns had an extraneous blank removed, and the case of register names lowered, but another instance remained, and so did a testcase. for gcc/ChangeLog * config/arm/vfp.md (*thumb2_movsi_vfp): Drop blank after tab after vmsr and vmrs, and lower the case of P0. for gcc/testsuite/ChangeLog * gcc.target/arm/acle/cde-mve-full-assembly.c: Drop blank after tab after vmsr, and lower the case of P0. --- gcc/config/arm/vfp.md | 4 +- .../arm/acle/cde-mve-full-assembly.c | 264 +- 2 files changed, 134 insertions(+), 134 deletions(-) diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 932e4b7447e..7a430ef8d36 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -312,9 +312,9 @@ (define_insn "*thumb2_movsi_vfp" case 12: case 13: return output_move_vfp (operands); case 14: - return \"vmsr\\t P0, %1\"; + return \"vmsr\\tp0, %1\"; case 15: - return \"vmrs\\t %0, P0\"; + return \"vmrs\\t%0, p0\"; case 16: return \"mcr\\tp10, 7, %1, cr1, cr0, 0\\t @SET_FPSCR\"; case 17: diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c index 501cc84da10..e3e7f7ef3e5 100644 --- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c +++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c @@ -567,80 +567,80 @@ contain back references). 
*/ /* ** test_cde_vcx1q_mfloat16x8_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 ** bx lr */ /* ** test_cde_vcx1q_mfloat32x4_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 ** bx lr */ /* ** test_cde_vcx1q_muint8x16_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 ** bx lr */ /* ** test_cde_vcx1q_muint16x8_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 ** bx lr */ /* ** test_cde_vcx1q_muint32x4_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 
** bx lr */ /* ** test_cde_vcx1q_muint64x2_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) ** vpst ** vcx1t p0, q0, #32 ** bx lr */ /* ** test_cde_vcx1q_mint8x16_tintint: -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) -** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr P0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @ movhi) +** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64 d1, \.L[0-9]*\+8|vmsr p0, r2 @
[committed gcc12 backport] arm: Add vorrq_n overloading into vorrq _Generic
We found this as part of the wider testsuite updates. The applicable tests are authored by Andrea earlier in this patch series Ok for trunk? gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vorrq): Add _n variant. --- gcc/config/arm/arm_mve.h | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 6bf1794d2ff..39b3446617d 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35852,6 +35852,10 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \ int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \ int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \ int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \ int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));}) @@ -38637,7 +38641,11 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: 
__arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \ int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \ int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \ - int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));}) + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \ + int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \ + int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));}) #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \ __typeof(p1) __p1 = (p1); \ -- 2.25.1
[committed gcc12 backport] arm: Fix vstrwq* backend + testsuite
From: Andrea Corallo Hi all, this patch fixes the vstrwq* MVE intrinsics failing to emit the correct instruction sequence due to a missing predicate. Also, the immediate range is fixed to be multiples of 2 in the range [-252, 252]. Best Regards Andrea gcc/ChangeLog: * config/arm/constraints.md (mve_vldrd_immediate): Move it to predicates.md. (Ri): Move constraint definition from predicates.md. (Rl): Define new constraint. * config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add missing constraint. (mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint for op 1, use mve_vstrw_immediate predicate and Rl constraint for op 2. Fix asm output spacing. (mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint. * config/arm/predicates.md (Ri): Move constraint to constraints.md. (mve_vldrd_immediate): Move it from constraints.md. (mve_vstrw_immediate): New predicate. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use check-function-bodies instead of scan-assembler checks. Use extern "C" for C++ testing. * gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise. 
--- gcc/config/arm/constraints.md | 20 -- gcc/config/arm/mve.md | 10 ++--- gcc/config/arm/predicates.md | 14 +++ .../arm/mve/intrinsics/vstrwq_f32.c | 32 --- .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 --- .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 --- .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 --- .../arm/mve/intrinsics/vstrwq_s32.c | 32 --- .../mve/intrinsics/vstrwq_scatter_base_f32.c | 28 +++-- .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++-- .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++-- .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++-- .../mve/intrinsics/vstrwq_scatter_base_s32.c | 28 +++-- .../mve/intrinsics/vstrwq_scatter_base_u32.c | 28 +++-- .../intrinsics/vstrwq_scatter_base_wb_f32.c | 32 --- .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 --- .../intrinsics/vstrwq_scatter_base_wb_s32.c | 32 --- .../intrinsics/vstrwq_scatter_base_wb_u32.c | 32 --- .../intrinsics/vstrwq_scatter_offset_f32.c| 32 --- .../intrinsics/vstrwq_scatter_offset_p_f32.c | 40 ---
[PATCH 2/2 v2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
- Respin of the below patch - In this 2/2 patch, from v1 to v2 I have: * Removed the modification to the interface of the doloop_end target-insn (so I no longer need to touch any other target backends). * Added more modes to `arm_get_required_vpr_reg` to make it flexible between searching all operands, only input arguments, or only outputs. Also added the helpers `arm_get_required_vpr_reg_ret_val` and `arm_get_required_vpr_reg_param`. * Added support for the use of other VPR predicate values within a dlstp/letp loop, as long as they don't originate from the vctp-generated VPR value. Also changed `arm_mve_get_loop_unique_vctp` to the simpler `arm_mve_get_loop_vctp`, since we can now support other VCTP insns within the loop. * Added support for loops of the form: int num_of_iters = (num_of_elem + num_of_lanes - 1) / num_of_lanes; for (i = 0; i < num_of_iters; i++) { p = vctp (num_of_elem); n -= num_of_lanes; } to be transformed into dlstp/letp loops. * Changed the VCTP look-ahead for SIGN_EXTEND and SUBREG insns to use df def/use chains instead of `next_nonnote_nondebug_insn_bb`. * Added support for using unpredicated (but predicable) insns within the dlstp/letp loop. These need to meet some specific conditions, because they _will_ become implicitly tail predicated by the dlstp/letp transformation. * Added a df chain check to any other instructions to make sure that they don't USE the VCTP-generated VPR value. * Added testing of all these various edge cases. Original email with updated ChangeLog at the end: Hi all, This is the 2/2 patch that contains the functional changes needed for MVE Tail Predicated Low Overhead Loops. See my previous email for a general introduction of MVE LOLs. This support is added through the already existing loop-doloop mechanisms that are used for non-MVE dls/le looping.
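The loop form listed in the v2 notes can be modelled in scalar C to show what tail predication buys: the final iteration runs with only the remaining lanes active. This is a sketch under stated assumptions. `model_vctp` and `model_iteration_loop` are hypothetical names, the model uses one predicate bit per lane (real MVE VCTP writes a 16-bit VPR mask at byte granularity), and it assumes the decremented `n` starts at `num_of_elem`, which the email's fragment leaves implicit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a VCTP-style predicate: one bit per lane,
   with the lowest min(n, lanes) lanes active (none if n <= 0).  */
uint32_t model_vctp (int n, int lanes)
{
  int active = n < lanes ? (n > 0 ? n : 0) : lanes;
  return active >= 32 ? UINT32_MAX : ((UINT32_C (1) << active) - 1);
}

/* Walk the loop form from the v2 notes: the trip count is a rounding-up
   division, the predicate is built from the remaining element count,
   and that count drops by the lane count each iteration.  Returns the
   total number of active lanes, i.e. elements actually processed.  */
int model_iteration_loop (int num_of_elem, int num_of_lanes)
{
  int num_of_iters = (num_of_elem + num_of_lanes - 1) / num_of_lanes;
  int n = num_of_elem;  /* assumption: n starts at num_of_elem */
  int processed = 0;

  for (int i = 0; i < num_of_iters; i++)
    {
      uint32_t p = model_vctp (n, num_of_lanes);
      for (int lane = 0; lane < num_of_lanes; lane++)
        processed += (p >> lane) & 1;  /* count active lanes */
      n -= num_of_lanes;
    }
  return processed;
}

void model_examples (void)
{
  assert (model_vctp (3, 4) == 0x7u);           /* 3 of 4 lanes active */
  assert (model_iteration_loop (10, 4) == 10);  /* 4 + 4 + 2 elements */
}
```

With 10 elements and 4 lanes the model runs three iterations, the last one with only two lanes enabled, so no scalar epilogue is needed.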
Changes are: 1) Relax the loop-doloop mechanism in the mid-end to allow for decrement values other than -1 and for `count` to be an rtx containing the number of elements to be processed, rather than an expression for calculating the number of iterations. 2) Add an `allow_elementwise_doloop` target hook. This allows the target backend to manipulate the iteration count as it needs: in our case, to change it from a pre-calculation of the number of iterations to the number of elements to be processed. 3) The doloop_end target-insn now has an additional parameter: the `count` (note: this is before it gets modified to just be the number of elements), so that the decrement value is extracted from that parameter. Plus a number of backend changes to implement the above optimisation: 4) Appropriate changes to the define_expand of doloop_end and new patterns for dlstp and letp. 5) `arm_attempt_dlstp_transform`: (called from the define_expand of doloop_end) this function checks the loop's suitability for the dlstp/letp transformation and then implements it, if possible. 6) `arm_mve_get_loop_unique_vctp`: A function that loops through the loop contents and returns the VPR-generating vctp operation within the loop, if it is unique and there is exclusively one vctp within the loop. 7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg` to check whether an insn requires the VPR. No regressions on arm-none-eabi with various targets and on aarch64-none-elf. Thoughts on getting this into trunk? Thank you, Stam Markianos-Wright gcc/ChangeLog: * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New. * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New. (arm_mve_get_vctp_lanes): New. (arm_get_required_vpr_reg): New. (arm_get_required_vpr_reg_ret_val): New. (arm_get_required_vpr_reg_param): New. (arm_mve_get_loop_vctp): New. (arm_attempt_dlstp_transform): New.
(arm_allow_elementwise_doloop): New. * config/arm/iterators.md (DLSTP): New. (mode1): Add DLSTP mappings. * config/arm/mve.md (*predicated_doloop_end_internal): New. (dlstp_insn): New. * config/arm/thumb2.md (doloop_end): Update for MVE LOLs. * config/arm/unspecs.md: New unspecs. * tm.texi: Document new hook. * tm.texi.in: Likewise. * loop-doloop.cc (doloop_condition_get): Relax conditions. (doloop_optimize): Add support for elementwise LoLs. * target.def (allow_elementwise_doloop): New hook. * targhooks.cc (default_allow_elementwise_doloop): New. * targhooks.h (default_allow_elementwise_doloop): New. gcc/testsuite/ChangeLog: * gcc.target/arm/lob.h: Update framework. * gcc.target/arm/lob1.c: Likewise. * gcc.target/arm/lob6.c: Likewise. * gcc.target/arm/dlstp-int16x8.c: New test. * gcc.target/arm/dlstp-int32x4.c: New test. * gcc.target/arm/dl
[PING][PATCH] arm: Split up MVE _Generic associations to prevent type clashes [PR107515]
Hi all,

With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more scalar types. However, in PR 107515 we found a new regression that wasn't detected in our testing. With glibc's `posix/types.h`:

```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```

we would get an `error: '_Generic' specifies two compatible types` from `__ARM_mve_coerce3`, because `type: param` (with `type` being `int`) and `int32_t: param` name the same type under the hood.

The same did not happen with Newlib's header `sys/_stdint.h`:

```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```

which worked fine, because it uses `long int`. The same could feasibly happen in `__ARM_mve_coerce2` between `__fp16` and `float16_t`.

The solution here is to break the _Generic down, so that the similar types don't appear at the same level, as is done in `__ARM_mve_typeid`.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

	PR target/96795
	PR target/107515
	* config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
	(__ARM_mve_coerce3): Likewise.

gcc/testsuite/ChangeLog:

	PR target/96795
	PR target/107515
	* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
	* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.
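The nesting approach described above can be reproduced in a standalone C11 file. `COERCE_NESTED` is a hypothetical macro that mirrors the shape of the fixed `__ARM_mve_coerce3`, but it returns small integer tags instead of the parameter so the selected branch is visible; it is a sketch, not the header's code.

```c
#include <assert.h>
#include <stdint.h>

/* Flat form, analogous to the old __ARM_mve_coerce3.  If TYPE is `int`
   and int32_t is a typedef of `int` (as with glibc), two associations in
   the same _Generic name compatible types and compilation fails:
     _Generic ((p), type: 1, int32_t: 4, ..., default: 0)  */

/* Nested form, analogous to the fix: TYPE is tested on its own, and the
   fixed-width types only in an inner _Generic, so no two associations in
   the same _Generic can be compatible.  A result of 0 is the fallback.  */
#define COERCE_NESTED(p, type)                               \
  _Generic ((p), type: 1,                                    \
            default: _Generic ((p), int8_t: 2, int16_t: 3,   \
                               int32_t: 4, int64_t: 5,       \
                               uint8_t: 6, uint16_t: 7,      \
                               uint32_t: 8, uint64_t: 9,     \
                               default: 0))

void coerce_examples (void)
{
  /* Accepted by whichever level matches first on this platform: the
     outer `int` entry with glibc, the inner int32_t entry with Newlib.  */
  assert (COERCE_NESTED ((int32_t) 7, int) != 0);
  assert (COERCE_NESTED ((short) 1, int) != 0);
  /* Not an integer type at all: reaches the innermost default.  */
  assert (COERCE_NESTED (1.0, int) == 0);
}
```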
=== Inline Ctrl+C, Ctrl+V or patch === diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 09167ec118ed3310c5077145e119196f29d83cac..70003653db65736fcfd019e83d9f18153be650dc 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35659,9 +35659,9 @@ extern void *__ARM_undef; #define __ARM_mve_coerce1(param, type) \ _Generic(param, type: param, const type: param, default: *(type *)__ARM_undef) #define __ARM_mve_coerce2(param, type) \ -_Generic(param, type: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef) +_Generic(param, type: param, __fp16: param, default: _Generic (param, _Float16: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef)) #define __ARM_mve_coerce3(param, type) \ -_Generic(param, type: param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef) +_Generic(param, type: param, default: _Generic (param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)) #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. 
*/ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c new file mode 100644 index ..427dcacb5ff59b53d5eab1f1582ef6460da3f2f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c @@ -0,0 +1,65 @@ +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ +/* { dg-additional-options "-O2 -Wno-pedantic -Wno-long-long" } */ +#include "arm_mve.h" + +float f1; +double f2; +float16_t f3; +float32_t f4; +__fp16 f5; +_Float16 f6; + +int i1; +short i2; +long i3; +long long i4; +int8_t i5; +int16_t i6; +int32_t i7; +int64_t i8; + +const int ci1; +const short ci2; +const long ci3; +const long long ci4; +const int8_t ci5; +const int16_t ci6; +const int32_t ci7; +const int64_t ci8; + +float16x8_t floatvec; +int16x8_t intvec; + +void test(void) +{ +/* Test a few different supported ways of passing an int value. The +intrinsic vmulq was chosen arbitrarily, but it is representative of +all intrinsics that take a non-const scalar value. */ +intvec = vmulq(intvec, 2); +intvec = vmulq(intvec, (int32_t) 2); +intvec = vmulq(intvec, (short) 2); +intvec = vmulq(intvec, i1); +intvec = vmulq(intvec, i2); +intvec = vmulq(intvec, i3); +intvec = vmulq(intvec, i4); +intvec = vmulq(intvec, i5); +intvec = vmulq(intvec, i6); +intvec = vmulq(intvec, i7); +intvec = vmulq(intvec, i8); + +/* Test a few different supported ways of passing a float value. 
*/ +floatvec = vmulq(floatvec, 0.5); +floatvec = vmulq(floatvec, 0.5f); +floatvec = vmulq(floatvec, (__fp16) 0.5); +floatvec = vmulq(floatvec, f1); +floatvec = vmulq(floatvec, f2); +floatvec = vmulq(floatvec, f3); +floatvec = vmulq(floatvec, f4); +floatvec = vmulq(floatvec, f5); +floatvec = vmulq(floatvec, f6); +floatvec = vmulq(floatvec, 0.15f16); +floatvec = vmulq(floatvec, (_Float16) 0.15); +} + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end
Re: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]
On 12/12/2022 13:42, Kyrylo Tkachov wrote: Hi Stam, -----Original Message----- From: Stam Markianos-Wright Sent: Friday, December 9, 2022 1:32 PM To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov ; Richard Earnshaw ; Ramana Radhakrishnan ; ni...@redhat.com Subject: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714] Hi all, In the M-Class Arm-ARM: https://developer.arm.com/documentation/ddi0553/bu/?lang=en these MVE instructions only have a '!' writeback variant, and at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714 we found that the Um constraint would also let through a register-offset writeback, resulting in an assembler error. Here I have added a new constraint and predicate for these instructions, which (uniquely, AFAICT) only support a `!` writeback increment by the data size (inside the compiler this is a POST_INC). No regressions in arm-none-eabi with MVE and MVE.FP. Ok for trunk, and backport to GCC11 and GCC12 (testing pending)? Thanks, Stam gcc/ChangeLog: PR target/107714 * config/arm/arm-protos.h (mve_struct_mem_operand): New prototype. * config/arm/arm.cc (mve_struct_mem_operand): New function. * config/arm/constraints.md (Ug): New constraint. * config/arm/mve.md (mve_vst4q): Change constraint. (mve_vst2q): Likewise. (mve_vld4q): Likewise. (mve_vld2q): Likewise. * config/arm/predicates.md (mve_struct_operand): New predicate. gcc/testsuite/ChangeLog: PR target/107714 * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test. diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -474,6 +474,12 @@ (and (match_code "mem") (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)"))) +(define_memory_constraint "Ug" + "@internal + In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem") + (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)"))) + I think you can define the constraints in terms of the new mve_struct_operand predicate directly (see how we define the "Ua" constraint, for example). Ok if that works (and testing passes of course). Done as discussed and re-tested on all branches. Pushed as: 4269a6567eb991e6838f40bda5be9e3a7972530c to trunk 25edc76f2afba0b4eaf22174d42de042a6969dbe to gcc-12 08842ad274f5e2630994f7c6e70b2d31768107ea to gcc-11 Thank you! Stam Thanks, Kyrill
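The address rule behind the new `Ug` constraint can be sketched as a small decision function. This is a simplified model, not GCC code: the hypothetical enum stands in for RTL address shapes, and `base_ok` stands in for the result of `arm_address_register_rtx_p`. Only a bare base register or a post-increment of it is accepted; the register-offset writeback that `Um` let through is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified address forms; real GCC inspects RTL codes
   such as REG and POST_INC rather than this enum.  */
enum addr_form
{
  ADDR_REG,              /* plain base register: [rN]                  */
  ADDR_POST_INC,         /* writeback by the data size: [rN]!          */
  ADDR_POST_MODIFY_REG,  /* writeback by a register offset (rejected)  */
  ADDR_REG_PLUS_IMM      /* base plus immediate offset (rejected)      */
};

/* Mirrors the decision made by mve_struct_mem_operand in the patch:
   accept a bare base register or a post-increment of one, provided the
   base register itself is valid (BASE_OK), and reject all other forms.  */
bool mve_struct_addr_ok (enum addr_form form, bool base_ok)
{
  switch (form)
    {
    case ADDR_REG:
    case ADDR_POST_INC:
      return base_ok;
    default:
      return false;
    }
}

void mve_struct_addr_examples (void)
{
  assert (mve_struct_addr_ok (ADDR_POST_INC, true));
  assert (!mve_struct_addr_ok (ADDR_POST_MODIFY_REG, true));
}
```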
[PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]
Hi all, In the M-Class Arm-ARM: https://developer.arm.com/documentation/ddi0553/bu/?lang=en these MVE instructions only have a '!' writeback variant, and at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714 we found that the Um constraint would also let through a register-offset writeback, resulting in an assembler error. Here I have added a new constraint and predicate for these instructions, which (uniquely, AFAICT) only support a `!` writeback increment by the data size (inside the compiler this is a POST_INC). No regressions in arm-none-eabi with MVE and MVE.FP. Ok for trunk, and backport to GCC11 and GCC12 (testing pending)? Thanks, Stam gcc/ChangeLog: PR target/107714 * config/arm/arm-protos.h (mve_struct_mem_operand): New prototype. * config/arm/arm.cc (mve_struct_mem_operand): New function. * config/arm/constraints.md (Ug): New constraint. * config/arm/mve.md (mve_vst4q): Change constraint. (mve_vst2q): Likewise. (mve_vld4q): Likewise. (mve_vld2q): Likewise. * config/arm/predicates.md (mve_struct_operand): New predicate.
gcc/testsuite/ChangeLog: PR target/107714 * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 550272facd12e60a49bf8a3b20f811cc13765b3a..8ea38118b05769bd6fcb1d22d902a50979cfd953 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -122,6 +122,7 @@ extern int arm_coproc_mem_operand_wb (rtx, int); extern int neon_vector_mem_operand (rtx, int, bool); extern int mve_vector_mem_operand (machine_mode, rtx, bool); extern int neon_struct_mem_operand (rtx); +extern int mve_struct_mem_operand (rtx); extern rtx *neon_vcmla_lane_prepare_operands (rtx *); diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index b587561eebea921bdc68016922d37948e2870ce2..31f2a7b9d4688dde69d1435e24cf885e8544be71 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -13737,6 +13737,24 @@ neon_vector_mem_operand (rtx op, int type, bool strict) return FALSE; } +/* Return TRUE if OP is a mem suitable for loading/storing an MVE struct + type. */ +int +mve_struct_mem_operand (rtx op) +{ + rtx ind = XEXP (op, 0); + + /* Match: (mem (reg)). */ + if (REG_P (ind)) +return arm_address_register_rtx_p (ind, 0); + + /* Allow only post-increment by the mode size. */ + if (GET_CODE (ind) == POST_INC) +return arm_address_register_rtx_p (XEXP (ind, 0), 0); + + return FALSE; +} + /* Return TRUE if OP is a mem suitable for loading/storing a Neon struct type. */ int diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -474,6 +474,12 @@ (and (match_code "mem") (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)"))) +(define_memory_constraint "Ug" + "@internal + In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem") + (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)"))) + (define_memory_constraint "Uj" "@internal In ARM/Thumb-2 state a VFP load/store address that supports writeback diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index b5e6da4b1335818a3e8815de59850e845a2d0400..847bc032afa2c3977c05725562a14940beb282d4 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -99,7 +99,7 @@ ;; [vst4q]) ;; (define_insn "mve_vst4q" - [(set (match_operand:XI 0 "neon_struct_operand" "=Um") + [(set (match_operand:XI 0 "mve_struct_operand" "=Ug") (unspec:XI [(match_operand:XI 1 "s_register_operand" "w") (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] VST4Q)) @@ -9959,7 +9959,7 @@ ;; [vst2q]) ;; (define_insn "mve_vst2q" - [(set (match_operand:OI 0 "neon_struct_operand" "=Um") + [(set (match_operand:OI 0 "mve_struct_operand" "=Ug") (unspec:OI [(match_operand:OI 1 "s_register_operand" "w") (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] VST2Q)) @@ -9988,7 +9988,7 @@ ;; (define_insn "mve_vld2q" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um") + (unspec:OI [(match_operand:OI 1 "mve_struct_operand" "Ug") (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] VLD2Q)) ] @@ -10016,7 +10016,7 @@ ;; (define_insn "mve_vld4q" [(set (match_operand:XI 0 "s_register_operand" "=w") - (unspec:XI [(match_operand:XI 1 "neon_struct_operand" "Um") + (unspec:XI [(match_operand:XI 1 "mve_struct_operand" "Ug") (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] VLD4Q)) ] diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index aab5a91ad4ddc6a7a02611d05442d6de63841a7c..67f2fdb4f8f607ceb50871e1bc17dbdb9b987c2c 100644 ---
[PATCH] arm: Split up MVE _Generic associations to prevent type clashes [PR107515]
Hi all,

With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more scalar types. However, in PR 107515 we found a new regression that wasn't detected in our testing. With glibc's `posix/types.h`:

```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```

we would get an `error: '_Generic' specifies two compatible types` from `__ARM_mve_coerce3`, because `type: param` (with `type` being `int`) and `int32_t: param` name the same type under the hood.

The same did not happen with Newlib's header `sys/_stdint.h`:

```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```

which worked fine, because it uses `long int`. The same could feasibly happen in `__ARM_mve_coerce2` between `__fp16` and `float16_t`.

The solution here is to break the _Generic down, so that the similar types don't appear at the same level, as is done in `__ARM_mve_typeid`.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

	PR target/96795
	PR target/107515
	* config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
	(__ARM_mve_coerce3): Likewise.

gcc/testsuite/ChangeLog:

	PR target/96795
	PR target/107515
	* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
	* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.
=== Inline Ctrl+C, Ctrl+V or patch === diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 09167ec118ed3310c5077145e119196f29d83cac..70003653db65736fcfd019e83d9f18153be650dc 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35659,9 +35659,9 @@ extern void *__ARM_undef; #define __ARM_mve_coerce1(param, type) \ _Generic(param, type: param, const type: param, default: *(type *)__ARM_undef) #define __ARM_mve_coerce2(param, type) \ - _Generic(param, type: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef) + _Generic(param, type: param, __fp16: param, default: _Generic (param, _Float16: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef)) #define __ARM_mve_coerce3(param, type) \ - _Generic(param, type: param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef) + _Generic(param, type: param, default: _Generic (param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)) #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. 
*/ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c new file mode 100644 index ..427dcacb5ff59b53d5eab1f1582ef6460da3f2f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c @@ -0,0 +1,65 @@ +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ +/* { dg-additional-options "-O2 -Wno-pedantic -Wno-long-long" } */ +#include "arm_mve.h" + +float f1; +double f2; +float16_t f3; +float32_t f4; +__fp16 f5; +_Float16 f6; + +int i1; +short i2; +long i3; +long long i4; +int8_t i5; +int16_t i6; +int32_t i7; +int64_t i8; + +const int ci1; +const short ci2; +const long ci3; +const long long ci4; +const int8_t ci5; +const int16_t ci6; +const int32_t ci7; +const int64_t ci8; + +float16x8_t floatvec; +int16x8_t intvec; + +void test(void) +{ + /* Test a few different supported ways of passing an int value. The + intrinsic vmulq was chosen arbitrarily, but it is representative of + all intrinsics that take a non-const scalar value. */ + intvec = vmulq(intvec, 2); + intvec = vmulq(intvec, (int32_t) 2); + intvec = vmulq(intvec, (short) 2); + intvec = vmulq(intvec, i1); + intvec = vmulq(intvec, i2); + intvec = vmulq(intvec, i3); + intvec = vmulq(intvec, i4); + intvec = vmulq(intvec, i5); + intvec = vmulq(intvec, i6); + intvec = vmulq(intvec, i7); + intvec = vmulq(intvec, i8); + + /* Test a few different supported ways of passing a float value. 
*/ + floatvec = vmulq(floatvec, 0.5); + floatvec = vmulq(floatvec, 0.5f); + floatvec = vmulq(floatvec, (__fp16) 0.5); + floatvec = vmulq(floatvec, f1); + floatvec = vmulq(floatvec, f2); + floatvec = vmulq(floatvec, f3); + floatvec = vmulq(floatvec, f4); + floatvec = vmulq(floatvec, f5); + floatvec = vmulq(floatvec, f6); + floatvec = vmulq(floatvec, 0.15f16); + floatvec = vmulq(floatvec, (_Float16) 0.15); +} + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No
[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
On 11/15/22 15:51, Andre Vieira (lists) wrote: On 11/11/2022 17:40, Stam Markianos-Wright via Gcc-patches wrote: Hi all, This is the 2/2 patch that contains the functional changes needed for MVE Tail Predicated Low Overhead Loops. See my previous email for a general introduction of MVE LOLs. This support is added through the already existing loop-doloop mechanisms that are used for non-MVE dls/le looping. Changes are: 1) Relax the loop-doloop mechanism in the mid-end to allow for decrement values other than -1 and for `count` to be an rtx containing the number of elements to be processed, rather than an expression for calculating the number of iterations. 2) Add an `allow_elementwise_doloop` target hook. This allows the target backend to manipulate the iteration count as it needs: in our case, to change it from a pre-calculation of the number of iterations to the number of elements to be processed. 3) The doloop_end target-insn now has an additional parameter: the `count` (note: this is before it gets modified to just be the number of elements), so that the decrement value is extracted from that parameter. Plus a number of backend changes to implement the above optimisation: 4) Appropriate changes to the define_expand of doloop_end and new patterns for dlstp and letp. 5) `arm_attempt_dlstp_transform`: (called from the define_expand of doloop_end) this function checks the loop's suitability for the dlstp/letp transformation and then implements it, if possible. 6) `arm_mve_get_loop_unique_vctp`: A function that loops through the loop contents and returns the VPR-generating vctp operation within the loop, if it is unique and there is exclusively one vctp within the loop. 7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg` to check whether an insn requires the VPR. No regressions on arm-none-eabi with various targets and on aarch64-none-elf. Thoughts on getting this into trunk?
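Change 1) relies on the two counting schemes running the loop body the same number of times: an iteration counter decremented by 1, versus an element counter decremented by the lane count. A minimal sketch, assuming a simple "loop while the counter is positive" exit test and ignoring how the hardware handles a zero count; both function names are hypothetical.

```c
#include <assert.h>

/* Conventional doloop: the counter holds the iteration count, which is
   pre-computed with a rounding-up division, and is decremented by 1.  */
int trips_by_iterations (int num_elem, int lanes)
{
  int count = (num_elem + lanes - 1) / lanes;
  int trips = 0;
  while (count > 0)
    {
      trips++;
      count -= 1;
    }
  return trips;
}

/* Elementwise doloop (hypothetical model of the dlstp/letp shape): the
   counter holds the number of elements still to process and is
   decremented by the lane count, so no pre-computed division is needed.  */
int trips_by_elements (int num_elem, int lanes)
{
  int count = num_elem;
  int trips = 0;
  while (count > 0)
    {
      trips++;
      count -= lanes;
    }
  return trips;
}

void trips_examples (void)
{
  /* 10 elements, 4 lanes: both schemes run 3 times (4 + 4 + 2).  */
  assert (trips_by_iterations (10, 4) == 3);
  assert (trips_by_elements (10, 4) == 3);
}
```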
Thank you, Stam Markianos-Wright gcc/ChangeLog: * config/aarch64/aarch64.md: Add extra doloop_end arg. * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New. * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New. (arm_mve_get_vctp_lanes): New. (arm_get_required_vpr_reg): New. (arm_mve_get_loop_unique_vctp): New. (arm_attempt_dlstp_transform): New. (arm_allow_elementwise_doloop): New. * config/arm/iterators.md: * config/arm/mve.md (*predicated_doloop_end_internal): New. (dlstp_insn): New. * config/arm/thumb2.md (doloop_end): Update for MVE LOLs. * config/arm/unspecs.md: New unspecs. * config/ia64/ia64.md: Add extra doloop_end arg. * config/pru/pru.md: Add extra doloop_end arg. * config/rs6000/rs6000.md: Add extra doloop_end arg. * config/s390/s390.md: Add extra doloop_end arg. * config/v850/v850.md: Add extra doloop_end arg. * doc/tm.texi: Document new hook. * doc/tm.texi.in: Likewise. * loop-doloop.cc (doloop_condition_get): Relax conditions. (doloop_optimize): Add support for elementwise LoLs. * target-insns.def (doloop_end): Add extra arg. * target.def (allow_elementwise_doloop): New hook. * targhooks.cc (default_allow_elementwise_doloop): New. * targhooks.h (default_allow_elementwise_doloop): New. gcc/testsuite/ChangeLog: * gcc.target/arm/lob.h: Update framework. * gcc.target/arm/lob1.c: Likewise. * gcc.target/arm/lob6.c: Likewise. * gcc.target/arm/dlstp-int16x8.c: New test. * gcc.target/arm/dlstp-int32x4.c: New test. * gcc.target/arm/dlstp-int64x2.c: New test. * gcc.target/arm/dlstp-int8x16.c: New test. ### Inline copy of patch ### diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7366,7 +7366,8 @@ ;; knows what to generate. 
(define_expand "doloop_end" [(use (match_operand 0 "" "")) ; loop pseudo - (use (match_operand 1 "" ""))] ; label + (use (match_operand 1 "" "")) ; label + (use (match_operand 2 "" ""))] ; decrement constant "optimize > 0 && flag_modulo_sched" { rtx s0; diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *); extern bool arm_q_bit_acce
[PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
On 11/20/22 22:49, Ramana Radhakrishnan wrote: On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches wrote: -----Original Message----- From: Andrea Corallo Sent: Thursday, November 17, 2022 4:38 PM To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov ; Richard Earnshaw ; Stam Markianos-Wright Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] From: Stam Markianos-Wright This patch adds explicit references to other float types to __ARM_mve_typeid in arm_mve.h. Resolves PR 107515: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515 gcc/ChangeLog: PR 107515 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types. Argh, I'm looking forward to when we move away from this _Generic business, but for now ok. The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC. And the PR is against 11.x - is there a plan to backport this and dependent patches to relevant branches? Hi Ramana! Assuming maintainer approval, we do hope to backport. And yes, it would have to be the whole patch series, so that we carry over all the improved testing as well (and we'll have to run it ofc). Does that sound Ok? Thank you, Stam Ramana Thanks, Kyrill --- gcc/config/arm/arm_mve.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index fd1876b57a0..f6b42dc3fab 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35582,6 +35582,9 @@ enum { short: __ARM_mve_type_int_n, \ int: __ARM_mve_type_int_n, \ long: __ARM_mve_type_int_n, \ + _Float16: __ARM_mve_type_fp_n, \ + __fp16: __ARM_mve_type_fp_n, \ + float: __ARM_mve_type_fp_n, \ double: __ARM_mve_type_fp_n, \ long long: __ARM_mve_type_int_n, \ unsigned char: __ARM_mve_type_int_n, \ -- 2.25.1
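The float entries matter because `_Generic` selects on the exact type of its controlling expression, so any scalar type that is not listed falls through to `default`. Below is a cut-down, hypothetical analogue of `__ARM_mve_typeid`. It omits `_Float16` and `__fp16` so that it builds with any C11 compiler, and the enum names are illustrative only.

```c
#include <assert.h>

/* Hypothetical tags; the real header uses __ARM_mve_type_int_n etc.  */
enum { TYPE_UNKNOWN = 0, TYPE_INT_N = 1, TYPE_FP_N = 2 };

/* Every scalar type that should resolve must be listed explicitly.
   The `float` entry plays the role of the associations the patch adds.  */
#define SCALAR_TYPEID(x) _Generic ((x),            \
    short: TYPE_INT_N, int: TYPE_INT_N,            \
    long: TYPE_INT_N, long long: TYPE_INT_N,       \
    float: TYPE_FP_N, double: TYPE_FP_N,           \
    default: TYPE_UNKNOWN)

void typeid_examples (void)
{
  assert (SCALAR_TYPEID (0.5f) == TYPE_FP_N);  /* float: listed */
  assert (SCALAR_TYPEID (3) == TYPE_INT_N);
  /* Plain char is not listed and is not promoted to int here.  */
  assert (SCALAR_TYPEID ((char) 1) == TYPE_UNKNOWN);
}
```

Without the `float` line, `SCALAR_TYPEID (0.5f)` would return `TYPE_UNKNOWN`, which is the kind of silent fallback the patch removes.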
Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]
On 11/18/22 16:58, Kyrylo Tkachov wrote: -----Original Message----- From: Andrea Corallo Sent: Thursday, November 17, 2022 4:38 PM To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov ; Richard Earnshaw ; Stam Markianos-Wright Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515] From: Stam Markianos-Wright This patch adds explicit references to other float types to __ARM_mve_typeid in arm_mve.h. Resolves PR 107515: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515 gcc/ChangeLog: PR 107515 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types. Argh, I'm looking forward to when we move away from this _Generic business, but for now ok. Oh we all are ;) The ChangeLog should say "PR target/107515" for the git hook to recognize it IIRC. Agh, thanks for spotting this! Will change and push it with the rest of the patch series when ready. Thank you, Stam Thanks, Kyrill --- gcc/config/arm/arm_mve.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index fd1876b57a0..f6b42dc3fab 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -35582,6 +35582,9 @@ enum { short: __ARM_mve_type_int_n, \ int: __ARM_mve_type_int_n, \ long: __ARM_mve_type_int_n, \ + _Float16: __ARM_mve_type_fp_n, \ + __fp16: __ARM_mve_type_fp_n, \ + float: __ARM_mve_type_fp_n, \ double: __ARM_mve_type_fp_n, \ long long: __ARM_mve_type_int_n, \ unsigned char: __ARM_mve_type_int_n, \ -- 2.25.1
Re: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic
On 11/18/22 16:49, Kyrylo Tkachov wrote: -----Original Message----- From: Andrea Corallo Sent: Thursday, November 17, 2022 4:38 PM To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov ; Richard Earnshaw ; Stam Markianos-Wright Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic From: Stam Markianos-Wright It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic resolution would fall back to the `__ARM_undef` failure state. This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79` and `6bd4ce64eb48a72eca300cb52773e6101d646004`, but it wasn't identified earlier because the tests were not checking for this kind of failure. The above commits changed the definitions of the intrinsics from using `[u]int[8/16/32]_t` types for the scalar argument to using `int`. This allowed `int` to be supported in user code through the overloaded `#defines`, but seems to have broken the `[u]int[8/16/32]_t` types. The solution implemented by this patch is to explicitly use a new _Generic mapping from all the `[u]int[8/16/32]_t` types for int. With this change, both `int` and `[u]int[8/16/32]_t` parameters are supported from user code and are handled by the overloading mechanism correctly. gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types. (__arm_vaddq_m_n_s32): Likewise. (__arm_vaddq_m_n_s16): Likewise. (__arm_vaddq_m_n_u8): Likewise. (__arm_vaddq_m_n_u32): Likewise. (__arm_vaddq_m_n_u16): Likewise. (__arm_vaddq_m): Fix overloading. (__ARM_mve_coerce3): New. Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit message? Thanks, Kyrill Thanks for the review! Ah yes, there was this one: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96795 which was closed last time around. It does make sense to add it, though, so we'll do that. Thanks!
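The root cause here is that `_Generic` applies no integer promotions to its controlling expression, so an `int8_t` or `uint16_t` argument never matches an `int:` association. The hypothetical macros below contrast an int-only selection with a widened one in the spirit of the new `__ARM_mve_coerce3`. They are a sketch assuming a host with 32-bit `int`, not the header's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Int-only selection: a narrow fixed-width argument keeps its own type
   and falls through to the default (failure) branch.  */
#define COERCE_INT_ONLY(p) _Generic ((p), int: 1, default: 0)

/* Widened selection: the fixed-width types are listed explicitly.
   int32_t is deliberately left out: with glibc it is `int` itself, and
   listing both at one level is the clash later reported as PR107515.  */
#define COERCE_WIDENED(p)                            \
  _Generic ((p), int: 1, int8_t: 1, int16_t: 1,      \
            uint8_t: 1, uint16_t: 1, uint32_t: 1,    \
            default: 0)

void coerce_width_examples (void)
{
  assert (COERCE_INT_ONLY (5) == 1);
  assert (COERCE_INT_ONLY ((int8_t) 5) == 0);  /* no promotion to int */
  assert (COERCE_WIDENED ((int8_t) 5) == 1);
}
```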
---
 gcc/config/arm/arm_mve.h | 78
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 684f997520f..951dc25374b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
 }

 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
 }

 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
 }

 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
 }

 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
 }

 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
 }
@@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
 {
   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p
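The `__ARM_undef` fallback described in this patch can be reproduced with a host-side sketch, assuming plain C11: `int8_t`, `uint8_t`, `int16_t` and `uint16_t` are distinct types from `int`, and a generic selection does not apply integer promotions to its controlling expression, so a selector that only lists `int` sends them to `default`. The macros and functions below are hypothetical and only illustrate the mechanism; the patch's actual fix goes through `__ARM_mve_coerce3` and a new mapping in arm_mve.h.

```c
#include <stdint.h>

/* Hypothetical selector that only knows plain int, like the overloads
   after the scalar argument was changed to int. */
#define SCALAR_OK_INT_ONLY(x) _Generic ((x), int: 1, default: 0)

/* Hypothetical selector that also maps the narrower fixed-width types.
   int32_t/uint32_t are deliberately not listed: depending on the target
   they are the same type as int/unsigned int (or are long), and two
   compatible types in one _Generic association list is an error. */
#define SCALAR_OK_ALL(x)                                    \
  _Generic ((x), int8_t: 1, int16_t: 1, int: 1,             \
                 uint8_t: 1, uint16_t: 1, unsigned int: 1,  \
                 default: 0)

/* b keeps type int8_t inside _Generic: no promotion to int happens. */
int int8_accepted_int_only (int8_t b) { return SCALAR_OK_INT_ONLY (b); }
int int8_accepted_all (int8_t b) { return SCALAR_OK_ALL (b); }
```

Passing an `int8_t` variable to the `int`-only selector yields 0, the analogue of hitting `__ARM_undef`; the wider mapping accepts it while plain `int` keeps working.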
[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Hi all,

This is the 2/2 patch that contains the functional changes needed for MVE Tail Predicated Low Overhead Loops. See my previous email for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for decrement numbers other than -1 and for `count` to be an rtx containing the number of elements to be processed, rather than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the target backend to manipulate the iteration count as it needs: in our case to change it from a pre-calculation of the number of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter: the `count` (note: this is before it gets modified to just be the number of elements), so that the decrement value is extracted from that parameter.

And many things in the backend to implement the above optimisation:

4) Appropriate changes to the define_expand of doloop_end and new patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of doloop_end) this function checks for the loop's suitability for dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through the loop contents and returns the vctp VPR-generating operation within the loop, if it is unique and there is exclusively one vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg` to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

	* config/aarch64/aarch64.md: Add extra doloop_end arg.
* config/arm/arm-protos.h (arm_attempt_dlstp_transform): New. * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New. (arm_mve_get_vctp_lanes): New. (arm_get_required_vpr_reg): New. (arm_mve_get_loop_unique_vctp): New. (arm_attempt_dlstp_transform): New. (arm_allow_elementwise_doloop): New. * config/arm/iterators.md: * config/arm/mve.md (*predicated_doloop_end_internal): New. (dlstp_insn): New. * config/arm/thumb2.md (doloop_end): Update for MVE LOLs. * config/arm/unspecs.md: New unspecs. * config/ia64/ia64.md: Add extra doloop_end arg. * config/pru/pru.md: Add extra doloop_end arg. * config/rs6000/rs6000.md: Add extra doloop_end arg. * config/s390/s390.md: Add extra doloop_end arg. * config/v850/v850.md: Add extra doloop_end arg. * doc/tm.texi: Document new hook. * doc/tm.texi.in: Likewise. * loop-doloop.cc (doloop_condition_get): Relax conditions. (doloop_optimize): Add support for elementwise LoLs. * target-insns.def (doloop_end): Add extra arg. * target.def (allow_elementwise_doloop): New hook. * targhooks.cc (default_allow_elementwise_doloop): New. * targhooks.h (default_allow_elementwise_doloop): New. gcc/testsuite/ChangeLog: * gcc.target/arm/lob.h: Update framework. * gcc.target/arm/lob1.c: Likewise. * gcc.target/arm/lob6.c: Likewise. * gcc.target/arm/dlstp-int16x8.c: New test. * gcc.target/arm/dlstp-int32x4.c: New test. * gcc.target/arm/dlstp-int64x2.c: New test. * gcc.target/arm/dlstp-int8x16.c: New test. ### Inline copy of patch ### diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7366,7 +7366,8 @@ ;; knows what to generate. 
(define_expand "doloop_end" [(use (match_operand 0 "" "")) ; loop pseudo - (use (match_operand 1 "" ""))] ; label + (use (match_operand 1 "" "")) ; label + (use (match_operand 2 "" ""))] ; decrement constant "optimize > 0 && flag_modulo_sched" { rtx s0; diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *); extern bool arm_q_bit_access (void); extern bool arm_ge_bits_access (void); extern bool arm_target_insn_ok_for_lob (rtx); - +extern rtx arm_attempt_dlstp_transform (r
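Points 1) to 3) above boil down to a loop counter that holds a number of elements and is decremented by the lane count, with the final partial vector masked by a VCTP predicate. The following is a scalar model of that behaviour, assuming 32-bit elements in 4-lane vectors (as with `int32x4_t`); `LANES_32`, `iterations_for` and `add_tail_predicated` are hypothetical names, and this is a sketch of the loop semantics, not of the compiler transform itself.

```c
#include <stdint.h>

#define LANES_32 4 /* hypothetical: lanes in a 128-bit vector of int32_t */

/* Number of vector iterations an element count corresponds to. */
int iterations_for (int n_elements, int lanes)
{
  return (n_elements + lanes - 1) / lanes;
}

/* dst[i] = a[i] + b[i] for i < n.  The counter tracks elements still to
   process and drops by LANES_32 per iteration (not by 1).  Lanes at or
   beyond `remaining` are inactive, modelling the predicate a VCTP would
   produce for the final partial vector.  Returns iterations executed. */
int add_tail_predicated (int32_t *dst, const int32_t *a,
                         const int32_t *b, int n)
{
  int iters = 0;
  for (int base = 0, remaining = n; remaining > 0;
       base += LANES_32, remaining -= LANES_32)
    {
      for (int lane = 0; lane < LANES_32; lane++)
        if (lane < remaining) /* predicate: is this lane active? */
          dst[base + lane] = a[base + lane] + b[base + lane];
      iters++;
    }
  return iters;
}
```

For six elements this runs two iterations: the first with all four lanes active, the second with only two, and nothing past element five is read or written.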
[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
On 29/03/2021 10:20, Richard Biener wrote: On Fri, 26 Mar 2021, Richard Sandiford wrote: Richard Biener writes: On Wed, 24 Mar 2021, Stam Markianos-Wright wrote: Hi all, This patch resolves bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974 This is achieved by forcing a re-calculation of *stmt_vectype_out if an incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an extra introduced max_nunits ceiling. I am not 100% sure if this is the best way to go about fixing this, because this is my first look at the vectorizer and I lack knowledge of the wider context, so do let me know if you see a better way to do this! I have added the previously ICE-ing reproducer as a new test. This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10. (the non-fdisable-tree-fre4 version has gone latent on GCC11) Bootstrapped and reg-tested on aarch64-linux-gnu. Also reg-tested on aarch64-none-elf. I don't think this is going to work well given uses will expect a vector type that's consistent here. I think giving up is for the moment the best choice, thus replacing the assert with vectorization failure. In the end we shouldn't require those nunits vectypes to be separately computed - we compute the vector type of the defs anyway and in case they're invariant the vectorizable_* function either can deal with the type mix or not anyway. I agree this area needs simplification, but I think the direction of travel should be to make the assert valid. I agree this is probably the pragmatic fix for GCC 11 and earlier though. The issue is that we compute a vector type for a use that may differ from what we'd compute for it in the context of its definition (or in the context of another use). 
Any such "local" decision is likely flawed and I'd rather simplify further doing the only decision on the definition side - if there's a disconnect between the number of lanes (and thus altering the VF won't help) then we have to give up anyway. Richard. Thank you both for the further info! Would it be fair to close the initial PR regarding the ICE (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and then open a second one at a lower priority level to address these further improvements? Also Christophe has kindly found out that the test FAILs in ILP32, so it would be great to get that one in asap, too! https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567431.html Cheers, Stam
Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
On 24/03/2021 13:46, Richard Biener wrote: On Wed, 24 Mar 2021, Stam Markianos-Wright wrote: Hi all, This patch resolves bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974 This is achieved by forcing a re-calculation of *stmt_vectype_out if an incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an extra introduced max_nunits ceiling. I am not 100% sure if this is the best way to go about fixing this, because this is my first look at the vectorizer and I lack knowledge of the wider context, so do let me know if you see a better way to do this! I have added the previously ICE-ing reproducer as a new test. This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10. (the non-fdisable-tree-fre4 version has gone latent on GCC11) Bootstrapped and reg-tested on aarch64-linux-gnu. Also reg-tested on aarch64-none-elf. I don't think this is going to work well given uses will expect a vector type that's consistent here. I think giving up is for the moment the best choice, thus replacing the assert with vectorization failure. In the end we shouldn't require those nunits vectypes to be separately computed - we compute the vector type of the defs anyway and in case they're invariant the vectorizable_* function either can deal with the type mix or not anyway.

Yea good point! I agree and after all we are very close to releases now ;)

I've attached the patch that just does the graceful vectorization failure and adds a slightly better test now. Re-tested as previously with no issues ofc.

gcc-10.patch is what I'd backport to GCC10 (the only difference between that and gcc-11.patch is that one compiles with `-fdisable-tree-fre4` and the other without it).

Ok to push this to the GCC11 branch and backport to the GCC10 branch?

Cheers :D
Stam

That said, the goal should be to simplify things here. Richard.
gcc/ChangeLog: * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new parameter to core function and add new function overload. (vect_get_vector_types_for_stmt): Add re-calculation logic. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/pr96974.C: New test. diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C new file mode 100644 index 000..363241d18df --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */ + +float a; +int +b () +{ return __builtin_lrintf(a); } + +struct c { + float d; +c() { + for (int e = 0; e < 9; e++) + coeffs[e] = d ? b() : 0; +} +int coeffs[10]; +} f; + +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */ diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index d791d3a4720..4c01e82ff39 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -12148,8 +12148,12 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info, } } - gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), - TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))); + if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), + TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))) +return opt_result::failure_at (stmt, + "Not vectorized: Incompatible number " + "of vector subparts between %T and %T\n", + nunits_vectype, *stmt_vectype_out); if (dump_enabled_p ()) { diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C new file mode 100644 index 000..2023c55e3e6 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=armv8.2-a+sve -fdump-tree-slp-details" } */ + +float a; +int +b () +{ return __builtin_lrintf(a); } + +struct c { + float d; +c() { + for (int e 
= 0; e < 9; e++) + coeffs[e] = d ? b() : 0; +} +int coeffs[10]; +} f; + +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */ diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index c2d1f39fe0f..6418edb5204 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -12249,8 +12249,12 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info, } } - gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), - TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))); + if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), + TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))) +return opt_result::failure_at (stmt, + "Not vectorized: Incompatible number " + "of vector subparts between %T and %T\n", + nunits_vectype, *stmt_vectype_out); if (dump_enabled_p ()) {
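The change in this revision swaps a `gcc_assert` for an early `opt_result::failure_at` return. Here is a sketch of the same control flow in plain C, assuming constant lane counts instead of GCC's `poly_uint64` (so `multiple_of` is a simplified, hypothetical stand-in for `multiple_p`) and a message buffer instead of the dump machinery; `check_vector_subparts` is likewise hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for multiple_p on constant lane counts. */
static bool multiple_of (unsigned a, unsigned b)
{
  return b != 0 && a % b == 0;
}

/* Instead of asserting that the nunits vector type has a multiple of the
   statement vector type's lanes, report failure so the caller can give
   up on vectorizing.  On failure the reason is written to msg. */
bool check_vector_subparts (unsigned nunits_lanes, unsigned stmt_lanes,
                            char *msg, size_t msg_len)
{
  if (!multiple_of (nunits_lanes, stmt_lanes))
    {
      snprintf (msg, msg_len,
                "Not vectorized: Incompatible number of vector subparts "
                "between %u and %u lanes", nunits_lanes, stmt_lanes);
      return false;
    }
  return true;
}
```

An 8-lane nunits type with a 4-lane statement type passes, while 2 lanes against 4 lanes is reported instead of triggering an internal compiler error.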
[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
Hi all, This patch resolves bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974 This is achieved by forcing a re-calculation of *stmt_vectype_out if an incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an extra introduced max_nunits ceiling. I am not 100% sure if this is the best way to go about fixing this, because this is my first look at the vectorizer and I lack knowledge of the wider context, so do let me know if you see a better way to do this! I have added the previously ICE-ing reproducer as a new test. This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10. (the non-fdisable-tree-fre4 version has gone latent on GCC11) Bootstrapped and reg-tested on aarch64-linux-gnu. Also reg-tested on aarch64-none-elf. gcc/ChangeLog: * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new parameter to core function and add new function overload. (vect_get_vector_types_for_stmt): Add re-calculation logic. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/pr96974.C: New test. diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C new file mode 100644 index ..2f6ebd6ce3dd8626f5e666edba77d2c925739b7d --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */ + +float a; +int +b () +{ return __builtin_lrintf(a); } + +struct c { + float d; +c() { + for (int e = 0; e < 9; e++) + coeffs[e] = d ? 
b() : 0; +} +int coeffs[10]; +} f; diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index c2d1f39fe0f4bbc90ffa079cb6a8fcf87b76b3af..f8d3eac38718e18bf957b85109cccbc03e21c041 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -11342,7 +11342,7 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode, tree get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, - unsigned int group_size) + unsigned int group_size, unsigned int max_nunits) { /* For BB vectorization, we should always have a group size once we've constructed the SLP tree; the only valid uses of zero GROUP_SIZEs @@ -11375,13 +11375,16 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, fail (in the latter case because GROUP_SIZE is too small for the target), but it's possible that a target could have a hole between supported vector types. + There is also the option to artificially pass a max_nunits, + which is smaller than GROUP_SIZE, if the use of GROUP_SIZE + would result in an incompatible mode for the target. If GROUP_SIZE is not a power of 2, this has the effect of trying the largest power of 2 that fits within the group, even though the group is not a multiple of that vector size. The BB vectorizer will then try to carve up the group into smaller pieces. */ - unsigned int nunits = 1 << floor_log2 (group_size); + unsigned int nunits = 1 << floor_log2 (max_nunits); do { vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode, @@ -11394,6 +11397,14 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, return vectype; } +tree +get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, + unsigned int group_size) +{ + return get_vectype_for_scalar_type (vinfo, scalar_type, + group_size, group_size); +} + /* Return the vector type corresponding to SCALAR_TYPE as supported by the target. NODE, if nonnull, is the SLP tree node that will use the returned vector type. 
*/ @@ -12172,6 +12183,8 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info, tree vectype; tree scalar_type = NULL_TREE; + tree scalar_type_orig = NULL_TREE; + if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info)) { vectype = STMT_VINFO_VECTYPE (stmt_info); @@ -12210,6 +12223,7 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info, "get vectype for scalar type: %T\n", scalar_type); } vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size); + scalar_type_orig = scalar_type; if (!vectype) return opt_result::failure_at (stmt, "not vectorized:" @@ -12249,6 +12263,36 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info, } } + /* In rare cases with different types and sizes we may reach an invalid + combination where nunits_vectype has fewer TYPE_VECTOR_SUBPARTS than + *stmt_vectype_out. In that case attempt to re-calculate + *stmt_vectype_out with an imposed max taken from nunits_vectype. */ + unsigned int max_nunits; + if (known_lt (TYPE_VECTOR_SUBPARTS (nunits_vectype), + TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +
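The first revision's change to `get_vectype_for_scalar_type` only moves where the search starts: `1 << floor_log2 (max_nunits)` instead of `1 << floor_log2 (group_size)`, the largest power of two not exceeding the given count. Here is a small model of that starting value, with `floor_log2_u` as a hypothetical plain-C stand-in for GCC's `floor_log2` and the precondition that the input is non-zero.

```c
/* Stand-in for floor_log2: index of the highest set bit (x > 0). */
static int floor_log2_u (unsigned x)
{
  int log = -1;
  while (x)
    {
      x >>= 1;
      log++;
    }
  return log;
}

/* Largest power of two <= max_nunits; the wrapper overload in the patch
   passes group_size for max_nunits.  Requires max_nunits > 0. */
unsigned initial_nunits (unsigned max_nunits)
{
  return 1u << floor_log2_u (max_nunits);
}
```

A group of 10 or 9 elements starts the search at 8 lanes and a group of 3 at 2. Passing a smaller `max_nunits` just lowers that starting point.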
Re: [committed obvious][arm] Add test that was missing from old commit [PR91816]
On 26/11/2020 09:01, Christophe Lyon wrote: On Wed, 25 Nov 2020 at 14:24, Stam Markianos-Wright via Gcc-patches wrote:

Hi all,

A while back I submitted GCC10 commit: 44f77a6dea2f312ee1743f3dde465c1b8453ee13 for PR91816. Turns out I was an idiot and forgot to include the test in the actual git commit, even though my entire patch had been approved.

Tested that the test still passes on a cross arm-none-eabi and also in a Cortex A-15 bootstrap with no regressions.

Submitting this as Obvious to gcc-11 and backporting to gcc-10.

Hi,

This new test fails when forcing -mcpu=cortex-m3/4/5/7/33:
FAIL: gcc.target/arm/pr91816.c scan-assembler-times beq\\t.L[0-9] 2
FAIL: gcc.target/arm/pr91816.c scan-assembler-times beq\\t.Lbcond[0-9] 1
FAIL: gcc.target/arm/pr91816.c scan-assembler-times bne\\t.L[0-9] 2
FAIL: gcc.target/arm/pr91816.c scan-assembler-times bne\\t.Lbcond[0-9] 1

I didn't check manually what is generated, can you have a look?

Oh wow thank you for spotting this!

It looks like the A class target that I had tested had a tendency to emit a movw/movt pair, whereas these M class targets would emit a single ldr. This resulted in an overall shorter jump for these targets that did not trigger the new far-branch code.

The test passes after... doubling its own size:

 #define HW3 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2
 #define HW4 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3
 #define HW5 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4
+#define HW6 HW5 HW5
 __attribute__((noinline,noclone)) void f1 (int a)
 {
@@ -25,7 +26,7 @@ __attribute__((noinline,noclone)) void f2 (int a)
 __attribute__((noinline,noclone)) void f3 (int a)
 {
-  if (a) { HW5 }
+  if (a) { HW6 }
 }
 __attribute__((noinline,noclone)) void f4 (int a)
@@ -41,7 +42,7 @@ __attribute__((noinline,noclone)) void f5 (int a)
 __attribute__((noinline,noclone)) void f6 (int a)
 {
-  if (a == 1) { HW5 }
+  if (a == 1) { HW6 }
 }

But this does effectively double the compilation time of an already quite large test. Would that be ok?
Overall this is the edge case testing that the compiler behaves correctly with a branch in a huge compilation unit, so it would be nice to have test coverage of it on as many targets as possible... but also kinda rare.

Hope this helps!

Cheers,
Stam

Thanks,
Christophe

Thanks,
Stam Markianos-Wright

gcc/testsuite/ChangeLog:

	PR target/91816
	* gcc.target/arm/pr91816.c: New test.
[backport gcc-8,9][arm] Thumb2 out of range conditional branch fix [PR91816]
Hi all, Now that I have pushed the entirety of this patch to gcc-10 and gcc-11, I would like to backport it to gcc-8 and gcc-9. PR link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816 This patch had originally been approved here: https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg02010.html See the attached diffs that have been rebased and apply cleanly. Tested on a cross arm-none-eabi and also in a Cortex A-15 bootstrap with no regressions. Ok to backport? Thanks, Stam Markianos-Wright diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9d0acde7a39..87e01e35221 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -553,4 +553,6 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *, void arm_initialize_isa (sbitmap, const enum isa_feature *); +const char * arm_gen_far_branch (rtx *, int, const char * , const char *); + #endif /* ! GCC_ARM_PROTOS_H */ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f990ca11bcb..eefe3d99548 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -31629,6 +31629,39 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align) return align; } +/* Generate code to enable conditional branches in functions over 1 MiB. + Parameters are: + operands: is the operands list of the asm insn (see arm_cond_branch or + arm_cond_branch_reversed). + pos_label: is an index into the operands array where operands[pos_label] is + the asm label of the final jump destination. + dest: is a string which is used to generate the asm label of the intermediate + destination + branch_format: is a string denoting the intermediate branch format, e.g. + "beq", "bne", etc. 
*/ + +const char * +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, + const char * branch_format) +{ + rtx_code_label * tmp_label = gen_label_rtx (); + char label_buf[256]; + char buffer[128]; + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ + CODE_LABEL_NUMBER (tmp_label)); + const char *label_ptr = arm_strip_name_encoding (label_buf); + rtx dest_label = operands[pos_label]; + operands[pos_label] = tmp_label; + + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); + output_asm_insn (buffer, operands); + + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr); + operands[pos_label] = dest_label; + output_asm_insn (buffer, operands); + return ""; +} + #if CHECKING_P namespace selftest { diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 6d6b37719e0..81c96658d95 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -7187,9 +7187,15 @@ ;; And for backward branches we have ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). ;; +;; In 16-bit Thumb these ranges are: ;; For a 'b' pos_range = 2046, neg_range = -2048 giving (-2040->2048). ;; For a 'b' pos_range = 254, neg_range = -256 giving (-250 ->256). +;; In 32-bit Thumb these ranges are: +;; For a 'b' +/- 16MB is not checked for. +;; For a 'b' pos_range = 1048574, neg_range = -1048576 giving +;; (-1048568 -> 1048576). + (define_expand "cbranchsi4" [(set (pc) (if_then_else (match_operator 0 "expandable_comparison_operator" @@ -7444,23 +7450,50 @@ (label_ref (match_operand 0 "" "")) (pc)))] "TARGET_32BIT" - "* - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) + { +if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) { arm_ccfsm_state += 2; - return \"\"; + return ""; } - return \"b%d1\\t%l0\"; - " +switch (get_attr_length (insn)) + { + case 2: /* Thumb2 16-bit b{cond}. */ + case 4: /* Thumb2 32-bit b{cond} or A32 b{cond}. */ + return "b%d1\t%l0"; + break; + + /* Thumb2 b{cond} out of range. 
Use 16-bit b{cond} and + unconditional branch b. */ + default: return arm_gen_far_branch (operands, 0, "Lbcond", "b%D1\t"); + } + } [(set_attr "conds" "use") (set_attr "type" "branch") (set (attr "length") - (if_then_else - (and (match_test "TARGET_THUMB2") - (and (ge (minus (match_dup 0) (pc)) (const_int -250)) -(le (minus (match_dup 0) (pc)) (const_int 256 - (const_int 2) - (const_int 4)))] +(if_then_else (match_test "!TARGET_THUMB2") + + ;;Target is not Thumb2, therefore is A32. Generate b{cond}. + (const_int 4) + + ;; Check if target is within 16-bit Thumb2 b{cond} range. + (if_then_else (and (ge (minus (match_dup 0) (pc)) (const_int -
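The length attribute and `arm_gen_far_branch` together implement a three-way choice: a 16-bit b{cond}, a 32-bit b{cond}, or a far sequence that branches on the inverse condition over an unconditional `b` (whose +/-16MB range is not checked). The following is a host-side model of that choice, assuming the byte-offset ranges quoted in the arm.md comment in this diff (16-bit: -250 to 256; 32-bit: -1048568 to 1048576). The enum, the functions and the condition table are hypothetical illustrations, not GCC code; the `.Lbcond` label prefix mirrors the one passed to `arm_gen_far_branch`.

```c
#include <stdio.h>
#include <string.h>

enum branch_kind { BRANCH_T16, BRANCH_T32, BRANCH_FAR };

/* Classify a Thumb-2 conditional branch by its byte offset. */
enum branch_kind classify_thumb2_cond_branch (long offset)
{
  if (offset >= -250 && offset <= 256)
    return BRANCH_T16;
  if (offset >= -1048568 && offset <= 1048576)
    return BRANCH_T32;
  return BRANCH_FAR;
}

/* Inverse of a condition code, playing the role of %D in the template. */
static const char *inverse_cond (const char *cond)
{
  static const char *pairs[][2] = {
    {"eq", "ne"}, {"ne", "eq"}, {"ge", "lt"}, {"lt", "ge"},
    {"gt", "le"}, {"le", "gt"}, {"hs", "lo"}, {"lo", "hs"},
    {"hi", "ls"}, {"ls", "hi"}, {"mi", "pl"}, {"pl", "mi"},
  };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    if (strcmp (pairs[i][0], cond) == 0)
      return pairs[i][1];
  return NULL;
}

/* Far form of "b<cond> target": skip over an unconditional branch when
   the condition is false.  Returns snprintf's result, or -1 if the
   condition is unknown. */
int format_far_branch (char *buf, size_t len, const char *cond,
                       const char *target, int label_no)
{
  const char *inv = inverse_cond (cond);
  if (!inv)
    return -1;
  return snprintf (buf, len, "b%s\t.Lbcond%d\n\tb\t%s\n.Lbcond%d:",
                   inv, label_no, target, label_no);
}
```

For example, a `beq` to `.L42` that is out of 32-bit range becomes `bne .Lbcond7`, `b .L42`, `.Lbcond7:`, which is the kind of `.Lbcond` branch the pr91816.c test scans for.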
[committed obvious][arm] Add test that was missing from old commit [PR91816]
Hi all,

A while back I submitted GCC10 commit: 44f77a6dea2f312ee1743f3dde465c1b8453ee13 for PR91816. Turns out I was an idiot and forgot to include the test in the actual git commit, even though my entire patch had been approved.

Tested that the test still passes on a cross arm-none-eabi and also in a Cortex A-15 bootstrap with no regressions.

Submitting this as Obvious to gcc-11 and backporting to gcc-10.

Thanks,
Stam Markianos-Wright

gcc/testsuite/ChangeLog:

	PR target/91816
	* gcc.target/arm/pr91816.c: New test.

diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c
new file mode 100644
index 000..75b938a6aad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-additional-options "-mthumb" } */
+/* { dg-timeout-factor 4.0 } */
+
+int printf(const char *, ...);
+
+#define HW0 printf("Hello World!\n");
+#define HW1 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0
+#define HW2 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1
+#define HW3 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2
+#define HW4 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3
+#define HW5 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4
+
+__attribute__((noinline,noclone)) void f1 (int a)
+{
+  if (a) { HW0 }
+}
+
+__attribute__((noinline,noclone)) void f2 (int a)
+{
+  if (a) { HW3 }
+}
+
+
+__attribute__((noinline,noclone)) void f3 (int a)
+{
+  if (a) { HW5 }
+}
+
+__attribute__((noinline,noclone)) void f4 (int a)
+{
+  if (a == 1) { HW0 }
+}
+
+__attribute__((noinline,noclone)) void f5 (int a)
+{
+  if (a == 1) { HW3 }
+}
+
+
+__attribute__((noinline,noclone)) void f6 (int a)
+{
+  if (a == 1) { HW5 }
+}
+
+
+int main(void)
+{
+  f1(0);
+  f2(0);
+  f3(0);
+  f4(0);
+  f5(0);
+  f6(0);
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "beq\\t.L\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "beq\\t.Lbcond\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "bne\\t.L\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "bne\\t.Lbcond\[0-9\]" 1 } } */
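This test builds its large functions by nesting macros. Each HWk level repeats the level below it ten times, so HWk expands to 10^k statements. That means f3 and f6 each contain 100,000 `printf` calls, and the HW6 (two HW5) proposed in the follow-up would make that 200,000. The following sketch checks the growth rule by swapping the `printf` payload for a counter. It is a standalone illustration that stops at HW3 to keep compile time small; `hw_count` and `count_hw3` are hypothetical names.

```c
/* Counter replacing the printf payload of the test's HW0. */
static long hw_count;

#define HW0 hw_count++;
#define HW1 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0
#define HW2 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1
#define HW3 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2

/* Executes every statement HW3 expands to and returns how many ran. */
long count_hw3 (void)
{
  hw_count = 0;
  HW3
  return hw_count;
}
```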
Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 2/11/20 10:25 AM, Kyrill Tkachov wrote: Hi Stam, On 2/10/20 1:35 PM, Stam Markianos-Wright wrote: On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: > > > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: >> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >>>> >>>> >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >>>>> >>>>> >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>>>>> Hi all, >>>>>> >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>>>>> operations (vector/by element) to the ARM back-end. >>>>>> >>>>>> These are: >>>>>> usdot (vector), dot (by element). >>>>>> >>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>>>>> for ARM they remain optional as of ARMv8.6-a. >>>>>> >>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to >>>>>> generate assembler and tests are added to verify and perform adequate checks. >>>>>> >>>>>> Regression testing on arm-none-eabi passed successfully. >>>>>> >>>>>> This patch depends on: >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>>>> >>>>>> for ARM CLI updates, and on: >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>>>> >>>>>> for testsuite effective_target update. >>>>>> >>>>>> Ok for trunk? >>>>> >>>> >>>> New diff addressing review comments from Aarch64 version of the patch. >>>> >>>> _Change of order of operands in RTL patterns. >>>> _Change tests to use check-function-bodies, compile with optimisation and >>>> check for exact registers. >>>> _Rename tests to remove "-compile-" in filename. >>>> >>> > .Ping! Ping :) Diff re-attached in this ping email is same as the one posted on 10/01 Thank you! Sorry for the delay. This is ok. No worries, thank you! Committed as r10-6575. Cheers, Stam Thanks, Kyrill > . 
>>> >>> Cheers, >>> Stam >>> >>>>>> >>>>>> >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>>>> >>>>>> PS. I don't have commit rights, so if someone could commit on my behalf, >>>>>> that would be great :) >>>>>> >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> 2019-11-28 Stam Markianos-Wright >>>>>> >>>>>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>>>>> (USTERNOP_QUALIFIERS): New define. >>>>>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>> (arm_expand_builtin_args): >>>>>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>>>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>>>>> * config/arm/arm_neon.h (vusdot_s32): New. >>>>>> (vusdot_lane_s32): New. >>>>>> (vusdotq_lane_s32): New. >>>>>> (vsudot_lane_s32): New. >>>>>> (vsudotq_lane_s32): New. >>>>>> * config/arm/arm_neon_builtins.def >>>>>> (usdot,usdot_lane,sudot_lane): New. >>>>>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>>>>> (sup, opsuffix): Add . >>>>>> * config/arm/neon.md (neon_usdot, dot_lane: New. >>>>>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>>>>> >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>> 2019-12-12 Stam Markianos-Wright >>>>>> >>>>>> * gcc.target/arm/simd/vdot-2-1.c: New test. >>>>>> * gcc.target/arm/simd/vdot-2-2.c: New test. >>>>>> * gcc.target/arm/simd/vdot-2-3.c: New test. >>>>>> * gcc.target/arm/simd/vdot-2-4.c: New test. >>>>>> >>>>>> >>>>
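The operation behind these intrinsics is a mixed-sign dot product. Each 32-bit accumulator lane adds four products of an unsigned byte from the first vector and a signed byte from the second. In the by-element form, every accumulator lane uses the same 4-byte group of the second operand, picked by the lane index. The following is a scalar reference model of the 64-bit `vusdot_s32` / `vusdot_lane_s32` behaviour, assuming that reading of the ACLE. It uses plain arrays instead of the arm_neon.h vector types, and `usdot_model` and `usdot_lane_model` are hypothetical helpers, not part of arm_neon.h.

```c
#include <stdint.h>

/* Vector form: acc[i] += sum over j of a[4i+j] (unsigned) * b[4i+j] (signed). */
void usdot_model (int32_t acc[2], const uint8_t a[8], const int8_t b[8])
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      acc[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
}

/* By-element form: every lane i uses the 4-byte group `lane` of b. */
void usdot_lane_model (int32_t acc[2], const uint8_t a[8],
                       const int8_t b[8], int lane)
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      acc[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * lane + j];
}
```

The sudot variants (`vsudot_lane_s32` and friends) swap the signedness: signed bytes in the first vector, unsigned bytes in the indexed one.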
[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product operations (vector/by element) to the ARM back-end. These are: usdot (vector), dot (by element). The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and for ARM they remain optional as of ARMv8.6-a. The functions are declared in arm_neon.h, RTL patterns are defined to generate assembler and tests are added to verify and perform adequate checks. Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html for ARM CLI updates, and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for testsuite effective_target update. Ok for trunk? New diff addressing review comments from Aarch64 version of the patch. _Change of order of operands in RTL patterns. _Change tests to use check-function-bodies, compile with optimisation and check for exact registers. _Rename tests to remove "-compile-" in filename. .Ping! Ping :) Diff re-attached in this ping email is same as the one posted on 10/01 Thank you! . Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-11-28 Stam Markianos-Wright * config/arm/arm-builtins.c (enum arm_type_qualifiers): (USTERNOP_QUALIFIERS): New define. (USMAC_LANE_QUADTUP_QUALIFIERS): New define. (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. (arm_expand_builtin_args): Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. 
* config/arm/arm_neon.h (vusdot_s32): New. (vusdot_lane_s32): New. (vusdotq_lane_s32): New. (vsudot_lane_s32): New. (vsudotq_lane_s32): New. * config/arm/arm_neon_builtins.def (usdot,usdot_lane,sudot_lane): New. * config/arm/iterators.md (DOTPROD_I8MM): New. (sup, opsuffix): Add . * config/arm/neon.md (neon_usdot, dot_lane: New. * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. gcc/testsuite/ChangeLog: 2019-12-12 Stam Markianos-Wright * gcc.target/arm/simd/vdot-2-1.c: New test. * gcc.target/arm/simd/vdot-2-2.c: New test. * gcc.target/arm/simd/vdot-2-3.c: New test. * gcc.target/arm/simd/vdot-2-4.c: New test. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index df84560588a..1b4316d0e93 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). 
*/ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigned T, lane index). */ +static enum arm_type_qualifiers +arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers) + /* T (T, T, immediate). */ static enum arm_type_qualifiers arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -2148,6 +2172,7 @@ ty
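The comment on the new qualifier_lane_quadtup_index says the index selects lanes "in quadtuplets" and must be in range for the preceding vector argument. Below is a hypothetical model of that bounds rule. It is not the check arm-builtins.c actually performs. The function names are illustrative, and `nunits` stands for the number of elements in the preceding vector argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the quadtuplet lane rule: the index picks a group
   of four consecutive elements of the preceding vector argument, so with
   NUNITS elements the valid indices are 0 .. NUNITS/4 - 1.  */
static bool
quadtup_lane_in_range (unsigned nunits, long lane)
{
  assert (nunits % 4 == 0);
  return lane >= 0 && lane < (long) (nunits / 4);
}

/* The same reading applied to the existing pair-selected lanes
   (qualifier_lane_pair_index): groups of two elements.  */
static bool
pair_lane_in_range (unsigned nunits, long lane)
{
  assert (nunits % 2 == 0);
  return lane >= 0 && lane < (long) (nunits / 2);
}
```

Under this model, an 8-element byte vector (such as an int8x8_t operand) admits only quadtuplet lanes 0 and 1, and a 16-element vector admits 0 to 3.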
Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.
On 2/4/20 12:02 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/31/20 1:45 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/30/20 10:01 AM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/29/20 12:42 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: Hi all, This fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 Genmodes.c was generating the "wider_mode" chain as follows: HF -> BF -> SF -> DF -> TF -> VOID This caused issues in some rare cases where conversion between modes was needed, such as the above PR93300 where BFmode was being picked up as a valid mode for: optabs.c:prepare_float_lib_cmp which then led to the ICE at expr.c:convert_mode_scalar. Hi Richard, Can you go into more details about why this chain was a problem? Naively, it's the one I'd have expected: HF should certainly have priority over BF, Is that because functionally it looks like genmodes puts things in reverse alphabetical order if all else is equal? (If I'm reading the comment about MODE_RANDOM, MODE_CC correctly) but BF coming before SF doesn't seem unusual in itself. I'm not saying the patch is wrong. It just wasn't clear why it was right either. Yes, I see what you mean. I'll go through my thought process here: In investigating the ICE PR93300 I found that the divergence from pre-bf16 behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate library calls for conversions. This was then being caught further down by the gcc_assert at expr.c:325 where GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which is what happened if I removed the gcc_assert at expr.c:325). With BFmode being a target-defined mode, I didn't want to add something like `if (mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to make this distinction?) Also I couldn't find anywhere where the target back-end is not consulted for a "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the libcall being created later on as __extendhfbf2. Yeah, prepare_float_lib_cmp just checks for libfuncs rather than calling target hooks directly. The libfuncs themselves are under the control of the target though. By default we assume all float modes have associated libfuncs. It's then up to the target to remove functions that don't exist (or redirect them to other functions). So I think we need to remove BFmode libfuncs in arm_init_libfuncs in the same way as we currently do for HFmode. I guess we should also nullify the conversion libfuncs for BFmode, not just the arithmetic and comparison ones. Ahhh now this works, thank you for the suggestion! I was aware of arm_init_libfuncs, but I had not realised that returning NULL would have the desired effect for us, in this case. So I have essentially rolled back the whole previous version of the patch and done this in the new diff. It seems to have fixed the ICE and I am currently in the process of regression testing! LGTM behaviourally, just a couple of requests about how it's written: Thank you! Stam Thanks, Richard Finally, because we really don't want __bf16 to be in the same "conversion rank" as standard float modes for things like automatic promotion, this seemed like a reasonable solution to that problem :) Let me know of your thoughts! Cheers, Stam diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c47fc232f39..18055d4a75e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2643,6 +2643,30 @@ arm_init_libfuncs (void) default: break; } + + /* For all possible libcalls in BFmode, return NULL. */ + /* Conversions. 
*/ + set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL)); + set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL)); + set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL)); + set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL)); It might be slightly safer to do: FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT) to iterate over all float modes on the non-BF side. Done :) + /* Arithmetic. */ + set_optab_libfunc (add_optab, BFmode, NULL); + set_optab_libfunc (sdiv_optab, BFmode, NULL); + set_optab_libfunc (smul_optab, BFmode, NULL); + set_optab_libfunc (neg_optab, BFmode, NULL); + set_optab_libfunc (sub_optab, BFmode, NULL); + + /* Comparisons. */ + set_optab_libfunc (eq_optab, BFmode, NULL); + set_optab_libfunc (ne_optab, BFmode, NULL); + set_optab_libfunc (lt_optab, BFmode, NULL); + set_optab_libfunc (le_optab, BFmode, NULL); + set_optab_libfunc (ge_opta
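A toy model can illustrate the review suggestion to iterate with FOR_EACH_MODE_IN_CLASS instead of listing HF, SF and DF by hand. This is a sketch under assumptions. The enum, the `conv_libfunc`/`cmp_libfunc` tables and the helper names are hypothetical stand-ins for GCC's optab libfunc tables, in which a NULL entry means "no library routine". The loop form has one visible benefit here: it also clears entries for float modes the explicit list does not name (TF in this toy table).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical float modes in the class; BF is the one being removed.  */
enum fmode { F_HF, F_BF, F_SF, F_DF, F_TF, NUM_FMODES };

/* conv_libfunc[to][from]: conversion routine, NULL if none.
   cmp_libfunc[m]: comparison routine for mode m, NULL if none.  */
static const char *conv_libfunc[NUM_FMODES][NUM_FMODES];
static const char *cmp_libfunc[NUM_FMODES];

/* Default state: assume every float mode has library support.  */
static void
toy_init_default_libfuncs (void)
{
  for (int to = 0; to < NUM_FMODES; to++)
    {
      cmp_libfunc[to] = "cmp-libcall";
      for (int from = 0; from < NUM_FMODES; from++)
        conv_libfunc[to][from] = (to == from) ? NULL : "conv-libcall";
    }
}

/* Loop-based removal: visit every other float mode and clear both
   conversion directions, then clear BF's own comparison entry.  */
static void
toy_remove_bf_libfuncs (void)
{
  for (int m = 0; m < NUM_FMODES; m++)
    {
      if (m == F_BF)
        continue;
      conv_libfunc[F_BF][m] = NULL;  /* m -> BF  */
      conv_libfunc[m][F_BF] = NULL;  /* BF -> m  */
    }
  cmp_libfunc[F_BF] = NULL;
}
```

After both calls, every conversion into or out of BF is NULL, and the entries between the other modes are left alone.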
Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.
On 1/31/20 1:45 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/30/20 10:01 AM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/29/20 12:42 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: Hi all, This fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 Genmodes.c was generating the "wider_mode" chain as follows: HF -> BF -> SF -> DF -> TF -> VOID This caused issues in some rare cases where conversion between modes was needed, such as the above PR93300 where BFmode was being picked up as a valid mode for: optabs.c:prepare_float_lib_cmp which then led to the ICE at expr.c:convert_mode_scalar. Hi Richard, Can you go into more details about why this chain was a problem? Naively, it's the one I'd have expected: HF should certainly have priority over BF, Is that because functionally it looks like genmodes puts things in reverse alphabetical order if all else is equal? (If I'm reading the comment about MODE_RANDOM, MODE_CC correctly) but BF coming before SF doesn't seem unusual in itself. I'm not saying the patch is wrong. It just wasn't clear why it was right either. Yes, I see what you mean. I'll go through my thought process here: In investigating the ICE PR93300 I found that the divergence from pre-bf16 behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate library calls for conversions. This was then being caught further down by the gcc_assert at expr.c:325 where GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which is what happened if I removed the gcc_assert at expr.c:325). With BFmode being a target-defined mode, I didn't want to add something like `if (mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to make this distinction?) Also I couldn't find anywhere where the target back-end is not consulted for a "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the libcall being created later on as __extendhfbf2. Yeah, prepare_float_lib_cmp just checks for libfuncs rather than calling target hooks directly. The libfuncs themselves are under the control of the target though. By default we assume all float modes have associated libfuncs. It's then up to the target to remove functions that don't exist (or redirect them to other functions). So I think we need to remove BFmode libfuncs in arm_init_libfuncs in the same way as we currently do for HFmode. I guess we should also nullify the conversion libfuncs for BFmode, not just the arithmetic and comparison ones. Ahhh now this works, thank you for the suggestion! I was aware of arm_init_libfuncs, but I had not realised that returning NULL would have the desired effect for us, in this case. So I have essentially rolled back the whole previous version of the patch and done this in the new diff. It seems to have fixed the ICE and I am currently in the process of regression testing! LGTM behaviourally, just a couple of requests about how it's written: Thank you! Stam Thanks, Richard Finally, because we really don't want __bf16 to be in the same "conversion rank" as standard float modes for things like automatic promotion, this seemed like a reasonable solution to that problem :) Let me know of your thoughts! Cheers, Stam diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c47fc232f39..18055d4a75e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2643,6 +2643,30 @@ arm_init_libfuncs (void) default: break; } + + /* For all possible libcalls in BFmode, return NULL. */ + /* Conversions. 
*/ + set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL)); + set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL)); + set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL)); + set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL)); It might be slightly safer to do: FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT) to iterate over all float modes on the non-BF side. Done :) + /* Arithmetic. */ + set_optab_libfunc (add_optab, BFmode, NULL); + set_optab_libfunc (sdiv_optab, BFmode, NULL); + set_optab_libfunc (smul_optab, BFmode, NULL); + set_optab_libfunc (neg_optab, BFmode, NULL); + set_optab_libfunc (sub_optab, BFmode, NULL); + + /* Comparisons. */ + set_optab_libfunc (eq_optab, BFmode, NULL); + set_optab_libfunc (ne_optab, BFmode, NULL); + set_optab_libfunc (lt_optab, BFmode, NULL); + set_optab_libfunc (le_optab, BFmode, NULL); + set_optab_libfunc (ge_optab, BFmode, NULL); + set_optab_libfunc (gt_optab, BFmode, NULL); + set_optab_libfun
[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product operations (vector/by element) to the ARM back-end. These are: usdot (vector), dot (by element). The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and for ARM they remain optional as of ARMv8.6-a. The functions are declared in arm_neon.h, RTL patterns are defined to generate assembler and tests are added to verify and perform adequate checks. Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html for ARM CLI updates, and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for testsuite effective_target update. Ok for trunk? New diff addressing review comments from Aarch64 version of the patch. _Change of order of operands in RTL patterns. _Change tests to use check-function-bodies, compile with optimisation and check for exact registers. _Rename tests to remove "-compile-" in filename. .Ping! . Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-11-28 Stam Markianos-Wright * config/arm/arm-builtins.c (enum arm_type_qualifiers): (USTERNOP_QUALIFIERS): New define. (USMAC_LANE_QUADTUP_QUALIFIERS): New define. (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. (arm_expand_builtin_args): Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. * config/arm/arm_neon.h (vusdot_s32): New. (vusdot_lane_s32): New. (vusdotq_lane_s32): New. (vsudot_lane_s32): New. (vsudotq_lane_s32): New. 
* config/arm/arm_neon_builtins.def (usdot,usdot_lane,sudot_lane): New. * config/arm/iterators.md (DOTPROD_I8MM): New. (sup, opsuffix): Add . * config/arm/neon.md (neon_usdot, dot_lane: New. * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. gcc/testsuite/ChangeLog: 2019-12-12 Stam Markianos-Wright * gcc.target/arm/simd/vdot-compile-2-1.c: New test. * gcc.target/arm/simd/vdot-compile-2-2.c: New test. * gcc.target/arm/simd/vdot-compile-2-3.c: New test. * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
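The by-element forms differ from the vector form only in which bytes of the second source they read. Every accumulator lane reuses the single 4-byte group chosen by the lane index. Below is a hedged scalar sketch; `usdot_lane_ref` and `sudot_lane_ref` are hypothetical helpers. Their mapping onto vusdot_lane_s32 and vsudot_lane_s32 is an assumption based on the intrinsic names and the USMAC/SUMAC qualifier lists in the patch: the first source is unsigned for USDOT, and the indexed source is unsigned for SUDOT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of USDOT (by element): a is unsigned, b is signed,
   and every lane i of acc uses b's group number LANE.  B_ELEMS is the
   number of bytes in b, so LANE must be below B_ELEMS / 4.  */
static void
usdot_lane_ref (int32_t *acc, size_t lanes, const uint8_t *a,
                const int8_t *b, size_t b_elems, size_t lane)
{
  assert (lane < b_elems / 4);
  for (size_t i = 0; i < lanes; i++)
    {
      uint32_t sum = 0;
      for (size_t k = 0; k < 4; k++)
        sum += (uint32_t) (a[4 * i + k] * b[4 * lane + k]);
      acc[i] = (int32_t) ((uint32_t) acc[i] + sum);
    }
}

/* Hypothetical model of SUDOT (by element): same shape, but a is signed
   and the indexed operand b is unsigned.  */
static void
sudot_lane_ref (int32_t *acc, size_t lanes, const int8_t *a,
                const uint8_t *b, size_t b_elems, size_t lane)
{
  assert (lane < b_elems / 4);
  for (size_t i = 0; i < lanes; i++)
    {
      uint32_t sum = 0;
      for (size_t k = 0; k < 4; k++)
        sum += (uint32_t) (a[4 * i + k] * b[4 * lane + k]);
      acc[i] = (int32_t) ((uint32_t) acc[i] + sum);
    }
}
```

Because the indexed group is broadcast, selecting lane 1 of b = {1,2,3,4, -1,-1,-1,-1} multiplies every group of a by -1.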
Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.
On 1/30/20 10:01 AM, Richard Sandiford wrote: Stam Markianos-Wright writes: On 1/29/20 12:42 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: Hi all, This fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 Genmodes.c was generating the "wider_mode" chain as follows: HF -> BF -> SF -> DF -> TF -> VOID This caused issues in some rare cases where conversion between modes was needed, such as the above PR93300 where BFmode was being picked up as a valid mode for: optabs.c:prepare_float_lib_cmp which then led to the ICE at expr.c:convert_mode_scalar. Hi Richard, Can you go into more details about why this chain was a problem? Naively, it's the one I'd have expected: HF should certainly have priority over BF, Is that because functionally it looks like genmodes puts things in reverse alphabetical order if all else is equal? (If I'm reading the comment about MODE_RANDOM, MODE_CC correctly) but BF coming before SF doesn't seem unusual in itself. I'm not saying the patch is wrong. It just wasn't clear why it was right either. Yes, I see what you mean. I'll go through my thought process here: In investigating the ICE PR93300 I found that the divergence from pre-bf16 behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate library calls for conversions. This was then being caught further down by the gcc_assert at expr.c:325 where GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which is what happened if I removed the gcc_assert at expr.c:325). With BFmode being a target-defined mode, I didn't want to add something like `if (mode != BFmode)` to specifically exclude BFmode from being selected for this. (and there's nothing different between HFmode and BFmode here to allow me to make this distinction?)
Also I couldn't find anywhere where the target back-end is not consulted for a "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the libcall being created later on as __extendhfbf2. Yeah, prepare_float_lib_cmp just checks for libfuncs rather than calling target hooks directly. The libfuncs themselves are under the control of the target though. By default we assume all float modes have associated libfuncs. It's then up to the target to remove functions that don't exist (or redirect them to other functions). So I think we need to remove BFmode libfuncs in arm_init_libfuncs in the same way as we currently do for HFmode. I guess we should also nullify the conversion libfuncs for BFmode, not just the arithmetic and comparison ones. Ahhh now this works, thank you for the suggestion! I was aware of arm_init_libfuncs, but I had not realised that returning NULL would have the desired effect for us, in this case. So I have essentially rolled back the whole previous version of the patch and done this in the new diff. It seems to have fixed the ICE and I am currently in the process of regression testing! Thank you! Stam Thanks, Richard Finally, because we really don't want __bf16 to be in the same "conversion rank" as standard float modes for things like automatic promotion, this seemed like a reasonable solution to that problem :) Let me know of your thoughts! Cheers, Stam diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c47fc232f39..18055d4a75e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2643,6 +2643,30 @@ arm_init_libfuncs (void) default: break; } + + /* For all possible libcalls in BFmode, return NULL. */ + /* Conversions. 
*/ + set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL)); + set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL)); + set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL)); + set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL)); + set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL)); + + /* Arithmetic. */ + set_optab_libfunc (add_optab, BFmode, NULL); + set_optab_libfunc (sdiv_optab, BFmode, NULL); + set_optab_libfunc (smul_optab, BFmode, NULL); + set_optab_libfunc (neg_optab, BFmode, NULL); + set_optab_libfunc (sub_optab, BFmode, NULL); + + /* Comparisons. */ + set_optab_libfunc (eq_optab, BFmode, NULL); + set_optab_libfunc (ne_optab, BFmode, NULL); + set_optab_libfunc (lt_optab, BFmode, NULL); + set_optab_libfunc (le_optab, BFmode, NULL); + set_optab_libfunc (ge_optab, BFmode, NULL); + set_optab_libfunc (gt_optab, BFmode, NULL); + set_optab_libfunc (unord_optab, BFmode, NULL); /* Use names prefixed with __gnu_ for fixed-point helper functions. */ {
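As described in this thread, the ICE arises because the comparison expander walks the wider-mode chain from the operand mode and uses the first mode that has a comparison libfunc. The toy model below is deliberately simplified and all its names are hypothetical. Its chain is the pre-fix HF -> BF -> SF -> DF -> TF order. HF starts with no comparison libfunc, mirroring what the thread says arm_init_libfuncs already does for HFmode. The model shows why clearing the BFmode entries makes the walk skip BF and land on SF.

```c
#include <assert.h>
#include <stddef.h>

/* Toy float modes; NO_MODE terminates the chain (the "VOID" in the thread).  */
enum toy_mode { HF, BF, SF, DF, TF, NO_MODE };

/* Per-mode comparison libfunc; NULL means "no library routine".  */
static const char *cmp_libfunc[NO_MODE];

/* Pre-fix wider-mode chain: HF -> BF -> SF -> DF -> TF -> (end).  */
static const enum toy_mode wider_mode[NO_MODE] = {
  [HF] = BF, [BF] = SF, [SF] = DF, [DF] = TF, [TF] = NO_MODE
};

/* Simplified analogue of the FOR_EACH_MODE_FROM loop: return the first
   mode, starting at START, that has a comparison libfunc.  */
static enum toy_mode
toy_pick_cmp_mode (enum toy_mode start)
{
  assert (start <= NO_MODE);
  for (enum toy_mode m = start; m != NO_MODE; m = wider_mode[m])
    if (cmp_libfunc[m] != NULL)
      return m;
  return NO_MODE;
}
```

In this model, picking BF when starting from HF corresponds to the same-precision HF->BF conversion that tripped the gcc_assert. With the BF entry cleared, the walk reaches SF, as it did before bf16 support.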
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 1/28/20 10:35 AM, Kyrill Tkachov wrote: Hi Stam, On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: On 12/10/19 5:03 PM, Kyrill Tkachov wrote: Hi Stam, On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: Pinging with more correct maintainers this time :) Also would need to backport to gcc7,8,9, but need to get this approved first! Sorry for the delay. Same here now! Sorry, totally forgot about this in the lead-up to Xmas! Done the changes marked below and also removed the unnecessary extra #defines from the test. This is ok with a nit on the testcase... diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c new file mode 100644 index ..757c897e9c0db32709227b3fdf1b4a8033428232 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr91816.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" } */ +int printf(const char *, ...); + I think this needs a couple of effective target checks like arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the options. Hmm, looking back at this now, is there any reason why it can't just be: /* { dg-do compile } */ /* { dg-require-effective-target arm_thumb2_ok } */ /* { dg-additional-options "-mthumb" } */ where we don't override the march or fpu options at all, but just use `require-effective-target arm_thumb2_ok` to make sure that thumb2 is supported? The attached new diff does just that. Cheers :) Stam. Thanks, Kyrill diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 7c4b1003844..8895becc639 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -576,4 +576,6 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *, void arm_initialize_isa (sbitmap, const enum isa_feature *); +const char * arm_gen_far_branch (rtx *, int, const char * , const char *); + #endif /* !
GCC_ARM_PROTOS_H */ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 07231d722b9..ee5de169f3e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -32626,6 +32626,40 @@ arm_run_selftests (void) } } /* Namespace selftest. */ + +/* Generate code to enable conditional branches in functions over 1 MiB. + Parameters are: + operands: is the operands list of the asm insn (see arm_cond_branch or + arm_cond_branch_reversed). + pos_label: is an index into the operands array where operands[pos_label] is + the asm label of the final jump destination. + dest: is a string which is used to generate the asm label of the intermediate + destination + branch_format: is a string denoting the intermediate branch format, e.g. + "beq", "bne", etc. */ + +const char * +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, + const char * branch_format) +{ + rtx_code_label * tmp_label = gen_label_rtx (); + char label_buf[256]; + char buffer[128]; + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ + CODE_LABEL_NUMBER (tmp_label)); + const char *label_ptr = arm_strip_name_encoding (label_buf); + rtx dest_label = operands[pos_label]; + operands[pos_label] = tmp_label; + + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); + output_asm_insn (buffer, operands); + + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr); + operands[pos_label] = dest_label; + output_asm_insn (buffer, operands); + return ""; +} + #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests #endif /* CHECKING_P */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index f89a2d412df..fb1d4547e5c 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -7546,9 +7546,15 @@ ;; And for backward branches we have ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). 
;; +;; In 16-bit Thumb these ranges are: ;; For a 'b' pos_range = 2046, neg_range = -2048 giving (-2040->2048). ;; For a 'b<cond>' pos_range = 254, neg_range = -256 giving (-250 ->256). +;; In 32-bit Thumb these ranges are: +;; For a 'b' +/- 16MB is not checked for. +;; For a 'b<cond>' pos_range = 1048574, neg_range = -1048576 giving +;; (-1048568 -> 1048576). + (define_expand "cbranchsi4" [(set (pc) (if_then_else (match_operator 0 "expandable_comparison_operator" @@ -7721,23 +7727,50 @@ (label_ref (match_operand 0 "" "")) (pc)))] "TARGET_32BIT" - "* - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) + { +if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) { arm_ccfsm_state += 2; - return \"\"; + return ""; } - return \"b%d1\\
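A little arithmetic checks the range figures in the arm.md comments. The backward formula is the one quoted in the comment. The forward formula is assumed to mirror it. The sketch assumes a PC offset of 4, a forward base offset of 2, and a backward base offset of -4 for 'b' and 32-bit conditional branches, or -2 for 16-bit conditional branches. These base offsets are inferred from the numbers in the comment, not taken from elsewhere in arm.md.

The second helper is a hypothetical rendering of the sequence arm_gen_far_branch builds. It is a short conditional branch, presumably with the inverted condition, that skips over an unconditional 'b' to the real destination, followed by the local label. The label names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Usable branch-offset window:
   backward: neg_range - neg_base_offs + pc_offs  (as in the arm.md comment)
   forward:  pos_range - pos_base_offs + pc_offs  (assumed to mirror it)  */
struct branch_window { long lo, hi; };

static struct branch_window
usable_window (long pos_range, long neg_range, long pos_base_offs,
               long neg_base_offs, long pc_offs)
{
  struct branch_window w;
  w.lo = neg_range - neg_base_offs + pc_offs;
  w.hi = pos_range - pos_base_offs + pc_offs;
  assert (w.lo < 0 && w.hi > 0);
  return w;
}

/* Hypothetical far-branch text: the conditional branch (condition already
   inverted by the caller) jumps over an unconditional 'b', whose much
   larger range reaches DEST.  Returns the snprintf result.  */
static int
format_far_branch (char *buf, size_t len, const char *inverted_cond,
                   const char *dest, const char *skip_label)
{
  return snprintf (buf, len, "b%s\t%s\n\tb\t%s\n%s:",
                   inverted_cond, skip_label, dest, skip_label);
}
```

With these inputs the three comment rows come out as (-2040, 2048), (-250, 256) and (-1048568, 1048576). The 1048576 figure is the 1 MiB limit mentioned in the arm_gen_far_branch comment.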
Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.
On 1/29/20 12:42 PM, Richard Sandiford wrote: Stam Markianos-Wright writes: Hi all, This fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 Genmodes.c was generating the "wider_mode" chain as follows: HF -> BF -> SF -> DF -> TF -> VOID This caused issues in some rare cases where conversion between modes was needed, such as the above PR93300 where BFmode was being picked up as a valid mode for: optabs.c:prepare_float_lib_cmp which then led to the ICE at expr.c:convert_mode_scalar. Hi Richard, Can you go into more details about why this chain was a problem? Naively, it's the one I'd have expected: HF should certainly have priority over BF, Is that because functionally it looks like genmodes puts things in reverse alphabetical order if all else is equal? (If I'm reading the comment about MODE_RANDOM, MODE_CC correctly) but BF coming before SF doesn't seem unusual in itself. I'm not saying the patch is wrong. It just wasn't clear why it was right either. Yes, I see what you mean. I'll go through my thought process here: In investigating the ICE PR93300 I found that the divergence from pre-bf16 behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a `FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate library calls for conversions. This was then being caught further down by the gcc_assert at expr.c:325 where GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which is what happened if I removed the gcc_assert at expr.c:325). With BFmode being a target-defined mode, I didn't want to add something like `if (mode != BFmode)` to specifically exclude BFmode from being selected for this. (and there's nothing different between HFmode and BFmode here to allow me to make this distinction?)
Also I couldn't find anywhere where the target back-end is not consulted for a "is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the libcall being created later on as __extendhfbf2. Finally, because we really don't want __bf16 to be in the same "conversion rank" as standard float modes for things like automatic promotion, this seemed like a reasonable solution to that problem :) Let me know of your thoughts! Cheers, Stam Thanks, Richard This patch adds a new FLOAT_MODE_UNRANKED macro which uses the existing "order" attribute of mode_data to place BFmode as: HF -> SF -> DF -> TF -> BF -> VOID This fixes the existing ICE seen by PR93300 (hence providing this with no explicit test) and causes no further regressions. Reg-tested on arm-none-eabi, aarch64-none-elf and bootstrapped on a Cortex-A15. Ok for trunk? Cheers, Stam gcc/ChangeLog: 2020-01-28 Stam Markianos-Wright * config/aarch64/aarch64-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED. * config/arm/arm-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED. * genmodes.c (FLOAT_MODE_UNRANKED): New macro. (make_float_mode): Add ORDER parameter. The whole diff for reference: diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 1eeb8d88452..0b36da942b4 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -69,10 +69,10 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF. */ VECTOR_MODE (FLOAT, DF, 1); /* V1DF. */ VECTOR_MODE (FLOAT, HF, 2); /* V2HF. */ -/* Bfloat16 modes. */ -FLOAT_MODE (BF, 2, 0); +/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is + placed after normal floating point modes in the GET_MODES_WIDER chain. */ +FLOAT_MODE_UNRANKED (BF, 2, 0, 1); ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format); - VECTOR_MODE (FLOAT, BF, 4); /* V4BF. */ VECTOR_MODE (FLOAT, BF, 8); /* V8BF.
*/ diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def index ea92ef35723..86551be8e3b 100644 --- a/gcc/config/arm/arm-modes.def +++ b/gcc/config/arm/arm-modes.def @@ -78,7 +78,9 @@ VECTOR_MODES (FLOAT, 8); /*V4HF V2SF */ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ VECTOR_MODE (FLOAT, HF, 2); /* V2HF */ -FLOAT_MODE (BF, 2, 0); +/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is + placed after normal floating point modes in the GET_MODES_WIDER chain. */ +FLOAT_MODE_UNRANKED (BF, 2, 0, 1); ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format); VECTOR_MODE (FLOAT, BF, 4); /* V4BF. */ VECTOR_MODE (FLOAT, BF, 8); /* V8BF. */ diff --git a/gcc/genmodes.c b/gcc/genmodes.c index bd78310ea24..c4e3dd1150d 100644 --- a/gcc/genmodes.c +++ b/gcc/genmodes.c @@ -617,20 +617,23 @@ make_fixed_point_mode (enum mode_class cl, m->fbit = fbit; }
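A simplified sort model shows how a non-zero ORDER value moves BF to the end of the chain. It is an assumption-laden sketch, not genmodes.c itself. `struct toy_mode_data` and `cmp_toy_modes` are hypothetical. The keys are the order field, then byte size, then precision. The final name comparison is an arbitrary tie-break added only to make qsort deterministic. It is not meant to reproduce how genmodes orders HF and BF when their ORDER values are equal, which is the very question raised in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for genmodes' per-mode record.  */
struct toy_mode_data
{
  const char *name;
  unsigned order;      /* 0 for ordinary modes, 1 for BF via FLOAT_MODE_UNRANKED.  */
  unsigned bytesize;
  unsigned precision;
};

/* Order first, then size, then precision; name is an arbitrary final key.  */
static int
cmp_toy_modes (const void *pa, const void *pb)
{
  const struct toy_mode_data *a = pa;
  const struct toy_mode_data *b = pb;
  if (a->order != b->order)
    return a->order < b->order ? -1 : 1;
  if (a->bytesize != b->bytesize)
    return a->bytesize < b->bytesize ? -1 : 1;
  if (a->precision != b->precision)
    return a->precision < b->precision ? -1 : 1;
  return strcmp (a->name, b->name);
}

static void
sort_toy_modes (struct toy_mode_data *modes, size_t n)
{
  assert (modes != NULL || n == 0);
  qsort (modes, n, sizeof *modes, cmp_toy_modes);
}
```

Sorting {BF, TF, HF, DF, SF} with BF's order set to 1 yields HF, SF, DF, TF, BF. This matches the post-patch chain quoted above.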
[GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.
Hi all, This fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 Genmodes.c was generating the "wider_mode" chain as follows: HF -> BF -> SF -> DF -> TF -> VOID This caused issues in some rare cases where conversion between modes was needed, such as the above PR93300 where BFmode was being picked up as a valid mode for: optabs.c:prepare_float_lib_cmp which then led to the ICE at expr.c:convert_mode_scalar. This patch adds a new FLOAT_MODE_UNRANKED macro which uses the existing "order" attribute of mode_data to place BFmode as: HF -> SF -> DF -> TF -> BF -> VOID This fixes the existing ICE seen by PR93300 (hence providing this with no explicit test) and causes no further regressions. Reg-tested on arm-none-eabi, aarch64-none-elf and bootstrapped on a Cortex-A15. Ok for trunk? Cheers, Stam gcc/ChangeLog: 2020-01-28 Stam Markianos-Wright * config/aarch64/aarch64-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED. * config/arm/arm-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED. * genmodes.c (FLOAT_MODE_UNRANKED): New macro. (make_float_mode): Add ORDER parameter. The whole diff for reference: diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 1eeb8d88452..0b36da942b4 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -69,10 +69,10 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF. */ VECTOR_MODE (FLOAT, DF, 1); /* V1DF. */ VECTOR_MODE (FLOAT, HF, 2); /* V2HF. */ -/* Bfloat16 modes. */ -FLOAT_MODE (BF, 2, 0); +/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is + placed after normal floating point modes in the GET_MODES_WIDER chain. */ +FLOAT_MODE_UNRANKED (BF, 2, 0, 1); ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format); - VECTOR_MODE (FLOAT, BF, 4); /*V4BF. */ VECTOR_MODE (FLOAT, BF, 8); /*V8BF.
*/ diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def index ea92ef35723..86551be8e3b 100644 --- a/gcc/config/arm/arm-modes.def +++ b/gcc/config/arm/arm-modes.def @@ -78,7 +78,9 @@ VECTOR_MODES (FLOAT, 8); /*V4HF V2SF */ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ VECTOR_MODE (FLOAT, HF, 2); /* V2HF */ -FLOAT_MODE (BF, 2, 0); +/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is + placed after normal floating point modes in the GET_MODES_WIDER chain. */ +FLOAT_MODE_UNRANKED (BF, 2, 0, 1); ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format); VECTOR_MODE (FLOAT, BF, 4); /*V4BF. */ VECTOR_MODE (FLOAT, BF, 8); /*V8BF. */ diff --git a/gcc/genmodes.c b/gcc/genmodes.c index bd78310ea24..c4e3dd1150d 100644 --- a/gcc/genmodes.c +++ b/gcc/genmodes.c @@ -617,20 +617,23 @@ make_fixed_point_mode (enum mode_class cl, m->fbit = fbit; } -#define FLOAT_MODE(N, Y, F) FRACTIONAL_FLOAT_MODE (N, -1U, Y, F) -#define FRACTIONAL_FLOAT_MODE(N, B, Y, F) \ - make_float_mode (#N, B, Y, #F, __FILE__, __LINE__) +#define FLOAT_MODE_UNRANKED(N, Y, F, ORDER) \ + FRACTIONAL_FLOAT_MODE (N, -1U, Y, F, ORDER) +#define FLOAT_MODE(N, Y, F) FRACTIONAL_FLOAT_MODE (N, -1U, Y, F, 0) +#define FRACTIONAL_FLOAT_MODE(N, B, Y, F, ORDER) \ + make_float_mode (#N, B, Y, #F, ORDER, __FILE__, __LINE__) static void make_float_mode (const char *name, unsigned int precision, unsigned int bytesize, -const char *format, +const char *format, unsigned int order, const char *file, unsigned int line) { struct mode_data *m = new_mode (MODE_FLOAT, name, file, line); m->bytesize = bytesize; m->precision = precision; m->format = format; + m->order = order; } #define DECIMAL_FLOAT_MODE(N, Y, F)\
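The effect of the ORDER argument on the wider-mode chain can be modelled in a few lines of C. This is a simplified sketch and not the genmodes.c code. The `struct fmode` type, `cmp_fmode` and `wider_chain` are hypothetical. As the patch description implies, it assumes that modes of one class are sorted by ORDER first, then by size, with definition order breaking ties.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for genmodes' mode_data.  */
struct fmode
{
  const char *name;
  unsigned int bytesize;
  unsigned int order;   /* 0 for ranked modes; 1 for an UNRANKED mode (BF).  */
  unsigned int counter; /* Assumed definition order in the .def files.  */
};

/* Sort key: ORDER first, then byte size, then definition order.  */
static int
cmp_fmode (const void *a, const void *b)
{
  const struct fmode *m = a, *n = b;
  if (m->order != n->order)
    return m->order < n->order ? -1 : 1;
  if (m->bytesize != n->bytesize)
    return m->bytesize < n->bytesize ? -1 : 1;
  return m->counter < n->counter ? -1 : (m->counter > n->counter);
}

/* Sort MODES in place and write the resulting "wider" chain into BUF,
   for example "HF -> SF -> DF -> TF -> BF -> VOID".  */
static void
wider_chain (struct fmode *modes, size_t n, char *buf, size_t size)
{
  assert (size > 0);
  qsort (modes, n, sizeof *modes, cmp_fmode);
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      strncat (buf, modes[i].name, size - strlen (buf) - 1);
      strncat (buf, " -> ", size - strlen (buf) - 1);
    }
  strncat (buf, "VOID", size - strlen (buf) - 1);
}
```

With every mode at ORDER 0, BF (2 bytes, defined after HF) lands between HF and SF. With BF at ORDER 1, it moves behind TF, so a search that walks up from HFmode for a wider mode reaches SFmode first.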
[committed][GCC][ARM] Update __fp16 test to fix regression caused by Bfloat optimisation.
Hi all, This was committed following offline approval by Kyrill. One minor optimisation introduced by https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01237.html was to prefer loading/storing both __fp16 and __bf16 values directly into/from the FP/NEON registers (if they are available and if vld1.16 is compatible), rather than passing them through the core r-registers. This converts many observed instances of: ** ldrh r3, [r3] @ __fp16 ** vmov.f16 s15, r3 @ __fp16 into a single: ** vld1.16 {d7[2]}, [r3] This caused a scan-assembler-times regression in an __fp16 test. This patch moves the test to the same testing standard used by the BFloat tests (check-function-bodies, which explicitly checks the assembly generated for each function) and updates it for the new optimisation. Cheers, Stam gcc/testsuite/ChangeLog: 2020-01-27 Stam Markianos-Wright * gcc.target/arm/armv8_2-fp16-move-1.c: Update following load/store optimisation. diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index 2321dd38cc6..009bb8d1575 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -3,39 +3,78 @@ /* { dg-options "-O2" } */ /* { dg-add-options arm_v8_2a_fp16_scalar } */ /* { dg-additional-options "-mfloat-abi=hard" } */ - +/* { dg-final { check-function-bodies "**" "" } } */ + +/* +**test_load_1: +** ... +** vld1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +*/ __fp16 test_load_1 (__fp16* a) { return *a; } +/* +**test_load_2: +** ... +** vld1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +*/ __fp16 test_load_2 (__fp16* a, int i) { return a[i]; } - +/* +**test_store_1: +** ... +** vst1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +*/ void test_store_1 (__fp16* a, __fp16 b) { *a = b; } +/* +**test_store_2: +** ... +** vst1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ...
+*/ void test_store_2 (__fp16* a, int i, __fp16 b) { a[i] = b; } - +/* +**test_load_store_1: +** ... +** vld1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +** vst1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +*/ __fp16 test_load_store_1 (__fp16* a, int i, __fp16* b) { a[i] = b[i]; } +/* +**test_load_store_2: +** ... +** vld1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +** vst1.16 {d[0-9]+\[[0-9]+\]}, \[r[0-9]+\] +** ... +*/ __fp16 test_load_store_2 (__fp16* a, int i, __fp16* b) { @@ -43,9 +82,6 @@ test_load_store_2 (__fp16* a, int i, __fp16* b) return a[i]; } -/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } } */ -/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } } */ - __fp16 test_select_1 (int sel, __fp16 a, __fp16 b) {
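The `...` lines are what let these patterns tolerate the surrounding code. The matching rule can be modelled in C as follows. This is a sketch under stated assumptions and not the real directive, which is implemented in Tcl in GCC's testsuite library. `line_matches` and `body_matches` are hypothetical helpers. Each pattern is a POSIX extended regex anchored to one whole line, and a pattern of exactly `...` matches any run of zero or more lines. The literal `{` and `[` are escaped, and `]`/`}` are written bare, to keep the POSIX syntax unambiguous.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return true if LINE matches the extended regex PAT in full.  */
static bool
line_matches (const char *line, const char *pat)
{
  char anchored[512];
  regex_t re;
  int n = snprintf (anchored, sizeof anchored, "^%s$", pat);
  assert (n > 0 && (size_t) n < sizeof anchored);
  if (regcomp (&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* Match BODY[0..NBODY) against PATS[0..NPATS).  "..." swallows zero or
   more lines; every other pattern must match exactly one line.  */
static bool
body_matches (const char *const *body, size_t nbody,
              const char *const *pats, size_t npats)
{
  if (npats == 0)
    return nbody == 0;
  if (strcmp (pats[0], "...") == 0)
    {
      /* Backtrack over every possible number of skipped lines.  */
      for (size_t skip = 0; skip <= nbody; skip++)
        if (body_matches (body + skip, nbody - skip, pats + 1, npats - 1))
          return true;
      return false;
    }
  return nbody > 0
         && line_matches (body[0], pats[0])
         && body_matches (body + 1, nbody - 1, pats + 1, npats - 1);
}
```

A body containing the new `vld1.16 {d7[2]}, [r3]` satisfies `...`, the vld1.16 pattern, `...`. The old `ldrh`/`vmov.f16` pair does not.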
[PINGx2][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 1/16/20 4:06 PM, Stam Markianos-Wright wrote: > > > On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: >> >> >> On 12/10/19 5:03 PM, Kyrill Tkachov wrote: >>> Hi Stam, >>> >>> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: >>>> Pinging with more correct maintainers this time :) >>>> >>>> Also would need to backport to gcc7,8,9, but need to get this approved >>>> first! >>>> >>> >>> Sorry for the delay. >> >> Same here now! Sorry totally forget about this in the lead up to Xmas! >> >> Done the changes marked below and also removed the unnecessary extra >> #defines >> from the test. > > Ping :) > > Cheers, > Stam > >> >>> >>> >>>> Thank you, >>>> Stam >>>> >>>> >>>> Forwarded Message >>>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional >>>> branches in Thumb2 (PR91816) >>>> Date: Mon, 21 Oct 2019 10:37:09 +0100 >>>> From: Stam Markianos-Wright >>>> To: Ramana Radhakrishnan >>>> CC: gcc-patches@gcc.gnu.org , nd , >>>> James Greenhalgh , Richard Earnshaw >>>> >>>> >>>> >>>> >>>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >>>> >> >>>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >>>> >> however, on my native Aarch32 setup the test times out when run as part >>>> >> of a big "make check-gcc" regression, but not when run individually. >>>> >> >>>> >> 2019-10-11 Stamatis Markianos-Wright >>>> >> >>>> >> * config/arm/arm.md: Update b for Thumb2 range checks. >>>> >> * config/arm/arm.c: New function arm_gen_far_branch. >>>> >> * config/arm/arm-protos.h: New function arm_gen_far_branch >>>> >> prototype. >>>> >> >>>> >> gcc/testsuite/ChangeLog: >>>> >> >>>> >> 2019-10-11 Stamatis Markianos-Wright >>>> >> >>>> >> * testsuite/gcc.target/arm/pr91816.c: New test. 
>>>> > >>>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >>>> >> index f995974f9bb..1dce333d1c3 100644 >>>> >> --- a/gcc/config/arm/arm-protos.h >>>> >> +++ b/gcc/config/arm/arm-protos.h >>>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >>>> cpu_arch_option *, >>>> >> >>>> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >>>> >> >>>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char >>>> >> *); >>>> >> + >>>> >> + >>>> > >>>> > Lets get the nits out of the way. >>>> > >>>> > Unnecessary extra new line, need a space between int and const above. >>>> > >>>> > >>>> >>>> .Fixed! >>>> >>>> >> #endif /* ! GCC_ARM_PROTOS_H */ >>>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>>> >> index 39e1a1ef9a2..1a693d2ddca 100644 >>>> >> --- a/gcc/config/arm/arm.c >>>> >> +++ b/gcc/config/arm/arm.c >>>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >>>> >> } >>>> >> } /* Namespace selftest. */ >>>> >> >>>> >> + >>>> >> +/* Generate code to enable conditional branches in functions over 1 >>>> MiB. */ >>>> >> +const char * >>>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >>>> >> + const char * branch_format) >>>> > >>>> > Not sure if this is some munging from the attachment but check >>>> > vertical alignment of parameters. >>>> > >>>> >>>> .Fixed! >>>> >>>> >> +{ >>>> >> + rtx_code_label * tmp_label = gen_label_rtx (); >>>> >> + char label_buf[256]; >>>> >> + char buffer[128]; >>>> >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >>>> >&g
[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: > > > On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >> >> >> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>>> Hi all, >>>> >>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>>> operations (vector/by element) to the ARM back-end. >>>> >>>> These are: >>>> usdot (vector), dot (by element). >>>> >>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>>> for ARM they remain optional as of ARMv8.6-a. >>>> >>>> The functions are declared in arm_neon.h, RTL patterns are defined to >>>> generate assembler and tests are added to verify and perform adequate >>>> checks. >>>> >>>> Regression testing on arm-none-eabi passed successfully. >>>> >>>> This patch depends on: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>> >>>> for ARM CLI updates, and on: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>> >>>> for testsuite effective_target update. >>>> >>>> Ok for trunk? >>> >>> .Ping :) >>> >> Ping :) >> >> New diff addressing review comments from Aarch64 version of the patch. >> >> _Change of order of operands in RTL patterns. >> _Change tests to use check-function-bodies, compile with optimisation and >> check for exact registers. >> _Rename tests to remove "-compile-" in filename. >> > > Ping! > > Cheers, > Stam > >>>> >>>> Cheers, >>>> Stam >>>> >>>> >>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>> >>>> PS. I don't have commit rights, so if someone could commit on my behalf, >>>> that would be great :) >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> 2019-11-28 Stam Markianos-Wright >>>> >>>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>>> (USTERNOP_QUALIFIERS): New define. >>>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. 
>>>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>> (arm_expand_builtin_args): >>>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>>> * config/arm/arm_neon.h (vusdot_s32): New. >>>> (vusdot_lane_s32): New. >>>> (vusdotq_lane_s32): New. >>>> (vsudot_lane_s32): New. >>>> (vsudotq_lane_s32): New. >>>> * config/arm/arm_neon_builtins.def >>>> (usdot,usdot_lane,sudot_lane): New. >>>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>>> (sup, opsuffix): Add . >>>> * config/arm/neon.md (neon_usdot, dot_lane: New. >>>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>>> >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-12-12 Stam Markianos-Wright >>>> >>>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>>> >>>> >>
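A scalar reference model helps pin down what vusdot_s32 and vusdot_lane_s32 compute. This is an illustrative sketch only. `dot4_us`, `usdot_ref` and `usdot_lane_ref` are hypothetical names, and plain arrays stand in for uint8x8_t, int8x8_t and int32x2_t. The model assumes the USDOT behaviour of adding four unsigned-by-signed byte products into each 32-bit lane. vsudot_lane_s32 is the mirror image, with the signedness of the two byte operands swapped.

```c
#include <assert.h>
#include <stdint.h>

/* Add the four products U[k] * S[k] (unsigned x signed bytes) to ACC.
   The sum is kept in uint32_t so overflow wraps as in hardware instead
   of being undefined behaviour.  */
static int32_t
dot4_us (int32_t acc, const uint8_t u[4], const int8_t s[4])
{
  uint32_t sum = (uint32_t) acc;
  for (int k = 0; k < 4; k++)
    sum += (uint32_t) ((int32_t) u[k] * (int32_t) s[k]);
  return (int32_t) sum; /* Two's-complement wrap on GCC.  */
}

/* Model of vusdot_s32 (r, x, y): lane I uses bytes 4*I..4*I+3 of X and Y.  */
static void
usdot_ref (int32_t r[2], const uint8_t x[8], const int8_t y[8])
{
  for (int i = 0; i < 2; i++)
    r[i] = dot4_us (r[i], &x[4 * i], &y[4 * i]);
}

/* Model of vusdot_lane_s32 (r, x, y, lane): every result lane uses the
   same group of four signed bytes from Y, selected by LANE.  */
static void
usdot_lane_ref (int32_t r[2], const uint8_t x[8], const int8_t y[8], int lane)
{
  assert (lane == 0 || lane == 1);
  for (int i = 0; i < 2; i++)
    r[i] = dot4_us (r[i], &x[4 * i], &y[4 * lane]);
}
```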
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
On 1/20/20 1:07 PM, Christophe Lyon wrote: > Hi, > > > On Thu, 16 Jan 2020 at 16:59, Stam Markianos-Wright > wrote: >> >> >> >> On 1/13/20 10:05 AM, Kyrill Tkachov wrote: >>> Hi Stam, >>> >>> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote: >>>> Hi all, >>>> >>>> This is a respin of patch: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html >>>> >>>> which has now been split into two (similar to the Aarch64 version). >>>> >>>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end. >>>> It also adds a new machine_mode (BFmode) for this type and accompanying >>>> Vector >>>> modes V4BFmode and V8BFmode. >>>> >>>> The second patch in this series uses existing target hooks to restrict >>>> type use. >>>> >>>> Regression testing on arm-none-eabi passed successfully. >>>> >>>> This patch depends on: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>> >>>> for test suite effective_target update. >>>> >>>> Ok for trunk? >>> >>> This is ok, thanks. 
>>> >>> You can commit it once the git conversion goes through :) >> >> Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b >> > > This since commit, I've noticed many ICEs like: > Executing on host: > /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc > -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/ > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers > -fdiagnostics-color=never -fdiagnostics-urls=never-O0 > -mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe(timeout = > 800) > spawn -ignore SIGHUP > /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc > -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/ > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers > -fdiagnostics-color=never -fdiagnostics-urls=never -O0 > -mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe > during RTL pass: expand > In file included from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:3, > from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c:5: > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h: In function 'main': > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:12: internal compiler > error: in convert_mode_scalar, at expr.c:328 > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:3: note: in expansion > of macro 'CHECK' > 0x8cb089 convert_mode_scalar > /gcc/expr.c:325 > 0x8cb089 convert_move(rtx_def*, rtx_def*, int) > /gcc/expr.c:297 > 0x8cb32f convert_modes(machine_mode, machine_mode, rtx_def*, int) > /gcc/expr.c:737 > 0xb8b2a0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, > rtx_def*, int, optab_methods) > /gcc/optabs.c:1895 > 0x8bdebc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, > expand_modifier) > /gcc/expr.c:9847 > 0x77e52a expand_gimple_stmt_1 > /gcc/cfgexpand.c:3784 > 0x77e52a expand_gimple_stmt > /gcc/cfgexpand.c:3844 > 0x78068d expand_gimple_basic_block > 
/gcc/cfgexpand.c:5884 > 0x78279c execute > /gcc/cfgexpand.c:6539 > > This example is for gcc.dg/torture/arm-fp16-ops-1.c target arm-none-eabi. > > You said you saw no regressions, am I missing something? > (this is still true as of today's daily-bump > bec238768255acf0fe5b0993d05cf99f6331b79e) > > Thanks, > > Christophe Hi Christophe! Yes, I think this is a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300, which Martin raised last Friday. I made the rookie mistake of doing my reg-testing on a non-final version of the patch rather than the _final_ final version, hence not picking this up until it was too late... Sorry about that! I'm working on the fix now :) Cheers, Stam > > > >> Thank you! >> Stam >>> >>> Kyrill >>> >>> >>>> >>>> Cheers, >>>> Stam >>>> >>>> >>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>> >>>> Details on ARM Bfloat can be found here: >>>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bflo
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: > > > On 12/10/19 5:03 PM, Kyrill Tkachov wrote: >> Hi Stam, >> >> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: >>> Pinging with more correct maintainers this time :) >>> >>> Also would need to backport to gcc7,8,9, but need to get this approved >>> first! >>> >> >> Sorry for the delay. > > Same here now! Sorry totally forget about this in the lead up to Xmas! > > Done the changes marked below and also removed the unnecessary extra #defines > from the test. Ping :) Cheers, Stam > >> >> >>> Thank you, >>> Stam >>> >>> >>> Forwarded Message >>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional >>> branches in Thumb2 (PR91816) >>> Date: Mon, 21 Oct 2019 10:37:09 +0100 >>> From: Stam Markianos-Wright >>> To: Ramana Radhakrishnan >>> CC: gcc-patches@gcc.gnu.org , nd , >>> James Greenhalgh , Richard Earnshaw >>> >>> >>> >>> >>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >>> >> >>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >>> >> however, on my native Aarch32 setup the test times out when run as part >>> >> of a big "make check-gcc" regression, but not when run individually. >>> >> >>> >> 2019-10-11 Stamatis Markianos-Wright >>> >> >>> >> * config/arm/arm.md: Update b for Thumb2 range checks. >>> >> * config/arm/arm.c: New function arm_gen_far_branch. >>> >> * config/arm/arm-protos.h: New function arm_gen_far_branch >>> >> prototype. >>> >> >>> >> gcc/testsuite/ChangeLog: >>> >> >>> >> 2019-10-11 Stamatis Markianos-Wright >>> >> >>> >> * testsuite/gcc.target/arm/pr91816.c: New test. 
>>> > >>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >>> >> index f995974f9bb..1dce333d1c3 100644 >>> >> --- a/gcc/config/arm/arm-protos.h >>> >> +++ b/gcc/config/arm/arm-protos.h >>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >>> cpu_arch_option *, >>> >> >>> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >>> >> >>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char >>> >> *); >>> >> + >>> >> + >>> > >>> > Lets get the nits out of the way. >>> > >>> > Unnecessary extra new line, need a space between int and const above. >>> > >>> > >>> >>> .Fixed! >>> >>> >> #endif /* ! GCC_ARM_PROTOS_H */ >>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>> >> index 39e1a1ef9a2..1a693d2ddca 100644 >>> >> --- a/gcc/config/arm/arm.c >>> >> +++ b/gcc/config/arm/arm.c >>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >>> >> } >>> >> } /* Namespace selftest. */ >>> >> >>> >> + >>> >> +/* Generate code to enable conditional branches in functions over 1 >>> >> MiB. */ >>> >> +const char * >>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >>> >> + const char * branch_format) >>> > >>> > Not sure if this is some munging from the attachment but check >>> > vertical alignment of parameters. >>> > >>> >>> .Fixed! >>> >>> >> +{ >>> >> + rtx_code_label * tmp_label = gen_label_rtx (); >>> >> + char label_buf[256]; >>> >> + char buffer[128]; >>> >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >>> >> + CODE_LABEL_NUMBER (tmp_label)); >>> >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >>> >> + rtx dest_label = operands[pos_label]; >>> >> + operands[pos_label] = tmp_label; >>> >> + >>> >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >>> >> + output_asm_insn (buffer, operands); >>> >> + >>> >> + snprintf (buffer,
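The idea behind arm_gen_far_branch can be shown in isolation. The sketch below is not the patch's code. `emit_cond_branch`, `invert_cond` and the `.Lfar` label prefix are hypothetical. It assumes the usual Thumb-2 reach of the conditional B<c> encoding (about +/-1 MiB) and of the unconditional B encoding (about +/-16 MiB). When the target is out of range for B<c>, the sketch branches on the inverted condition over an unconditional B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed reach of the Thumb-2 conditional branch B<c> (encoding T3).  */
#define COND_BRANCH_MIN (-1048576L)
#define COND_BRANCH_MAX 1048574L

/* Return the inverse of ARM condition code COND, or NULL if unknown.  */
static const char *
invert_cond (const char *cond)
{
  static const char *const pairs[][2] = {
    { "eq", "ne" }, { "ne", "eq" }, { "cs", "cc" }, { "cc", "cs" },
    { "mi", "pl" }, { "pl", "mi" }, { "vs", "vc" }, { "vc", "vs" },
    { "hi", "ls" }, { "ls", "hi" }, { "ge", "lt" }, { "lt", "ge" },
    { "gt", "le" }, { "le", "gt" },
  };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    if (strcmp (pairs[i][0], cond) == 0)
      return pairs[i][1];
  return NULL;
}

/* Write "branch to TARGET if COND" into BUF.  OFFSET is the byte distance
   to TARGET; LABEL_NO numbers the local label used by the far form.
   Returns false for an unknown condition or a too-small buffer.  */
static bool
emit_cond_branch (char *buf, size_t size, const char *cond,
                  const char *target, long offset, int label_no)
{
  int n;
  assert (buf != NULL && cond != NULL && target != NULL);
  if (offset >= COND_BRANCH_MIN && offset <= COND_BRANCH_MAX)
    /* Near: a single conditional branch reaches the target.  */
    n = snprintf (buf, size, "b%s\t%s\n", cond, target);
  else
    {
      /* Far: when COND is false, the inverted branch skips the
         unconditional B, which has a much longer reach.  */
      const char *inv = invert_cond (cond);
      if (inv == NULL)
        return false;
      n = snprintf (buf, size, "b%s\t.Lfar%d\nb\t%s\n.Lfar%d:\n",
                    inv, label_no, target, label_no);
    }
  return n > 0 && (size_t) n < size;
}
```

Like the quoted patch code, this builds the assembly text with snprintf. Targets beyond the unconditional B range would need yet another sequence, which this sketch does not model.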
[Pingx2][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: > > > On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >> >> >> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>> Hi all, >>> >>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>> operations (vector/by element) to the ARM back-end. >>> >>> These are: >>> usdot (vector), dot (by element). >>> >>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>> for ARM they remain optional as of ARMv8.6-a. >>> >>> The functions are declared in arm_neon.h, RTL patterns are defined to >>> generate assembler and tests are added to verify and perform adequate >>> checks. >>> >>> Regression testing on arm-none-eabi passed successfully. >>> >>> This patch depends on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>> >>> for ARM CLI updates, and on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>> >>> for testsuite effective_target update. >>> >>> Ok for trunk? >> >> .Ping :) >> > Ping :) > > New diff addressing review comments from Aarch64 version of the patch. > > _Change of order of operands in RTL patterns. > _Change tests to use check-function-bodies, compile with optimisation and > check > for exact registers. > _Rename tests to remove "-compile-" in filename. > Ping! Cheers, Stam >>> >>> Cheers, >>> Stam >>> >>> >>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>> >>> PS. I don't have commit rights, so if someone could commit on my behalf, >>> that would be great :) >>> >>> >>> gcc/ChangeLog: >>> >>> 2019-11-28 Stam Markianos-Wright >>> >>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>> (USTERNOP_QUALIFIERS): New define. >>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (arm_expand_builtin_args): >>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. 
>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>> * config/arm/arm_neon.h (vusdot_s32): New. >>> (vusdot_lane_s32): New. >>> (vusdotq_lane_s32): New. >>> (vsudot_lane_s32): New. >>> (vsudotq_lane_s32): New. >>> * config/arm/arm_neon_builtins.def >>> (usdot,usdot_lane,sudot_lane): New. >>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>> (sup, opsuffix): Add . >>> * config/arm/neon.md (neon_usdot, dot_lane: New. >>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>> >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2019-12-12 Stam Markianos-Wright >>> >>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>> >>> >
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
On 1/9/20 3:48 PM, Richard Sandiford wrote: > OK, thanks. > Committed as r10-6004-g8c197c851e7528baba7cb837f34c05ba2242f705 Thank you! Stam > Richard > > Stam Markianos-Wright writes: >> On 12/30/19 10:21 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: >>>> On 12/20/19 2:13 PM, Richard Sandiford wrote: >>>>> Stam Markianos-Wright writes: >>>>>> +**... >>>>>> +**ret >>>>>> +*/ >>>>>> +int32x2_t ufoo (int32x2_t r, uint8x8_t x, int8x8_t y) >>>>>> +{ >>>>>> + return vusdot_s32 (r, x, y); >>>>>> +} >>>>>> + >>>>> >>>>> If we're using check-function-bodies anyway, it might be slightly more >>>>> robust to compile at -O and check for the exact RA. E.g.: >>>>> >>>>> /* >>>>> **ufoo: >>>>> **usdotv0\.2s, (v1\.8b, v2\.8b|v2\.8b, v1\.8b) >>>>> **ret >>>>> */ >>>>> >>>>> Just a suggestion though -- either way is fine. >>>> >>>> done this too and as per our internal discussion also added one >>>> xx_untied tests for usdot and one for usdot_lane >>>> >>>> That's one xx_untied test for each of the RTL pattern types added in >>>> aarch64-simd.md. Lmk if this is ok! >>>> >>>> Also I found that the way we were using check-function-bodies wasn't >>>> actually checking the assembler correctly, so I've changed that to: >>>> +/* { dg-final { check-function-bodies "**" "" "" } } */ >>>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ >>>> which seems to perform more checks >>> >>> Ah, OK, hadn't realised that we were cycling through optimisation >>> options already. In that case, it might be better to leave out the >>> -O from the dg-options and instead use: >>> >>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } { "-O0" } } } */ >>> >>> (untested). >>> >>> It's unfortunate that we're skipping this for -O0 though. Ideally we'd >>> still compile the code and just skip the dg-final. 
Does it work if you do: >>> >>> /* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */ >>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } } } */ >>> >>> ? Make sure that we actually still run the check-function-bodies when >>> optimisation is enabled. :-) >> >> This works! >> Now we are only doing the following for O0: >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O0 (test >> for >> excess errors) >> >> whereas for other optimisation levels do all the checks: >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 (test >> for >> excess errors) >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfoo_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfoo_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfooq_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfooq_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_untied >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_laneq_untied >> >>> >>> Also, I'm an idiot. The reason I'd used (...|...) in the regexps was >>> that "dot product is commutative". 
But of course that's not true for >>> these mixed-sign ops, so the string must be:
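The point about commutativity can be checked concretely. Swapping the two byte vectors of a mixed-sign dot product changes how each byte is interpreted, so the result changes. The function below is a hypothetical scalar model of one USDOT lane working on raw byte patterns, with the first operand read as unsigned and the second as signed.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret raw byte B as a two's-complement signed value without relying
   on an implementation-defined narrowing conversion.  */
static int32_t
as_signed (uint8_t b)
{
  return b < 128 ? (int32_t) b : (int32_t) b - 256;
}

/* One USDOT lane on raw bytes: U is read as unsigned, S as signed.
   |result| <= 4 * 255 * 128, so int32_t cannot overflow here.  */
static int32_t
usdot4_raw (const uint8_t u[4], const uint8_t s[4])
{
  int32_t acc = 0;
  for (int k = 0; k < 4; k++)
    acc += (int32_t) u[k] * as_signed (s[k]);
  return acc;
}
```

With the byte 0xFF in one operand and 2 in the other, one operand order gives 255 * 2 and the other gives 2 * -1. An alternation such as `(v1\.8b, v2\.8b|v2\.8b, v1\.8b)` would therefore accept code that computes a different value.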
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension
On 1/9/20 3:54 PM, Richard Sandiford wrote: > Please update the names of the testsuite files to match the ones > in the bfloat16_t patch. (Same for the usdot/sudot patch -- sorry > for forgetting there.) > > OK with that change, thanks. > Done and committed as r10-6006-gf275d73a57f1e5a07fbd4978f4b4457a5eaa1e39 Thank you! Stam > Richard > > Stam Markianos-Wright writes: >> On 12/30/19 10:29 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: >>>> diff --git a/gcc/config/aarch64/aarch64-simd.md >>>> b/gcc/config/aarch64/aarch64-simd.md >>>> index >>>> adfda96f077075ad53d4bea2919c4d3b326e49f5..7587bc46ba1c80389ea49fa83a0e6f8a489711e9 >>>> 100644 >>>> --- a/gcc/config/aarch64/aarch64-simd.md >>>> +++ b/gcc/config/aarch64/aarch64-simd.md >>>> @@ -7028,3 +7028,36 @@ >>>> "xtn\t%0., %1." >>>> [(set_attr "type" "neon_shift_imm_narrow_q")] >>>>) >>>> + >>>> +(define_insn "aarch64_bfdot" >>>> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >>>> + (plus:VDQSF >>>> +(unspec:VDQSF >>>> + [(match_operand: 2 "register_operand" "w") >>>> + (match_operand: 3 "register_operand" "w")] >>>> + UNSPEC_BFDOT) >>>> +(match_operand:VDQSF 1 "register_operand" "0")))] >>>> + "TARGET_BF16_SIMD" >>>> + "bfdot\t%0., %2., %3." >>>> + [(set_attr "type" "neon_dot")] >>>> +) >>>> + >>>> + >>>> +(define_insn "aarch64_bfdot_lane" >>> >>> Too many blank lines. >> >> Fixed, sorry I hadn't noticed! 
>> >>> >>>> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >>>> + (plus:VDQSF >>>> +(unspec:VDQSF >>>> + [(match_operand: 2 "register_operand" "w") >>>> + (match_operand:VBF 3 "register_operand" "w") >>>> + (match_operand:SI 4 "const_int_operand" "n")] >>>> + UNSPEC_BFDOT) >>>> +(match_operand:VDQSF 1 "register_operand" "0")))] >>>> + "TARGET_BF16_SIMD" >>>> +{ >>>> + int nunits = GET_MODE_NUNITS (mode).to_constant (); >>>> + int lane = INTVAL (operands[4]); >>>> + operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); >>>> + return "bfdot\t%0., %2., %3.2h[%4]"; >>>> +} >>>> + [(set_attr "type" "neon_dot")] >>>> +) >>>> [...] >>>> diff --git >>>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >>>> new file mode 100644 >>>> index >>>> ..c575dcd3901172a52fa9403c9179d58eea44eb72 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >>>> @@ -0,0 +1,91 @@ >>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >>>> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >>>> +/* { dg-additional-options "-O -save-temps" } */ >>>> +/* { dg-final { check-function-bodies "**" "" } } */ >>>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ >>> >>> Same comment as for USDOT/SUDOT regarding the dg- markup. >> >> Done! >>> >>>> + >>>> +#include >>>> + >>>> +/* >>>> +**ufoo: >>>> +**bfdot v0.2s, (v1.4h, v2.4h|v2.4h, v1.4h) >>>> +**ret >>>> +*/ >>>> +float32x2_t ufoo(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y) >>>> +{ >>>> + return vbfdot_f32 (r, x, y); >>>> +} >>>> + >>>> +/* >>>> +**ufooq: >>>> +**bfdot v0.4s, (v1.8h, v2.8h|v2.8h, v1.8h) >>>> +**ret >>>> +*/ >>>> +float32x4_t ufooq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) >>>> +{ >>>> + return vbfdotq_f32 (r, x, y); >>>> +} >
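A scalar reference model makes the BFDOT lane arithmetic, and the pair index used by the `.2h[%4]` operand of the lane pattern, easier to follow. This is a hypothetical sketch. Arrays replace bfloat16x4_t and float32x2_t, and `bf16_to_float`, `bfdot_ref` and `bfdot_lane_ref` are illustrative names. The model deliberately ignores the instruction's own rounding and denormal behaviour, so it only matches hardware exactly when no rounding occurs, as in the test values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A bfloat16 value is the top 16 bits of an IEEE binary32 value.  */
static float
bf16_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Model of vbfdot_f32 (r, x, y): each float lane I accumulates the
   products of bfloat16 pair I of X and pair I of Y.  */
static void
bfdot_ref (float r[2], const uint16_t x[4], const uint16_t y[4])
{
  for (int i = 0; i < 2; i++)
    r[i] += bf16_to_float (x[2 * i]) * bf16_to_float (y[2 * i])
            + bf16_to_float (x[2 * i + 1]) * bf16_to_float (y[2 * i + 1]);
}

/* Model of vbfdot_lane_f32 (r, x, y, lane): every result lane uses the
   same bfloat16 pair of Y, selected by LANE.  */
static void
bfdot_lane_ref (float r[2], const uint16_t x[4], const uint16_t y[4], int lane)
{
  assert (lane == 0 || lane == 1);
  for (int i = 0; i < 2; i++)
    r[i] += bf16_to_float (x[2 * i]) * bf16_to_float (y[2 * lane])
            + bf16_to_float (x[2 * i + 1]) * bf16_to_float (y[2 * lane + 1]);
}
```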
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]
On 1/13/20 10:43 AM, Kyrill Tkachov wrote: > Hi Stam, > > On 1/10/20 6:47 PM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch is part 2 of Bfloat16_t enablement in the ARM back-end. >> >> This new type is constrained using target hooks TARGET_INVALID_CONVERSION, >> TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used >> through ACLE intrinsics (will be provided in later patches). >> >> Regression testing on arm-none-eabi passed successfully. >> >> Ok for trunk? > > > Ok. > > Thanks, > > Kyrill Committed as r10-6021-g3ea9140170b8a511822b1a873dea1227093f3ccf Thank you! Stam > > >> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> Details on ARM Bfloat can be found here: >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a >> >> >> >> >> >> gcc/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * config/arm/arm.c >> (arm_invalid_conversion): New function for target hook. >> (arm_invalid_unary_op): New function for target hook. >> (arm_invalid_binary_op): New function for target hook. >> >> 2020-01-10 Stam Markianos-Wright >> >> * gcc.target/arm/bfloat16_scalar_typecheck.c: New test. >> * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test. >> * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test. >> >>
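The contract of the three hooks can be illustrated without GCC's tree machinery. Each hook returns NULL when the front end should decide as usual, and returns an error message otherwise. The sketch below mimics that contract with a hypothetical `enum ty` in place of trees. Its message strings are illustrative rather than the exact ones in arm.c, and the real unary and binary hooks also receive the operation code, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types the front end would pass.  */
enum ty { TY_INT, TY_FLOAT, TY_BF16 };

/* Mirrors TARGET_INVALID_CONVERSION: NULL means "no objection".  */
static const char *
model_invalid_conversion (enum ty from, enum ty to)
{
  if (from == to)
    return NULL; /* bfloat16_t to bfloat16_t is fine.  */
  if (from == TY_BF16)
    return "invalid conversion from type 'bfloat16_t'";
  if (to == TY_BF16)
    return "invalid conversion to type 'bfloat16_t'";
  return NULL;
}

/* Mirrors TARGET_INVALID_UNARY_OP: no arithmetic on bfloat16_t.  */
static const char *
model_invalid_unary_op (enum ty operand)
{
  return operand == TY_BF16
         ? "operation not permitted on type 'bfloat16_t'" : NULL;
}

/* Mirrors TARGET_INVALID_BINARY_OP: reject if either side is bfloat16_t.  */
static const char *
model_invalid_binary_op (enum ty a, enum ty b)
{
  return (a == TY_BF16 || b == TY_BF16)
         ? "operation not permitted on type 'bfloat16_t'" : NULL;
}
```

Under this contract, the type can be stored, loaded and passed to intrinsics, but arithmetic and implicit conversions are rejected.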
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
On 1/13/20 10:05 AM, Kyrill Tkachov wrote: > Hi Stam, > > On 1/10/20 6:45 PM, Stam Markianos-Wright wrote: >> Hi all, >> >> This is a respin of patch: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html >> >> which has now been split into two (similar to the Aarch64 version). >> >> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end. >> It also adds a new machine_mode (BFmode) for this type and accompanying >> Vector >> modes V4BFmode and V8BFmode. >> >> The second patch in this series uses existing target hooks to restrict type >> use. >> >> Regression testing on arm-none-eabi passed successfully. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for test suite effective_target update. >> >> Ok for trunk? > > This is ok, thanks. > > You can commit it once the git conversion goes through :) Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b Thank you! Stam > > Kyrill > > >> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> Details on ARM Bfloat can be found here: >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a >> >> >> >> >> >> gcc/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * config.gcc: Add arm_bf16.h. >> * config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment. >> (arm_simd_builtin_std_type): Add BFmode. >> (arm_init_simd_builtin_types): Define element types for vector types. >> (arm_init_bf16_types): New function. >> (arm_init_builtins): Add arm_init_bf16_types function call. >> * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes. >> * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF. >> * config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode. >> (arm_hard_regno_mode_ok): Add BFmode and tidy up statements. 
>> (arm_vector_mode_supported_p): Add V4BF, V8BF. >> (arm_mangle_type): >> * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE, >> VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node, >> arm_bf16_ptr_type_node. >> * config/arm/arm.md: New enabled_for_bfmode_scalar, >> enabled_for_bfmode_vector attributes. Add BFmode to movhf expand. >> pattern and define_split between ARM registers. >> * config/arm/arm_bf16.h: New file. >> * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types. >> * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): >> New. >> (VQXMOV): Add V8BF. >> * config/arm/neon.md: Add BF vector types to NEON move patterns. >> * config/arm/vfp.md: Add BFmode to movhf patterns. >> >> gcc/testsuite/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * g++.dg/abi/mangle-neon.C: Add Bfloat vector types. >> * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test. >> * gcc.target/arm/bfloat16_scalar_1_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_1_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_2_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_2_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_3_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_3_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_4.c: New test. >> * gcc.target/arm/bfloat16_simd_1_1.c: New test. >> * gcc.target/arm/bfloat16_simd_1_2.c: New test. >> * gcc.target/arm/bfloat16_simd_2_1.c: New test. >> * gcc.target/arm/bfloat16_simd_2_2.c: New test. >> * gcc.target/arm/bfloat16_simd_3_1.c: New test. >> * gcc.target/arm/bfloat16_simd_3_2.c: New test. >> >> >>
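The new BFmode is the 16-bit "brain float" format. It keeps the sign bit, the full 8-bit exponent and the top 7 fraction bits of an IEEE binary32 float, so a bfloat16 value is the upper half of a float's bit pattern. The sketch below assumes plain truncation (round toward zero) for narrowing. It also assumes that `float` is IEEE binary32, which holds on Arm targets. `bf16_from_float` and `float_from_bf16` are hypothetical helper names. They are not ACLE intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a float to bfloat16 by keeping the upper
   16 bits (sign, 8-bit exponent, top 7 fraction bits).  This truncates,
   i.e. rounds toward zero; a NaN whose payload sits only in the low 16
   bits would come out as an infinity, so real code needs more care.  */
static uint16_t bf16_from_float (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);  /* bit copy, no aliasing problems */
  return (uint16_t) (bits >> 16);
}

/* Hypothetical helper: widening is exact, because every bfloat16 value
   is a float whose low 16 bits are zero.  */
static float float_from_bf16 (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Narrowing discards precision but keeps float32's exponent range. That is why widening back to float needs no rounding at all.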
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
On 1/10/20 4:29 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> On 1/9/20 4:13 PM, Stam Markianos-Wright wrote: >>> On 1/9/20 4:07 PM, Richard Sandiford wrote: >>>> Stam Markianos-Wright writes: >>>>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>>>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>>>> new file mode 100644 >>>>> index 000..55cbb0b0ef7 >>>>> --- /dev/null >>>>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>>>> @@ -0,0 +1,14 @@ >>>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >>>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >>>>> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >>>>> +/* { dg-additional-options "-O3 --save-temps" } */ >>>>> + >>>>> +#include >>>>> + >>>>> +void foo (void) >>>>> +{ >>>>> + bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} >>>>> "" >>>>> {target *-*-*} } */ >>>> >>>> The "" {target *-*-*} stuff isn't needed: that's just for when the test >>>> depends on a target selector or if you need to specify a line number >>>> (which comes after the target). >> >> Removed them. >> >>> >>> Ah ok cool. I just had something that worked and was just doing ctrl+c >>> ctrl+v >>> everywhere! >>> >>>> >>>> Same for the rest of the patch. >>>> >>>>> + bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type >>>>> 'bfloat16_t'} "" {target *-*-*} } */ >>>> >>>> Why's this one an error? Looks like it should be OK. Do we build >>>> bfloat16_t() as a conversion from a zero integer? >>>> >>> Yea that's exactly what it looked like when I went into the debugging! But >>> will >>> investigate a bit further and see if I can fix it for the next revision. >>> >> >> Changed this to dg-bogus with an XFAIL for the purposes of this patch in >> Stage 3 :) > > Yeah. Like we discussed off-list, we'd need to change the target hook > to do this properly. 
(And if we do change the target hook, it would be > good to make it output the errors itself, like we discussed upthread.) > Something for GCC 11 perhaps... Agreed! > >> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> new file mode 100644 >> index >> ..0a04cfb18e567ae0eec88da8ea37922434c60080 >> --- /dev/null >> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> @@ -0,0 +1,14 @@ >> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >> +/* { dg-additional-options "-O3 --save-temps" } */ >> + >> +#include >> + >> +void foo (void) >> +{ >> + bfloat16_t (); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" >> { xfail *-*-* } } */ >> + bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type >> 'bfloat16_t'} } */ > > This should be a dg-bogus too. Done and committed as 280130. Diff attached for reference. Cheers, Stam > > OK with that change, thanks. > > Richard > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ebd3f6cf45b..ce410ddf551 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -21760,6 +21760,55 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Return the diagnostic message string if conversion from FROMTYPE to + TOTYPE is not allowed, NULL otherwise. */ + +static const char * +aarch64_invalid_conversion (const_tree fromtype, const_tree totype) +{ + if (element_mode (fromtype) != element_mode (totype)) +{ + /* Do no allow conversions to/from BFmode scalar types. */ + if (TYPE_MODE (fromtype) == BFmode) + return N_("invalid conversion from type %"); + if (TYPE_MODE (totype) == BFmode) + return N_("invalid conversion to type %"); +} + + /* Conversion allowed. 
*/ + return NULL; +} + +/* Return the diagnostic message string if the unary operation OP is + not permitted on TYPE, NULL otherwise. */ + +static const char * +aarch64_invalid_unary_op (int op, const_tree typ
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
On 1/9/20 3:42 PM, Richard Sandiford wrote: > Thanks for the update, looks great. > > Stam Markianos-Wright writes: >> diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h >> new file mode 100644 >> index >> ..884b6f3bc7a28c516e54c26a71b1b769f55867a7 >> --- /dev/null >> +++ b/gcc/config/aarch64/arm_bf16.h >> @@ -0,0 +1,32 @@ >> +/* Arm BF16 instrinsics include file. >> + >> + Copyright (C) 2019 Free Software Foundation, Inc. >> + Contributed by Arm. > > Needs to include 2020 now :-) Maybe 2019-2020 since it was posted > in 2019 and would have been changed to 2019-2020 in the automatic update. > > Which reminds me to update my patches too... > > OK for trunk with that change, thanks. Done and committed as 280129. Diff attached for reference (and as an attempt to try and keep myself sane and not mix it all up!) Cheers, Stam > > Richard > diff --git a/gcc/config.gcc b/gcc/config.gcc index c3d6464f3e6adaa1db818a61de00cff8e00ae08e..075e46072d1643302b9587d4e3f14f2e29b4ec8d 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -315,7 +315,7 @@ m32c*-*-*) ;; aarch64*-*-*) cpu_type=aarch64 - extra_headers="arm_fp16.h arm_neon.h arm_acle.h arm_sve.h" + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h" c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" d_target_objs="aarch64-d.o" diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 1bd2640a1ced352de232fed1cf134b46c69b80f7..b2d6b761489183c262320d62293bec343b315c11 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -68,6 +68,9 @@ #define hi_UPE_HImode #define hf_UPE_HFmode #define qi_UPE_QImode +#define bf_UPE_BFmode +#define v4bf_UP E_V4BFmode +#define v8bf_UP E_V8BFmode #define UP(X) X##_UP #define SIMD_MAX_BUILTIN_ARGS 5 @@ -568,6 +571,10 @@ static tree aarch64_simd_intXI_type_node = NULL_TREE; tree aarch64_fp16_type_node = NULL_TREE; tree aarch64_fp16_ptr_type_node = NULL_TREE; +/* 
Back-end node type for brain float (bfloat) types. */ +tree aarch64_bf16_type_node = NULL_TREE; +tree aarch64_bf16_ptr_type_node = NULL_TREE; + /* Wrapper around add_builtin_function. NAME is the name of the built-in function, TYPE is the function type, and CODE is the function subcode (relative to AARCH64_BUILTIN_GENERAL). */ @@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode, return float_type_node; case E_DFmode: return double_type_node; +case E_BFmode: + return aarch64_bf16_type_node; default: gcc_unreachable (); } @@ -750,6 +759,10 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Float64x1_t].eltype = double_type_node; aarch64_simd_types[Float64x2_t].eltype = double_type_node; + /* Init Bfloat vector types with underlying __bf16 type. */ + aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; + aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; + for (i = 0; i < nelts; i++) { tree eltype = aarch64_simd_types[i].eltype; @@ -1059,6 +1072,19 @@ aarch64_init_fp16_types (void) aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node); } +/* Initialize the backend REAL_TYPE type supporting bfloat types. */ +static void +aarch64_init_bf16_types (void) +{ + aarch64_bf16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_bf16_type_node) = 16; + SET_TYPE_MODE (aarch64_bf16_type_node, BFmode); + layout_type (aarch64_bf16_type_node); + + lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16"); + aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node); +} + /* Pointer authentication builtins that will become NOP on legacy platform. Currently, these builtins are for internal use only (libgcc EH unwinder). 
*/ @@ -1214,6 +1240,8 @@ aarch64_general_init_builtins (void) aarch64_init_fp16_types (); + aarch64_init_bf16_types (); + if (TARGET_SIMD) aarch64_init_simd_builtins (); diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 6cd8ed0972ad7029e0319aad71d3afbda5684a4f..1eeb8d884520b1a53b8a580f165d42858c03228c 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -69,6 +69,13 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF. */ VECTOR_MODE (FLOAT, DF, 1); /* V1DF. */ VECTOR_MODE (FLOAT, HF, 2); /* V2HF. */ +/* Bfloat16 modes. */ +FLOAT_MODE (BF, 2, 0); +ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format); + +VECTOR_MODE (FLOAT, BF, 4); /* V4BF. */ +VECTOR_MODE (FLOAT, BF, 8); /* V8BF. */ + /* Oct Int: 256-bit inte
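`FLOAT_MODE (BF, 2, 0)` gives BFmode the same 2-byte size as HFmode, but the two formats split those 16 bits differently. IEEE binary16 (HFmode) uses 5 exponent bits and 10 fraction bits. bfloat16 uses 8 exponent bits and 7 fraction bits, which gives it float32's range at lower precision. V4BF and V8BF therefore hold 64 and 128 bits respectively. The decoder below illustrates the layout. The struct and function names are illustrative only and are not taken from the patch.

```c
#include <stdint.h>

/* Illustrative field layout of a bfloat16 bit pattern:
   bit 15 sign, bits 14..7 exponent (bias 127, as in float32),
   bits 6..0 fraction.  */
struct bf16_fields
{
  unsigned sign;
  unsigned exponent;  /* biased exponent, 0..255 */
  unsigned fraction;  /* 7 explicit fraction bits */
};

static struct bf16_fields bf16_decode (uint16_t h)
{
  struct bf16_fields f;
  f.sign = (h >> 15) & 0x1u;
  f.exponent = (h >> 7) & 0xFFu;
  f.fraction = h & 0x7Fu;
  return f;
}

/* For contrast, IEEE binary16 (HFmode): bits 14..10 exponent (bias 15),
   bits 9..0 fraction.  */
static unsigned fp16_exponent (uint16_t h)
{
  return (h >> 10) & 0x1Fu;
}
```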
Re: [Ping][GCC][PATCH][ARM] Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: > > > On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >> operations (vector/by element) to the ARM back-end. >> >> These are: >> usdot (vector), dot (by element). >> >> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >> for ARM they remain optional as of ARMv8.6-a. >> >> The functions are declared in arm_neon.h, RTL patterns are defined to >> generate assembler and tests are added to verify and perform adequate checks. >> >> Regression testing on arm-none-eabi passed successfully. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >> >> for ARM CLI updates, and on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for testsuite effective_target update. >> >> Ok for trunk? > > .Ping :) >

Ping :)

New diff addressing review comments from the AArch64 version of the patch:
- Changed the order of operands in the RTL patterns.
- Changed the tests to use check-function-bodies, compile with optimisation and check for exact registers.
- Renamed the tests to remove "-compile-" from their filenames.

>> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> PS. I don't have commit rights, so if someone could commit on my behalf, >> that would be great :) >> >> >> gcc/ChangeLog: >> >> 2019-11-28 Stam Markianos-Wright >> >> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >> (USTERNOP_QUALIFIERS): New define. >> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (arm_expand_builtin_args): >> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >> * config/arm/arm_neon.h (vusdot_s32): New. >> (vusdot_lane_s32): New. >> (vusdotq_lane_s32): New. >> (vsudot_lane_s32): New.
>> (vsudotq_lane_s32): New. >> * config/arm/arm_neon_builtins.def >> (usdot,usdot_lane,sudot_lane): New. >> * config/arm/iterators.md (DOTPROD_I8MM): New. >> (sup, opsuffix): Add . >> * config/arm/neon.md (neon_usdot, dot_lane: New. >> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-12-12 Stam Markianos-Wright >> >> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >> >> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index df84560588a..1b4316d0e93 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). 
*/ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigend T, lane index). */ +static enum arm_type_qualifiers +a
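USDOT multiplies unsigned bytes from one operand by signed bytes from the other. Each group of four products is added into one 32-bit accumulator lane. The by-element forms reuse a single four-byte group of the last vector operand for every accumulator lane. That group is selected by an index counted in 32-bit groups, which matches the "lane indices selected in quadtuplets" comment on the new qualifier_lane_quadtup_index.

The scalar model below is a sketch based on that description. It follows the T (T, unsigned T, T, lane index) operand order of USMAC_LANE_QUADTUP_QUALIFIERS. `usdot_ref` and `usdot_lane_ref` are hypothetical names and not the arm_neon.h intrinsics. The sudot variants would swap which operand is signed.

```c
#include <stdint.h>
#include <stddef.h>

/* Vector form: accumulator lane i adds the four products
   a[4i+k] * b[4i+k], with a unsigned and b signed.  */
static void usdot_ref (int32_t *acc, size_t nlanes,
                       const uint8_t *a, const int8_t *b)
{
  for (size_t i = 0; i < nlanes; i++)
    for (size_t k = 0; k < 4; k++)
      acc[i] += (int32_t) a[4 * i + k] * (int32_t) b[4 * i + k];
}

/* By-element form: the four bytes b[4*lane .. 4*lane+3] are reused for
   every accumulator lane.  LANE counts 32-bit groups, so it must lie in
   [0, b_bytes / 4).  Returns 0 (and does nothing) for an invalid lane.  */
static int usdot_lane_ref (int32_t *acc, size_t nlanes, const uint8_t *a,
                           const int8_t *b, size_t b_bytes, size_t lane)
{
  if (lane >= b_bytes / 4)
    return 0;
  for (size_t i = 0; i < nlanes; i++)
    for (size_t k = 0; k < 4; k++)
      acc[i] += (int32_t) a[4 * i + k] * (int32_t) b[4 * lane + k];
  return 1;
}
```

With an 8-byte lane operand (as in vusdot_lane_s32), the model accepts only lane indices 0 and 1.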
[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]
Hi all,

This patch is part 2 of the bfloat16_t enablement in the ARM back-end.

The new type is constrained using the target hooks TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP, so that it may only be used through ACLE intrinsics (which will be provided in later patches).

Regression testing on arm-none-eabi passed successfully.

Ok for trunk?

Cheers,
Stam

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a

gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright

	* config/arm/arm.c (arm_invalid_conversion): New function for target hook.
	(arm_invalid_unary_op): New function for target hook.
	(arm_invalid_binary_op): New function for target hook.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright

	* gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
	* gcc.target/arm/bfloat16_vector_typecheck_1.c: New test.
	* gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9bd228b5433..d4180d4166c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -688,6 +688,15 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MANGLE_TYPE #define TARGET_MANGLE_TYPE arm_mangle_type +#undef TARGET_INVALID_CONVERSION +#define TARGET_INVALID_CONVERSION arm_invalid_conversion + +#undef TARGET_INVALID_UNARY_OP +#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op + +#undef TARGET_INVALID_BINARY_OP +#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op + #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv @@ -32432,6 +32441,55 @@ arm_coproc_ldc_stc_legitimate_address (rtx op) return false; } +/* Return the diagnostic message string if conversion from FROMTYPE to + TOTYPE is not allowed, NULL otherwise. */ + +static const char * +arm_invalid_conversion (const_tree fromtype, const_tree totype) +{ + if (element_mode (fromtype) != element_mode (totype)) +{ + /* Do no allow conversions to/from BFmode scalar types. */ + if (TYPE_MODE (fromtype) == BFmode) + return N_("invalid conversion from type %"); + if (TYPE_MODE (totype) == BFmode) + return N_("invalid conversion to type %"); +} + + /* Conversion allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the unary operation OP is + not permitted on TYPE, NULL otherwise. */ + +static const char * +arm_invalid_unary_op (int op, const_tree type) +{ + /* Reject all single-operand operations on BFmode except for &. */ + if (element_mode (type) == BFmode && op != ADDR_EXPR) +return N_("operation not permitted on type %"); + + /* Operation allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the binary operation OP is + not permitted on TYPE1 and TYPE2, NULL otherwise. 
*/ + +static const char * +arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, + const_tree type2) +{ + /* Reject all 2-operand operations on BFmode. */ + if (element_mode (type1) == BFmode + || element_mode (type2) == BFmode) +return N_("operation not permitted on type %"); + + /* Operation allowed. */ + return NULL; +} + /* Implement TARGET_CAN_CHANGE_MODE_CLASS. In VFPv1, VFP registers could only be accessed in the mode they were diff --git a/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C b/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C new file mode 100644 index 000..3e6f7d83752 --- /dev/null +++ b/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C @@ -0,0 +1,14 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-O3 --save-temps" } */ + +#include + +void foo (void) +{ + bfloat16_t (); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" { xfail *-*-* } } */ + bfloat16_t a = bfloat16_t(); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" { xfail *-*-* } } */ + bfloat16_t (0x1234); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */ + bfloat16_t (0.1); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */ +} diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c new file mode 100644 index 000..672641e6630 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c @@ -0,0 +1,219 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-tar
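All three hooks share one contract. They return a diagnostic string to reject the construct, or NULL to let the front end carry on. The model below mirrors the decision logic of the arm_invalid_* functions outside GCC. `enum mode`, `struct type_model`, `enum op_code` and the `model_*` functions are hypothetical stand-ins for machine modes, tree types, tree codes and the real hooks, and the message texts are approximations.

The model also makes the review-thread puzzle concrete. If the C++ front end builds `bfloat16_t()` as a conversion from integer zero, as suggested in the review, then the int-to-BF check fires whatever the value is. Because the conversion check only looks at scalar BFmode, this model does not reject a vector reinterpretation such as V4BF to V4HI.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC machine modes and tree types.  */
enum mode { M_SI, M_HI, M_BF };

struct type_model
{
  enum mode elt;  /* models element_mode (type) */
  int nunits;     /* 1 for a scalar, > 1 for a vector */
};

/* Models TYPE_MODE (t) == BFmode: true only for the scalar type.  */
static int is_bf_scalar (struct type_model t)
{
  return t.elt == M_BF && t.nunits == 1;
}

/* Mirrors arm_invalid_conversion: reject only when the element modes
   differ and one side is scalar BFmode.  */
static const char *
model_invalid_conversion (struct type_model from, struct type_model to)
{
  if (from.elt != to.elt)
    {
      if (is_bf_scalar (from))
        return "invalid conversion from type 'bfloat16_t'";
      if (is_bf_scalar (to))
        return "invalid conversion to type 'bfloat16_t'";
    }
  return NULL;
}

/* Hypothetical operator codes; OP_ADDR plays the role of ADDR_EXPR.  */
enum op_code { OP_ADDR, OP_NEGATE, OP_PLUS };

/* Mirrors arm_invalid_unary_op: only address-of is allowed on BF data.  */
static const char *
model_invalid_unary_op (enum op_code op, struct type_model t)
{
  if (t.elt == M_BF && op != OP_ADDR)
    return "operation not permitted on type 'bfloat16_t'";
  return NULL;
}

/* Mirrors arm_invalid_binary_op: any BF element type on either side is
   rejected, whatever the operator.  */
static const char *
model_invalid_binary_op (enum op_code op, struct type_model a,
                         struct type_model b)
{
  (void) op;  /* unused, like ATTRIBUTE_UNUSED in the patch */
  if (a.elt == M_BF || b.elt == M_BF)
    return "operation not permitted on type 'bfloat16_t'";
  return NULL;
}
```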
[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
Hi all,

This is a respin of patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

which has now been split into two (similar to the AArch64 version).

This is patch 1 of 2 and adds bfloat16 type support to the ARM back-end. It also adds a new machine_mode (BFmode) for this type and the accompanying vector modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict use of the type.

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for the testsuite effective_target update.

Ok for trunk?

Cheers,
Stam

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a

gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright

	* config.gcc: Add arm_bf16.h.
	* config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment.
	(arm_simd_builtin_std_type): Add BFmode.
	(arm_init_simd_builtin_types): Define element types for vector types.
	(arm_init_bf16_types): New function.
	(arm_init_builtins): Add arm_init_bf16_types function call.
	* config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
	* config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
	* config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode.
	(arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
	(arm_vector_mode_supported_p): Add V4BF, V8BF.
	(arm_mangle_type):
	* config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
	VALID_NEON_QREG_MODE respectively.  Add export arm_bf16_type_node,
	arm_bf16_ptr_type_node.
	* config/arm/arm.md: New enabled_for_bfmode_scalar,
	enabled_for_bfmode_vector attributes.  Add BFmode to movhf expand
	pattern and define_split between ARM registers.
	* config/arm/arm_bf16.h: New file.
	* config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
	* config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
	(VQXMOV): Add V8BF.
	* config/arm/neon.md: Add BF vector types to NEON move patterns.
	* config/arm/vfp.md: Add BFmode to movhf patterns.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright

	* g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
	* g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
	* gcc.target/arm/bfloat16_scalar_1_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_1_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_2_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_2_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_3_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_3_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_4.c: New test.
	* gcc.target/arm/bfloat16_simd_1_1.c: New test.
	* gcc.target/arm/bfloat16_simd_1_2.c: New test.
	* gcc.target/arm/bfloat16_simd_2_1.c: New test.
	* gcc.target/arm/bfloat16_simd_2_2.c: New test.
	* gcc.target/arm/bfloat16_simd_3_1.c: New test.
	* gcc.target/arm/bfloat16_simd_3_2.c: New test.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index c3d6464f3e6adaa1db818a61de00cff8e00ae08e..6a7a4725fe5e99fba16b40b18cfebb84984d06b8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a842ce3c69c589367625f6098cb5bb..7f279cca6688c6f11948159666ee647ae533c61d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -315,12 +315,14 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v8qi_UP E_V8QImode
 #define v4hi_UP E_V4HImode
 #define v4hf_UP E_V4HFmode
+#define v4bf_UP E_V4BFmode
 #define v2si_UP E_V2SImode
 #define v2sf_UP E_V2SFmode
 #define di_UP E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP E_V8HImode
 #define v8hf_UP E_V8HFmode
+#define v8bf_UP E_V8BFmode
 #define v4si_UP E_V4SImode
 #define v4sf_UP E_V4SFmode
 #define v2di_UP E_V2DImode
@@ -328,9 +330,10 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ei_UP E_EImode
 #define oi_UP E_OImode
 #define hf_UP E_HFmode
+#define bf_UP E_BFmode
 #define si_UP E_SImode
 #define void_UP E_VOIDmode
-
+#define sf_UP E_SFmode
 #define UP(X) X##_UP
 typedef struct {
@@ -806,6 +809,11 @@ static struct arm_s
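The new `*_UP` defines feed the `UP(X)` token-pasting macro. A mode suffix such as `v4bf` used in a builtin definition is pasted into `v4bf_UP`, which the preprocessor then rescans and expands to the matching machine-mode enumerator. The self-contained illustration below uses `enum machine_mode_model` as a hypothetical stand-in for GCC's generated `machine_mode` values. The `mode_nunits` table is invented for the demo.

```c
/* Hypothetical stand-in for GCC's generated machine_mode enumerators.  */
enum machine_mode_model { E_SFmode, E_BFmode, E_V4BFmode, E_V8BFmode };

/* Mode-suffix macros in the style of arm-builtins.c.  */
#define sf_UP   E_SFmode
#define bf_UP   E_BFmode
#define v4bf_UP E_V4BFmode
#define v8bf_UP E_V8BFmode

/* UP (v4bf) pastes the tokens into v4bf_UP, which is then rescanned
   and expands to E_V4BFmode.  */
#define UP(X) X##_UP

/* Illustrative table indexed by the pasted mode: lanes per mode.  */
static const int mode_nunits[] = {
  [E_SFmode] = 1,
  [E_BFmode] = 1,
  [E_V4BFmode] = 4,
  [E_V8BFmode] = 8,
};
```

This mapping only works if there is a whitespace separator between the macro name and its expansion. The flattened `bf_UPE_BFmode` seen in some copies of this hunk would instead define a macro named `bf_UPE_BFmode` with no expansion.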
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
On 1/9/20 4:13 PM, Stam Markianos-Wright wrote: > > > On 1/9/20 4:07 PM, Richard Sandiford wrote: >> Stam Markianos-Wright writes: >>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>> new file mode 100644 >>> index 000..55cbb0b0ef7 >>> --- /dev/null >>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >>> @@ -0,0 +1,14 @@ >>> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >>> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >>> +/* { dg-additional-options "-O3 --save-temps" } */ >>> + >>> +#include >>> + >>> +void foo (void) >>> +{ >>> + bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} >>> "" >>> {target *-*-*} } */ >> >> The "" {target *-*-*} stuff isn't needed: that's just for when the test >> depends on a target selector or if you need to specify a line number >> (which comes after the target). Removed them. > > Ah ok cool. I just had something that worked and was just doing ctrl+c ctrl+v > everywhere! > >> >> Same for the rest of the patch. >> >>> + bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type >>> 'bfloat16_t'} "" {target *-*-*} } */ >> >> Why's this one an error? Looks like it should be OK. Do we build >> bfloat16_t() as a conversion from a zero integer? >> > Yea that's exactly what it looked like when I went into the debugging! But > will > investigate a bit further and see if I can fix it for the next revision. > Changed this to dg-bogus with an XFAIL for the purposes of this patch in Stage 3 :) > Thank you so much for the help in getting these fixed :D > > Cheers, > Stam > >> Looks good otherwise, thanks, but I think we should try to support >> the line above if we can. 
>> >> Richard >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ebd3f6cf45bc0b5118c4c39e323e6380d64c885e..ce410ddf5515407a4680e186b04c6b6a40ae2562 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -21760,6 +21760,55 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Return the diagnostic message string if conversion from FROMTYPE to + TOTYPE is not allowed, NULL otherwise. */ + +static const char * +aarch64_invalid_conversion (const_tree fromtype, const_tree totype) +{ + if (element_mode (fromtype) != element_mode (totype)) +{ + /* Do no allow conversions to/from BFmode scalar types. */ + if (TYPE_MODE (fromtype) == BFmode) + return N_("invalid conversion from type %"); + if (TYPE_MODE (totype) == BFmode) + return N_("invalid conversion to type %"); +} + + /* Conversion allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the unary operation OP is + not permitted on TYPE, NULL otherwise. */ + +static const char * +aarch64_invalid_unary_op (int op, const_tree type) +{ + /* Reject all single-operand operations on BFmode except for &. */ + if (element_mode (type) == BFmode && op != ADDR_EXPR) +return N_("operation not permitted on type %"); + + /* Operation allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the binary operation OP is + not permitted on TYPE1 and TYPE2, NULL otherwise. */ + +static const char * +aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, + const_tree type2) +{ + /* Reject all 2-operand operations on BFmode. */ + if (element_mode (type1) == BFmode + || element_mode (type2) == BFmode) +return N_("operation not permitted on type %"); + + /* Operation allowed. */ + return NULL; +} + /* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE section at the end if needed. 
*/ #define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc000 @@ -22010,6 +22059,15 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_MANGLE_TYPE #define TARGET_MANGLE_TYPE aarch64_mangle_type +#undef TARGET_INVALID_CONVERSION +#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion + +#undef TARGET_INVALID_UNARY_OP +#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op + +#undef TARGET_INVALID_BINARY_OP +#define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op + #undef TARGET_VERIFY_TYPE_CONTEXT #define TARGET_VERIFY_TYPE_CONTEXT aarch64_verify_type_context diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C new file mode 100644 index
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
On 1/9/20 4:07 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> new file mode 100644 >> index 000..55cbb0b0ef7 >> --- /dev/null >> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C >> @@ -0,0 +1,14 @@ >> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >> +/* { dg-additional-options "-O3 --save-temps" } */ >> + >> +#include >> + >> +void foo (void) >> +{ >> + bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} "" >> {target *-*-*} } */ > > The "" {target *-*-*} stuff isn't needed: that's just for when the test > depends on a target selector or if you need to specify a line number > (which comes after the target). Ah ok cool. I just had something that worked and was just doing ctrl+c ctrl+v everywhere! > > Same for the rest of the patch. > >> + bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type >> 'bfloat16_t'} "" {target *-*-*} } */ > > Why's this one an error? Looks like it should be OK. Do we build > bfloat16_t() as a conversion from a zero integer? > Yea that's exactly what it looked like when I went into the debugging! But will investigate a bit further and see if I can fix it for the next revision. Thank you so much for the help in getting these fixed :D Cheers, Stam > Looks good otherwise, thanks, but I think we should try to support > the line above if we can. > > Richard >
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
On 1/7/20 5:14 PM, Richard Sandiford wrote: > Thanks for the update. The new patch looks really good, just some > minor comments. > > Stam Markianos-Wright writes: >> [...] >> Also I've update the filenames of all our tests to make them a bit clearer: >> >> C tests: >> >> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of >> scalar moves/loads/stores with "-march8.2-a+bf16", "-march8.2-a and +bf16 >> target >> pragma", "-march8.2-a" (now does not error out at all). There now include >> register asms to check more MOV alternatives. >> >> __ bfloat16_scalar_compile_4.c: The _Complex error test. >> >> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to >> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler >> scan). > > Sounds good to me, although TBH the "_compile" feels a bit redundant. Yes, true that! Removed it. > >> I had also done a small c++ test, but have chosen to shift that to the [2/2] >> patch because it is currently being blocked by target_invalid_conversion. > > OK. Does that include the mangling test? Aaah no, this is the test checking for bfloat16_t(), bfloat16_t (0x1234), bfloat16_t(0.25), etc. (which are more of language-level checks) Oh! I had forgotten about the mangling, so I've added it in this revision. > >> [...] >>>>> - a test that involves moving constants, for both scalars and vectors. >>>>> You can create zero scalar constants in C++ using bfloat16_t() etc. >>>>> For vectors it's possible to do things like: >>>>> >>>>>typedef short v2bf __attribute__((vector_size(4))); >>>>>v2hi foo (void) { return (v2hi) 0x12345678; } >>>>> >>>>> The same sort of things should work for bfloat16x4_t and >>>>> bfloat16x8_t. >>>> >>>> Leaving this as an open issue for now because I'm not 100% sure what we >>>> should/shouldn't be allowing past the tree-level target hooks. >>>> >>>> If we do want to block this we would do this in the [2/2] patch. 
>>>> I will come back to it and create a scan-assembler test when I'm more >>>> clear on >>>> what we should and shouldn't allow at the higher level :) >>> >>> FWIW, I'm not sure we should go out of our way to disallow this. >>> Preventing bfloat16_t() in C++ would IMO be unnatural. And the >>> "(vector) vector-sized-integer" syntax specifically treats the vector >>> as a bundle of bits without really caring what the element type is. >>> Even if we did manage to forbid the conversion in that context, >>> it would still be possible to achieve the same thing using: >>> >>> v2hi >>> foo (void) >>> { >>>union { v2hi v; unsigned int i; } u; >>>u.i = 0x12345678; >>>return u.v; >>> } >>> >> Added the compilation of "(vector) vector-sized-integer" in the vector tests. >> >> But target_invalid_conversion in the [2/2] patch is a complication to this >> (as >> with bfloat16_t() in C++). >> >> I was under the impression that the original intent of bfloat was for it to >> be >> storage only, with any initialisation happening through the float32 convert >> intrinsic. >> >> Either way, I'd be happy to allow it, but it does feel like we'd slightly be going >> against what the ACLE currently says. >> However, looking back at it now, it only mentions using ACLE intrinsics over >> C >> operators, so I'd be happy to allow this for vectors. >> >> For scalars though, if we e.g. were to allow: >> >> bfloat16_t (0x1234); >> >> on a single bfloat, I don't see how we could still block conversions like: >> >> bfloat16_t scalar1 = 0.1; >> bfloat16_t scalar2 = 0; >> bfloat16_t scalar3 = is_a_float; >> >> Agreed that the union {} would still always slip through, though. > > It wasn't clear, sorry, but I meant literally "bfloat16_t()", i.e. > construction with zero initialisation. I agree we don't want to > support "bfloat16_t(0.25)" etc. Added to [2/2] as mentioned above. > >> [...]
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c >>>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c >>>> new file mode 100644 >>>> index 000..f2bef671deb >>>> --- /dev/null
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
On 1/7/20 3:26 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> On 12/19/19 10:08 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: >>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>>> index f57469b6e23..f40f6432fd4 100644 >>>> --- a/gcc/config/aarch64/aarch64.c >>>> +++ b/gcc/config/aarch64/aarch64.c >>>> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void) >>>> return NULL_TREE; >>>>} >>>> >>>> +/* Return the diagnostic message string if conversion from FROMTYPE to >>>> + TOTYPE is not allowed, NULL otherwise. */ >>>> + >>>> +static const char * >>>> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype) >>>> +{ >>>> + static char templ[100]; >>>> + if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode >>>> + || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode) >>>> + && TYPE_MODE (fromtype) != TYPE_MODE (totype)) >>>> + { >>>> +snprintf (templ, sizeof (templ), \ >>>> + "incompatible types when assigning to type '%s' from type '%s'", >>>> + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))), >>>> + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype)))); >>>> +return N_(templ); >>>> + } >>>> + /* Conversion allowed. */ >>>> + return NULL; >>>> +} >>>> + >>> >>> This won't handle translation properly. We also have no guarantee that >>> the formatted string will fit in 100 characters since at least one of >>> the type names is unconstrained. (Also, not all types have names.) >>> >> >> Hi Richard.
I'm sending an email here to show you what I have done here, too >> :) >> >> Currently I have the following: >> >> static const char * >> aarch64_invalid_conversion (const_tree fromtype, const_tree totype) >> { >> static char templ[100]; >> if (TYPE_MODE (fromtype) != TYPE_MODE (totype) >> && ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype)) >>|| (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype)))) > > Just: > > if (TYPE_MODE (fromtype) != TYPE_MODE (totype) > && (TYPE_MODE (fromtype) == BFmode || TYPE_MODE (totype) == > BFmode)) > > should be enough. Types that have BFmode can't also be vectors. Yep, agreed. > >> { >> if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL) >> { >>snprintf (templ, sizeof (templ), >> "incompatible types when assigning to type '%s' from type '%s'", >> IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))), >> IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype)))); >>return N_(templ); >> } >> else >> { >>snprintf (templ, sizeof (templ), >> "incompatible types for assignment"); >>return N_(templ); >> } > > This still has the problem I mentioned above though: DECL_NAMEs are > supplied by the user and can be arbitrary lengths, so there's no > guarantee that the error message fits in the 100-character buffer. > We would get a truncated message if the buffer isn't big enough. > > As far as translation goes: the arguments to diagnostic functions > like "error" are untranslated strings, which the diagnostic functions > then translate internally. po/exgettext scans the source tree looking > for strings that need to be translatable and collects them all in po/gcc.pot. > Constant format strings in calls to known diagnostic functions get picked > up automatically (see ABOUT-GCC-NLS), but others need to be marked with > N_(). This N_() is simply a no-op wrapper macro that marks the argument > as needing translation. It has no effect if the argument isn't a > constant string.
> > The interface of this hook is to return an untranslated diagnostic string > that gets passed to error. A better interface would be to let the hook > raise its own error and return a boolean result, but that isn't what > we have. > > So in the above, it's "incompatible types for assignment" that needs to > be wrapped in N_(). Wrapping templ has no effect. > > This is also why the first arm doesn't work for translation. It constructs > and returns an arbitrary new string that won't have been entered into > gcc.pot (an
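Both review points (return a fixed, N_()-marked string rather than a formatted buffer, and the risk of truncation in a 100-character buffer) can be illustrated outside GCC. The sketch below is standalone C, not GCC code: `N_` is redefined locally as the no-op described above, the `toy_*` modes are hypothetical stand-ins for machine modes, and the message wording is invented. It applies the simplified check suggested in the review (the modes differ and either side is scalar BFmode), and uses the return value of `snprintf` to show when user-supplied type names would not fit.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for GCC's N_(): a no-op that only marks a constant
   string literal so that exgettext can collect it.  */
#define N_(msgid) msgid

/* Hypothetical stand-ins for machine modes, for illustration only.  */
enum toy_mode { TOY_SF, TOY_DF, TOY_BF, TOY_V4BF };

/* Shape of the hook suggested in the review: refuse the conversion if the
   modes differ and either side is the scalar BF mode, returning a fixed
   untranslated message, or NULL if the conversion is allowed.  The message
   wording is invented for this sketch.  */
static const char *
toy_invalid_conversion (enum toy_mode from, enum toy_mode to)
{
  if (from != to && (from == TOY_BF || to == TOY_BF))
    return N_("invalid conversion to or from a bfloat16 type");
  return NULL;
}

/* The truncation problem with a fixed 100-byte buffer: snprintf returns
   the length the complete message would have had, so a value >= the
   buffer size means the user-supplied type names did not fit.  */
static int
message_fits (const char *totype_name, const char *fromtype_name)
{
  char templ[100];
  int needed = snprintf (templ, sizeof templ,
                         "incompatible types when assigning to type '%s'"
                         " from type '%s'",
                         totype_name, fromtype_name);
  assert (needed >= 0);
  return (size_t) needed < sizeof templ;
}
```

Because the returned literal is a constant string, exgettext can pick it up; a string assembled at run time in `templ` could never be in gcc.pot, whatever it is wrapped in.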
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension
On 12/30/19 10:29 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> diff --git a/gcc/config/aarch64/aarch64-simd.md >> b/gcc/config/aarch64/aarch64-simd.md >> index >> adfda96f077075ad53d4bea2919c4d3b326e49f5..7587bc46ba1c80389ea49fa83a0e6f8a489711e9 >> 100644 >> --- a/gcc/config/aarch64/aarch64-simd.md >> +++ b/gcc/config/aarch64/aarch64-simd.md >> @@ -7028,3 +7028,36 @@ >> "xtn\t%0., %1." >> [(set_attr "type" "neon_shift_imm_narrow_q")] >> ) >> + >> +(define_insn "aarch64_bfdot" >> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >> +(plus:VDQSF >> + (unspec:VDQSF >> + [(match_operand: 2 "register_operand" "w") >> +(match_operand: 3 "register_operand" "w")] >> +UNSPEC_BFDOT) >> + (match_operand:VDQSF 1 "register_operand" "0")))] >> + "TARGET_BF16_SIMD" >> + "bfdot\t%0., %2., %3." >> + [(set_attr "type" "neon_dot")] >> +) >> + >> + >> +(define_insn "aarch64_bfdot_lane" > > Too many blank lines. Fixed, sorry I hadn't noticed! > >> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >> +(plus:VDQSF >> + (unspec:VDQSF >> + [(match_operand: 2 "register_operand" "w") >> +(match_operand:VBF 3 "register_operand" "w") >> +(match_operand:SI 4 "const_int_operand" "n")] >> +UNSPEC_BFDOT) >> + (match_operand:VDQSF 1 "register_operand" "0")))] >> + "TARGET_BF16_SIMD" >> +{ >> + int nunits = GET_MODE_NUNITS (mode).to_constant (); >> + int lane = INTVAL (operands[4]); >> + operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); >> + return "bfdot\t%0., %2., %3.2h[%4]"; >> +} >> + [(set_attr "type" "neon_dot")] >> +) >> [...] 
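The lane fix-up in `aarch64_bfdot_lane` can be checked by hand. Here is a small sketch, assuming `ENDIAN_LANE_N (NUNITS, N)` evaluates to `NUNITS - 1 - N` on big-endian and to `N` otherwise (our reading of the AArch64 back end). `big_endian_p` stands in for `BYTES_BIG_ENDIAN`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Our reading of ENDIAN_LANE_N (NUNITS, N): keep lane N on little-endian,
   mirror it to NUNITS - 1 - N on big-endian.  BIG_ENDIAN_P stands in for
   BYTES_BIG_ENDIAN.  */
static int
endian_lane_n (int nunits, int n, bool big_endian_p)
{
  assert (nunits > 0 && n >= 0 && n < nunits);
  return big_endian_p ? nunits - 1 - n : n;
}

/* Operand 4 of the bfdot lane pattern: the index selects a *pair* of
   bfloat16 elements, so it is flipped across NUNITS / 2 pairs, where
   NUNITS is the element count of the bfloat16 lane operand (4 or 8).  */
static int
bfdot_lane_operand (int nunits_bf16, int lane, bool big_endian_p)
{
  return endian_lane_n (nunits_bf16 / 2, lane, big_endian_p);
}
```

For example, a `bfloat16x4_t` lane operand holds only two 2h pairs, so lane 0 of `vbfdot_lane_f32` is printed as `[1]` on big-endian. The I8MM `usdot`/`sudot` lane pattern later in this thread applies the same idea with `nunits / 4`, since each selected element there is a group of four bytes.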
>> diff --git >> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >> new file mode 100644 >> index >> ..c575dcd3901172a52fa9403c9179d58eea44eb72 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c >> @@ -0,0 +1,91 @@ >> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >> +/* { dg-additional-options "-O -save-temps" } */ >> +/* { dg-final { check-function-bodies "**" "" } } */ >> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ > > Same comment as for USDOT/SUDOT regarding the dg- markup. Done! > >> + >> +#include >> + >> +/* >> +**ufoo: >> +** bfdot v0.2s, (v1.4h, v2.4h|v2.4h, v1.4h) >> +** ret >> +*/ >> +float32x2_t ufoo(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y) >> +{ >> + return vbfdot_f32 (r, x, y); >> +} >> + >> +/* >> +**ufooq: >> +** bfdot v0.4s, (v1.8h, v2.8h|v2.8h, v1.8h) >> +** ret >> +*/ >> +float32x4_t ufooq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) >> +{ >> + return vbfdotq_f32 (r, x, y); >> +} > > The (...|...)s here are correct. Yep. 
> >> + >> +/* >> +**ufoo_lane: >> +** bfdot v0.2s, (v1.4h, v2.2h\[0\]|v2.4h, v1.2h\[0\]) >> +** ret >> +*/ >> +float32x2_t ufoo_lane(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y) >> +{ >> + return vbfdot_lane_f32 (r, x, y, 0); >> +} >> + >> +/* >> +**ufooq_laneq: >> +** bfdot v0.4s, (v1.8h, v2.2h\[2\]|v2.8h, v1.2h\[2\]) >> +** ret >> +*/ >> +float32x4_t ufooq_laneq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) >> +{ >> + return vbfdotq_laneq_f32 (r, x, y, 2); >> +} >> + >> +/* >> +**ufoo_laneq: >> +** bfdot v0.2s, (v1.4h, v2.2h\[3\]|v2.4h, v1.2h\[3\]) >> +** ret >> +*/ >> +float32x2_t ufoo_laneq(float32x2_t r, bfloat16x4_t x, bfloat16x8_t y) >> +{ >> + return vbfdot_laneq_f32 (r, x, y, 3); >> +} >> + >> +/* >> +**ufooq_lane: >> +** bfdot v0.4s, (v1.8h, v2.2h\[1\]|v2.8h, v1.2h\[1\]) >> +** ret >> +*/ >> +float32x4_t ufooq_lane(float32x4_t r, bf
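To make the expected results of these tests concrete, here is a scalar reference model of the vector and by-element BFDOT forms. It is a sketch for reasoning about values, not how GCC or the hardware computes them. bfloat16 values are passed as raw 16-bit patterns and widened by placing them in the top half of a binary32. The products are then summed in ordinary float arithmetic, so the instruction's own rounding and denormal behaviour is deliberately not modelled. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a raw bfloat16 bit pattern to float: bfloat16 keeps the top
   16 bits of an IEEE binary32 value, so shift it back into place.  */
static float
bf16_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Vector form (vbfdot_f32 / vbfdotq_f32): float32 lane I of R accumulates
   A[2I]*B[2I] + A[2I+1]*B[2I+1].  N_LANES is 2 for the 64-bit form and
   4 for the 128-bit form.  */
static void
bfdot_ref (float *r, const uint16_t *a, const uint16_t *b, int n_lanes)
{
  assert (n_lanes == 2 || n_lanes == 4);
  for (int i = 0; i < n_lanes; i++)
    r[i] += bf16_to_float (a[2 * i]) * bf16_to_float (b[2 * i])
            + bf16_to_float (a[2 * i + 1]) * bf16_to_float (b[2 * i + 1]);
}

/* By-element form (vbfdot_lane_f32 etc.): every lane of R uses the single
   pair B[2*LANE], B[2*LANE+1], matching the "%3.2h[%4]" operand.  */
static void
bfdot_lane_ref (float *r, const uint16_t *a, const uint16_t *b,
                int lane, int n_lanes)
{
  assert (n_lanes == 2 || n_lanes == 4);
  for (int i = 0; i < n_lanes; i++)
    r[i] += bf16_to_float (a[2 * i]) * bf16_to_float (b[2 * lane])
            + bf16_to_float (a[2 * i + 1]) * bf16_to_float (b[2 * lane + 1]);
}
```

The by-element model also shows why the `(...|...)` alternation is wrong for the lane tests: the lane index always applies to the second multiplicand, so the two sources are not interchangeable.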
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
On 12/30/19 10:21 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> On 12/20/19 2:13 PM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: >>>> +**... >>>> +**ret >>>> +*/ >>>> +int32x2_t ufoo (int32x2_t r, uint8x8_t x, int8x8_t y) >>>> +{ >>>> + return vusdot_s32 (r, x, y); >>>> +} >>>> + >>> >>> If we're using check-function-bodies anyway, it might be slightly more >>> robust to compile at -O and check for the exact RA. E.g.: >>> >>> /* >>> **ufoo: >>> **usdotv0\.2s, (v1\.8b, v2\.8b|v2\.8b, v1\.8b) >>> **ret >>> */ >>> >>> Just a suggestion though -- either way is fine. >> >> done this too and as per our internal discussion also added one >> xx_untied tests for usdot and one for usdot_lane >> >> That's one xx_untied test for each of the RTL pattern types added in >> aarch64-simd.md. Lmk if this is ok! >> >> Also I found that the way we were using check-function-bodies wasn't >> actually checking the assembler correctly, so I've changed that to: >> +/* { dg-final { check-function-bodies "**" "" "" } } */ >> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ >> which seems to perform more checks > > Ah, OK, hadn't realised that we were cycling through optimisation > options already. In that case, it might be better to leave out the > -O from the dg-options and instead use: > > /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } { "-O0" } } } */ > > (untested). > > It's unfortunate that we're skipping this for -O0 though. Ideally we'd > still compile the code and just skip the dg-final. Does it work if you do: > > /* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */ > /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } } } */ > > ? Make sure that we actually still run the check-function-bodies when > optimisation is enabled. :-) This works! 
Now we are only doing the following for O0: PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O0 (test for excess errors) whereas for other optimisation levels do all the checks: PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 (test for excess errors) PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufoo PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufooq PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufoo_lane PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufoo_laneq PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufooq_lane PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufooq_laneq PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies sfoo_lane PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies sfoo_laneq PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies sfooq_lane PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies sfooq_laneq PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufoo_untied PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 check-function-bodies ufooq_laneq_untied > > Also, I'm an idiot. The reason I'd used (...|...) in the regexps was > that "dot product is commutative". But of course that's not true for > these mixed-sign ops, so the string must be: > > usdot v0\.2s, v1\.8b, v2\.8b > > The patch copied the (...|...) regexps above to the lane tests, but those > wouldn't be commutative even if the operands had the same type. Ahh, makes sense now. 
Done :) Cheers, Stam > > Thanks, > Richard > diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 1bd2640a1ced352de232fed1cf134b46c69b80f7..702b317d94d2fc6ebe59609727ad853f3f5cc652 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -107,6 +107,9 @@ enum aarch64_type_qualifiers /* Lane indices selected in pairs. - must be in range, and flipped for bigendian. */ qualifier_lane_pair_index = 0x800, + /* Lane indices selected in quadtuplets. - must be in range, and flipped for + bigendian. */ + qualifier_lane_quadtup_index = 0x1000, }; typedef struct @@ -173,6 +176,10 @@ aarch64_types_
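Richard's point that the mixed-sign dot products are not commutative is easy to confirm with a scalar model of USDOT. The first byte vector is read as unsigned and the second as signed, so swapping the registers changes the result. The model below is a hypothetical illustration (the names are ours), not the intrinsic's implementation. Byte buffers are passed as raw bit patterns so that the same data can be fed to either operand.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a raw byte as a two's-complement signed value.  */
static int32_t
as_signed_byte (uint8_t bits)
{
  return bits < 128 ? (int32_t) bits : (int32_t) bits - 256;
}

/* Scalar model of USDOT (vector).  Each int32 lane I of R accumulates
   four products of an unsigned byte from A and a signed byte from B.
   N_LANES is 2 for vusdot_s32 (8 bytes) and 4 for vusdotq_s32 (16 bytes).  */
static void
usdot_ref (int32_t *r, const uint8_t *a, const uint8_t *b, int n_lanes)
{
  assert (n_lanes == 2 || n_lanes == 4);
  for (int i = 0; i < n_lanes; i++)
    for (int k = 0; k < 4; k++)
      r[i] += (int32_t) a[4 * i + k] * as_signed_byte (b[4 * i + k]);
}
```

With every byte of one operand set to 0xff and every byte of the other set to 0x01, usdot gives 4 × 255 = 1020 per lane, while the swapped operands give 4 × (1 × -1) = -4. That is why the regexp has to match `usdot v0\.2s, v1\.8b, v2\.8b` exactly instead of accepting either operand order.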
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 12/10/19 5:03 PM, Kyrill Tkachov wrote: > Hi Stam, > > On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: >> Pinging with more correct maintainers this time :) >> >> Also would need to backport to gcc7,8,9, but need to get this approved >> first! >> > > Sorry for the delay. Same here now! Sorry, I totally forgot about this in the lead-up to Xmas! I've made the changes marked below and also removed the unnecessary extra #defines from the test. > > >> Thank you, >> Stam >> >> >> Forwarded Message >> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional >> branches in Thumb2 (PR91816) >> Date: Mon, 21 Oct 2019 10:37:09 +0100 >> From: Stam Markianos-Wright >> To: Ramana Radhakrishnan >> CC: gcc-patches@gcc.gnu.org , nd , >> James Greenhalgh , Richard Earnshaw >> >> >> >> >> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >> >> >> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >> >> however, on my native Aarch32 setup the test times out when run as part >> >> of a big "make check-gcc" regression, but not when run individually. >> >> >> >> 2019-10-11 Stamatis Markianos-Wright >> >> >> >> * config/arm/arm.md: Update b for Thumb2 range checks. >> >> * config/arm/arm.c: New function arm_gen_far_branch. >> >> * config/arm/arm-protos.h: New function arm_gen_far_branch >> >> prototype. >> >> >> >> gcc/testsuite/ChangeLog: >> >> >> >> 2019-10-11 Stamatis Markianos-Wright >> >> >> >> * testsuite/gcc.target/arm/pr91816.c: New test. >> > >> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >> >> index f995974f9bb..1dce333d1c3 100644 >> >> --- a/gcc/config/arm/arm-protos.h >> >> +++ b/gcc/config/arm/arm-protos.h >> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >> cpu_arch_option *, >> >> >> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >> >> >> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *); >> >> + >> >> + >> > >> > Let's get the nits out of the way.
>> > >> > Unnecessary extra new line, need a space between int and const above. >> > >> > >> >> .Fixed! >> >> >> #endif /* ! GCC_ARM_PROTOS_H */ >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> >> index 39e1a1ef9a2..1a693d2ddca 100644 >> >> --- a/gcc/config/arm/arm.c >> >> +++ b/gcc/config/arm/arm.c >> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >> >> } >> >> } /* Namespace selftest. */ >> >> >> >> + >> >> +/* Generate code to enable conditional branches in functions over 1 MiB. >> >> */ >> >> +const char * >> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >> >> + const char * branch_format) >> > >> > Not sure if this is some munging from the attachment but check >> > vertical alignment of parameters. >> > >> >> .Fixed! >> >> >> +{ >> >> + rtx_code_label * tmp_label = gen_label_rtx (); >> >> + char label_buf[256]; >> >> + char buffer[128]; >> >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >> >> + CODE_LABEL_NUMBER (tmp_label)); >> >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >> >> + rtx dest_label = operands[pos_label]; >> >> + operands[pos_label] = tmp_label; >> >> + >> >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >> >> + output_asm_insn (buffer, operands); >> >> + >> >> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >> >> label_ptr); >> >> + operands[pos_label] = dest_label; >> >> + output_asm_insn (buffer, operands); >> >> + return ""; >> >> +} >> >> + >> >> + >> > >> > Unnecessary extra newline. >> > >> >> .Fixed! >> >> >> #undef TARGET_RUN_TARGET_SELFTESTS >> >> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >>
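For readers new to PR91816: `arm_gen_far_branch` above first prints `branch_format` followed by a freshly generated local label, then an unconditional `b` to the real destination, and finally the local label itself. That is the usual far-branch trick. A short conditional branch (presumably on the inverted condition) hops over an unconditional branch, which has a much longer range. The standalone sketch below models only the text that gets emitted. The range constant, the `.Lfar` label naming and the function name are assumptions for illustration, not GCC's code, and the PC bias of the branch offset is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Approximate reach of a 32-bit Thumb-2 conditional branch (B<c>.W):
   about +/-1 MiB.  The unconditional B.W reaches about +/-16 MiB.  */
#define COND_BRANCH_RANGE (1L << 20)

/* Emit either a direct conditional branch or, when OFFSET (bytes to the
   target, ignoring the PC bias) is out of range, a branch on the inverted
   condition INV_COND over an unconditional branch to TARGET.  The ".Lfar"
   label naming is hypothetical.  Returns snprintf's result.  */
static int
emit_cond_branch (char *buf, size_t size, const char *cond,
                  const char *inv_cond, const char *target,
                  long offset, int label_no)
{
  assert (strlen (cond) > 0 && strlen (inv_cond) > 0 && strlen (target) > 0);

  if (offset >= -COND_BRANCH_RANGE && offset < COND_BRANCH_RANGE)
    return snprintf (buf, size, "b%s\t%s\n", cond, target);

  return snprintf (buf, size, "b%s\t.Lfar%d\nb\t%s\n.Lfar%d:\n",
                   inv_cond, label_no, target, label_no);
}
```

For an in-range target this prints a single `beq .L5`. For a target 2 MiB away it prints `bne .Lfar7`, `b .L5` and `.Lfar7:`, which matches the three-part output that `arm_gen_far_branch` produces with its two `output_asm_insn` calls.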
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
On 12/19/19 10:08 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index f57469b6e23..f40f6432fd4 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void) >> return NULL_TREE; >> } >> >> +/* Return the diagnostic message string if conversion from FROMTYPE to >> + TOTYPE is not allowed, NULL otherwise. */ >> + >> +static const char * >> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype) >> +{ >> + static char templ[100]; >> + if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode >> + || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode) >> + && TYPE_MODE (fromtype) != TYPE_MODE (totype)) >> + { >> +snprintf (templ, sizeof (templ), \ >> + "incompatible types when assigning to type '%s' from type '%s'", >> + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))), >> + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype)))); >> +return N_(templ); >> + } >> + /* Conversion allowed. */ >> + return NULL; >> +} >> + > > This won't handle translation properly. We also have no guarantee that > the formatted string will fit in 100 characters since at least one of > the type names is unconstrained. (Also, not all types have names.) > Hi Richard.
I'm sending an email here to show you what I have done here, too :) Currently I have the following: static const char * aarch64_invalid_conversion (const_tree fromtype, const_tree totype) { static char templ[100]; if (TYPE_MODE (fromtype) != TYPE_MODE (totype) && ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype)) || (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype)))) { if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL) { snprintf (templ, sizeof (templ), "incompatible types when assigning to type '%s' from type '%s'", IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))), IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype)))); return N_(templ); } else { snprintf (templ, sizeof (templ), "incompatible types for assignment"); return N_(templ); } } /* Conversion allowed. */ return NULL; } This blocks the conversion only if the two types are of different modes and one of them is a BFmode scalar. Doing it like this seems to block all scalar-sized assignments: C: typedef bfloat16_t vbf __attribute__((vector_size(2))); vbf foo3 (void) { return (vbf) 0x1234; } bfloat16_t foo1 (void) { return (bfloat16_t) 0x1234; } bfloat16_t scalar1_3 = 0; bfloat16_t scalar1_4 = 0.1; bfloat16_t scalar1_5 = is_a_float; bfloat16x4_t vector2_8 = { 0.0, 0, n2, is_a_float }; // (blocked on each element assignment) C++: bfloat16_t c1 (void) { return bfloat16_t (0x1234); } bfloat16_t c2 (void) { return bfloat16_t (0.1); } But then it allows vector initialisation from binary: C: bfloat16x4_t foo1 (void) { return (bfloat16x4_t) 0x1234567812345678; } C++: bfloat16x4_t foo1 (void) { return bfloat16x4_t (0x1234567812345678); } typedef bfloat16_t v2bf __attribute__((vector_size(4))); v2bf foo3 (void) { return v2bf (0x12345678); } I also need to check with a colleague who is on holiday if any of this impacts the vector-reinterpret intrinsics that he was working on... Let me know your thoughts!
Cheers, Stam > Unfortunately the interface of the current hook doesn't allow for good > diagnostics. We'll just have to return a fixed string. > > Formatting nit: braced block should be indented two spaces more > than the "if (...)". > > Same comment for the other hooks. Done. Will be in the next revision. > >> +/* Return the diagnostic message string if the unary operation OP is >> + not permitted on TYPE, NULL otherwise. */ >> + >> +static const char * >> +aarch64_invalid_unary_op (int op, const_tree type) >> +{ >> + static char templ[100]; >> + /* Reject all single-operand operations on BFmode except for &. */ >> + if (GET_MODE_INNER (TYPE_MODE (type)) == BFmode && op != ADDR_EXPR) >> + { >> +snprintf (templ, sizeof (templ), >> + "operation not permitted on type '%s'", >> + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)))); >> +return N_(templ); >> + } >> + /* Operation allowed. */ >> + return NULL; >> +} > > The problem with testing TYPE_MODE is that we'll then miss things > that don
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
On 23/12/2019 16:57, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> On 12/19/19 10:01 AM, Richard Sandiford wrote: >>>> + >>>> +#pragma GCC push_options >>>> +#pragma GCC target ("arch=armv8.2-a+bf16") >>>> +#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC >>>> + >>>> +typedef __bf16 bfloat16_t; >>>> + >>>> + >>>> +#endif >>>> +#pragma GCC pop_options >>>> + >>>> +#endif >>> >>> Are you sure we need the #ifdef? The target pragma should guarantee >>> that the macro's defined. >>> >>> But the validity of the typedef shouldn't depend on target options, >>> so AFAICT this should just be: >>> >>> typedef __bf16 bfloat16_t; >> >> Ok so it's a case of "what do we want to happen if the user tries to use >> bfloats >> without +bf16 enabled". >> >> So the intent of the ifdef was to not have bfloat16_t be visible if the macro >> wasn't defined (i.e. not having any bf16 support), but I see now that this >> was >> being negated by the target macro, anyway! Oops, my bad for not really >> understanding that, sorry! >> >> If we have the types always visible, then the user may use them, resulting >> in an >> ICE. >> >> But even if the #ifdef worked this still doesn't stop the user from trying to >> use __bf16 or __Bfloat16x4_t, __Bfloat16x8_t, which would still produce >> an >> ICE, so it's not a perfect solution anyway... > > Right. Or they could use #pragma GCC target to switch to a different > non-bf16 target after including arm_bf16.h. > >> One other thing I tried was the below change to aarch64-builtins.c which >> stops >> __bf16 or the vector types from being registered at all: >> >> --- a/gcc/config/aarch64/aarch64-builtins.c >> +++ b/gcc/config/aarch64/aarch64-builtins.c >> @@ -759,26 +759,32 @@ aarch64_init_simd_builtin_types (void) >> aarch64_simd_types[Float64x1_t].eltype = double_type_node; >> aarch64_simd_types[Float64x2_t].eltype = double_type_node; >> >> - /* Init Bfloat vector types with underlying __bf16 type.
*/ >> - aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; >> - aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; >> + if (TARGET_BF16_SIMD) >> +{ >> + /* Init Bfloat vector types with underlying __bf16 type. */ >> + aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; >> + aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; >> +} >> >> for (i = 0; i < nelts; i++) >> { >> tree eltype = aarch64_simd_types[i].eltype; >> machine_mode mode = aarch64_simd_types[i].mode; >> >> - if (aarch64_simd_types[i].itype == NULL) >> + if (eltype != NULL) >>{ >> - aarch64_simd_types[i].itype >> - = build_distinct_type_copy >> - (build_vector_type (eltype, GET_MODE_NUNITS (mode))); >> - SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype); >> - } >> + if (aarch64_simd_types[i].itype == NULL) >> + { >> + aarch64_simd_types[i].itype >> + = build_distinct_type_copy >> + (build_vector_type (eltype, GET_MODE_NUNITS (mode))); >> + SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype); >> + } >> >> - tdecl = add_builtin_type (aarch64_simd_types[i].name, >> - aarch64_simd_types[i].itype); >> - TYPE_NAME (aarch64_simd_types[i].itype) = tdecl; >> + tdecl = add_builtin_type (aarch64_simd_types[i].name, >> + aarch64_simd_types[i].itype); >> + TYPE_NAME (aarch64_simd_types[i].itype) = tdecl; >> + } >> } >> >> #define AARCH64_BUILD_SIGNED_TYPE(mode) \ >> @@ -1240,7 +1246,8 @@ aarch64_general_init_builtins (void) >> >> aarch64_init_fp16_types (); >> >> - aarch64_init_bf16_types (); >> + if (TARGET_BF16_FP) >> +aarch64_init_bf16_types (); >> >> if (TARGET_SIMD) >> aarch64_init_simd_builtins (); >> >> >> >> But the problem in that case was that the types could not be re-enabled >> using >> a target pragma like: >> >> #pragma GCC push_options >> #pragma GCC target ("+bf16") >> >> Inside the test
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension
On 12/20/19 2:36 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> Hi all, >> >> This patch adds the ARMv8.6 Extension ACLE intrinsics for the bfloat bfdot >> operation. >> >> The functions are declared in arm_neon.h with the armv8.2-a+bf16 target >> option >> as required. >> >> RTL patterns are defined to generate assembler. >> >> Tests added to verify expected assembly and perform adequate lane checks. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for testsuite effective_target update and on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html >> >> for back-end Bfloat enablement. >> >> Cheers, >> Stam >> >> >> gcc/ChangeLog: >> >> 2019-11-04 Stam Markianos-Wright >> >> * config/aarch64/aarch64-simd-builtins.def (aarch64_bfdot, >> aarch64_bfdot_lane, aarch64_bfdot_laneq): New. >> * config/aarch64/aarch64-simd.md >> (aarch64_bfdot, aarch64_bfdot_lane): New. >> * config/aarch64/arm_neon.h (vbfdot_f32, vbfdotq_f32, vbfdot_lane_f32, >> vbfdotq_lane_f32, vbfdot_laneq_f32, vbfdotq_laneq_f32): New. >> * config/aarch64/iterators.md (UNSPEC_BFDOT, VBF, isquadop, Vbfdottype, >> VBFMLA_W): New. > > Changelog nit: the continuation lines should be indented by a tab only. Yes, sorry, that's my email client messing things up again! Fixed locally and will carry over when I do the commit. > >> diff --git a/gcc/config/aarch64/aarch64-simd.md >> b/gcc/config/aarch64/aarch64-simd.md >> index >> c4858ab7cffd786066646a5cd95a168311990b76..bdc26c190610580e57e9749804b7729ee4e34793 >> 100644 >> --- a/gcc/config/aarch64/aarch64-simd.md >> +++ b/gcc/config/aarch64/aarch64-simd.md >> @@ -7027,3 +7027,37 @@ >> "xtn\t%0., %1."
>> [(set_attr "type" "neon_shift_imm_narrow_q")] >> ) >> + >> +(define_insn "aarch64_bfdot" >> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >> +(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0") >> +(unspec:VDQSF [(match_operand: 2 >> +"register_operand" "w") >> + (match_operand: 3 >> +"register_operand" "w")] >> + UNSPEC_BFDOT)))] > > The operands to the plus should be the other way around, so that > the more complicated operand comes first, > Done >> + "TARGET_BF16_SIMD" >> + "bfdot\t%0., %2., %3." >> + [(set_attr "type" "neon_dot")] >> +) >> + >> + >> +(define_insn "aarch64_bfdot_lane" >> + [(set (match_operand:VDQSF 0 "register_operand" "=w") >> +(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0") >> +(unspec:VDQSF [(match_operand: 2 >> +"register_operand" "w") >> + (match_operand: VBF 3 > > Nit: should be no space before "VBF". Done > >> +"register_operand" "w") >> + (match_operand:SI 4 >> +"const_int_operand" "n")] >> + UNSPEC_BFDOT)))] >> + "TARGET_BF16_SIMD" >> +{ >> + int nunits = GET_MODE_NUNITS (mode).to_constant (); >> + int lane = INTVAL (operands[4]); >> + operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); > > Should only be one space after "=". Done > >> + return "bfdot\t%0., %2., %3.2h[%4]"; >> +} >> + [(set_attr "type" "neon_dot")] >> +) >> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h >> index >> 5996df0a612caff3c881fc15b0aa12b8f91a193b..0357d97cc4143c3a9c56260d9a9cc24138afc049 >> 100644 >> --- a/gcc/config/aarch64/arm_neon.h >> +++ b/gcc/config/aarch64/arm_neon.h >> @@ -34612,6 +34612,57 @@ vrnd64xq_f64 (float64x2_t __a) >> >> #include "arm_bf16.h" >> >> +#pragma GCC push_options >> +#pragma GCC target ("a
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
On 12/20/19 2:13 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> diff --git a/gcc/config/aarch64/aarch64-simd.md >> b/gcc/config/aarch64/aarch64-simd.md >> index >> ad4676bc167f08951e693916c7ef796e3501762a..eba71f004ef67af654f9c512b720aa6cfdd1d7fc >> 100644 >> --- a/gcc/config/aarch64/aarch64-simd.md >> +++ b/gcc/config/aarch64/aarch64-simd.md >> @@ -506,6 +506,19 @@ >> [(set_attr "type" "neon_dot")] >> ) >> >> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot >> +;; (vector) Dot Product operation. >> +(define_insn "aarch64_usdot" >> + [(set (match_operand:VS 0 "register_operand" "=w") >> +(plus:VS (match_operand:VS 1 "register_operand" "0") >> +(unspec:VS [(match_operand: 2 "register_operand" "w") >> +(match_operand: 3 "register_operand" "w")] >> +UNSPEC_USDOT)))] >> + "TARGET_SIMD && TARGET_I8MM" >> + "usdot\\t%0., %2., %3." >> + [(set_attr "type" "neon_dot")] >> +) >> + >> ;; These expands map to the Dot Product optab the vectorizer checks for. >> ;; The auto-vectorizer expects a dot product builtin that also does an >> ;; accumulation into the provided register. > > Sorry for not raising it last time, but this should just be "TARGET_I8MM". > TARGET_SIMD is always true when TARGET_I8MM is. Oh no worries! Thank you so much for the detailed feedback, every time :D Fixed/ > >> @@ -573,6 +586,25 @@ >> [(set_attr "type" "neon_dot")] >> ) >> >> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot, >> sudot >> +;; (by element) Dot Product operations. 
>> +(define_insn "aarch64_dot_lane" >> + [(set (match_operand:VS 0 "register_operand" "=w") >> +(plus:VS (match_operand:VS 1 "register_operand" "0") >> +(unspec:VS [(match_operand: 2 "register_operand" "w") >> +(match_operand:VB 3 "register_operand" "w") >> +(match_operand:SI 4 "immediate_operand" "i")] >> +DOTPROD_I8MM)))] >> + "TARGET_SIMD && TARGET_I8MM" >> + { >> +int nunits = GET_MODE_NUNITS (mode).to_constant (); >> +int lane = INTVAL (operands[4]); >> +operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode); >> +return "dot\\t%0., %2., >> %3.4b[%4]"; >> + } >> + [(set_attr "type" "neon_dot")] >> +) >> + >> (define_expand "copysign3" >> [(match_operand:VHSDF 0 "register_operand") >> (match_operand:VHSDF 1 "register_operand") > > Same here. Another thing I should have noticed last time is that the > canonical order for (plus ...) is to have the more complicated expression > first. Operand 1 and the (unspec ...) should therefore be the other > way around in the expression above. (Having operand 1 "later" than > operands 2, 3 and 4 is OK.) Done. > >> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h >> index >> 8b861601a48b2150aa5768d717c61e0d1416747f..95b92dff69343e2b6c74174b39f3cd9d9838ddab >> 100644 >> --- a/gcc/config/aarch64/arm_neon.h >> +++ b/gcc/config/aarch64/arm_neon.h >> @@ -34606,6 +34606,89 @@ vrnd64xq_f64 (float64x2_t __a) >> >> #pragma GCC pop_options >> >> +/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.
*/ >> + >> +#pragma GCC push_options >> +#pragma GCC target ("arch=armv8.2-a+i8mm") >> + >> +__extension__ extern __inline int32x2_t >> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >> +vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) >> +{ >> + return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b); >> +} >> + >> +__extension__ extern __inline int32x4_t >> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >> +vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b) >> +{ >> + return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b); >> +} >> + >> +__extension__ extern __inline int32x2_t >> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >> +vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b, const int &g
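For reference, the arithmetic behind `vusdot_s32`/`vusdotq_s32` can be modelled in plain C. This is a minimal sketch based on the architectural description of USDOT (vector): each 32-bit accumulator lane adds the dot product of four unsigned bytes from the first source with four signed bytes from the second. `usdot_ref` is a hypothetical helper, not part of GCC or ACLE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C model of the vector USDOT operation behind
   vusdot_s32 (nlanes = 2) and vusdotq_s32 (nlanes = 4).  Accumulator
   lane i gains the dot product of bytes 4*i .. 4*i+3, reading A as
   unsigned and B as signed.  */
static void
usdot_ref (int32_t *r, const uint8_t *a, const int8_t *b, int nlanes)
{
  for (int i = 0; i < nlanes; i++)
    {
      int32_t sum = 0;
      for (int j = 0; j < 4; j++)
        sum += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
      /* Accumulate with modulo-2^32 wrap-around, as a 32-bit lane would.  */
      r[i] = (int32_t) ((uint32_t) r[i] + (uint32_t) sum);
    }
}
```

Keeping A unsigned and B signed is the key difference from the existing sdot/udot patterns, and it is why the builtin needs the mixed `_ssus` qualifiers.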
Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
On 12/19/19 10:01 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> [...] >> @@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode, >> return float_type_node; >> case E_DFmode: >> return double_type_node; >> +case E_BFmode: >> + return aarch64_bf16_type_node; >> default: >> gcc_unreachable (); >> } >> @@ -750,6 +759,11 @@ aarch64_init_simd_builtin_types (void) >> aarch64_simd_types[Float64x1_t].eltype = double_type_node; >> aarch64_simd_types[Float64x2_t].eltype = double_type_node; >> >> + >> +/* Init Bfloat vector types with underlying uint types. */ >> + aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; >> + aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; > > Formatting nits: too many blank lines, comment should be indented > to match the code. Done :) > >> + >> for (i = 0; i < nelts; i++) >> { >> tree eltype = aarch64_simd_types[i].eltype; >> @@ -1059,6 +1073,19 @@ aarch64_init_fp16_types (void) >> aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node); >> } >> >> +/* Initialize the backend REAL_TYPE type supporting bfloat types. */ >> +static void >> +aarch64_init_bf16_types (void) >> +{ >> + aarch64_bf16_type_node = make_node (REAL_TYPE); >> + TYPE_PRECISION (aarch64_bf16_type_node) = 16; >> + SET_TYPE_MODE (aarch64_bf16_type_node, BFmode); >> + layout_type (aarch64_bf16_type_node); >> + >> + (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, >> "__bf16"); > > This style is mostly a carry-over from pre-ANSI days. New code > can just use "lang_hooks.types.register_builtin_type (...)". Ahh good to know, thanks! Done > >> + aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node); >> +} >> + >> /* Pointer authentication builtins that will become NOP on legacy platform. >> Currently, these builtins are for internal use only (libgcc EH >> unwinder). */ >> >> [...] 
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def >> b/gcc/config/aarch64/aarch64-simd-builtin-types.def >> index b015694293c..3b387377f38 100644 >> --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def >> +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def >> @@ -50,3 +50,5 @@ >> ENTRY (Float32x4_t, V4SF, none, 13) >> ENTRY (Float64x1_t, V1DF, none, 13) >> ENTRY (Float64x2_t, V2DF, none, 13) >> + ENTRY (Bfloat16x4_t, V4BF, none, 15) >> + ENTRY (Bfloat16x8_t, V8BF, none, 15) > > Should be 14 (number of characters + 2 for "__"). Would be good to have > a test for correct C++ mangling. Done, thank you for pointing it out!! > >> [...] >> @@ -101,10 +101,10 @@ >> [(set_attr "type" "neon_dup")] >> ) >> >> -(define_insn "*aarch64_simd_mov" >> - [(set (match_operand:VD 0 "nonimmediate_operand" >> +(define_insn "*aarch64_simd_mov" >> + [(set (match_operand:VDMOV 0 "nonimmediate_operand" >> "=w, m, m, w, ?r, ?w, ?r, w") >> -(match_operand:VD 1 "general_operand" >> +(match_operand:VDMOV 1 "general_operand" >> "m, Dz, w, w, w, r, r, Dn"))] >> "TARGET_SIMD >> && (register_operand (operands[0], mode) >> @@ -126,13 +126,14 @@ >> } >> [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\ >> neon_logic, neon_to_gp, f_mcr,\ >> - mov_reg, neon_move")] >> + mov_reg, neon_move") >> +(set_attr "arch" "*,notbf16,*,*,*,*,*,notbf16")] >> ) > > Together with the changes to the arch attribute: > >> @@ -378,6 +378,12 @@ >> (and (eq_attr "arch" "fp16") >> (match_test "TARGET_FP_F16INST")) >> >> +(and (eq_attr "arch" "fp16_notbf16") >> + (match_test "TARGET_FP_F16INST && !TARGET_BF16_FP")) >> + >> +(and (eq_attr "arch" "notbf16") >> + (match_test "!TARGET_BF16_SIMD")) >> + >> (and (eq_attr "arch" "sve") >> (match_test "TARGET_SVE"))) >> (const_string "yes") > > this will dis
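Richard's length rule can be checked with a little arithmetic. The last ENTRY column appears to be the length prefix of the mangled name, which is the type name plus a leading "__". "Bfloat16x4_t" has 12 characters, giving 14, while "Float32x4_t" has 11, giving the existing 13. The helper below is a hypothetical sanity check and not a GCC function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: length of the mangled AdvSIMD type name, which
   is the internal name with a "__" prefix (e.g. "__Bfloat16x4_t").  */
static size_t
mangled_name_length (const char *internal_name)
{
  return strlen ("__") + strlen (internal_name);
}
```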
[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end
Hi all, This patch was developed at the same time as the aarch64 version. Richards' feedback on that one also applies here and we'll be addressing them in a respin. However, it's still useful to get this up for everyone (including ARM maintainers) to look at and comment on, too. For reference, the latest emails in the Aarch64 thread are at: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01364.html https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01362.html (The respin will also be split into two in a similar fashion to the Aarch64 version.) Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for test suite effective_target update. Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a gcc/ChangeLog: 2019-12-16 Stam Markianos-Wright * config.gcc: Add arm_bf16.h. * config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment. (arm_simd_builtin_std_type): Add BFmode. (arm_init_simd_builtin_types): Define element types for vector types. (arm_init_bf16_types): New function. (arm_init_builtins): Add arm_init_bf16_types function call. * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes. * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF. * config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode. (arm_hard_regno_mode_ok): Add BFmode and tidy up statements. (arm_vector_mode_supported_p): Add V4BF, V8BF. (arm_invalid_conversion): New function for target hook. (arm_invalid_unary_op): New function for target hook. (arm_invalid_binary_op): New function for target hook. * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE, VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node, arm_bf16_ptr_type_node.
* config/arm/arm.md: New enabled_for_bfmode_scalar, enabled_for_bfmode_vector attributes. Add BFmode to movhf expand. pattern and define_split between ARM registers. * config/arm/arm_bf16.h: New file. * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types. * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New. (VQXMOV): Add V8BF. * config/arm/neon.md: Add BF vector types to NEON move patterns. * config/arm/vfp.md: Add BFmode to movhf_vfp pattern. 2019-12-16 Stam Markianos-Wright * gcc.target/arm/bfloat16_compile-1.c: New test. * gcc.target/arm/bfloat16_compile-2.c: New test. * gcc.target/arm/bfloat16_compile-3.c: New test. * gcc.target/arm/bfloat16_compile-4.c: New test. * gcc.target/arm/bfloat16_scalar_typecheck.c: New test. * gcc.target/arm/bfloat16_vector_typecheck1.c: New test. * gcc.target/arm/bfloat16_vector_typecheck2.c: New test. diff --git a/gcc/config.gcc b/gcc/config.gcc index 5aa0130135fa3ce95df502b3f84e78832b368375..bf1b6319643cf21970495f846392983255bd 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -344,7 +344,7 @@ arc*-*-*) arm*-*-*) cpu_type=arm extra_objs="arm-builtins.o aarch-common.o" - extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h" + extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h" target_type_format_char='%' c_target_objs="arm-c.o" cxx_target_objs="arm-c.o" diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..b998a4b935d522ca9ec7b5a928fc6bcc6649d5a3 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -315,12 +315,14 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define v8qi_UP E_V8QImode #define v4hi_UP E_V4HImode #define v4hf_UP E_V4HFmode +#define v4bf_UP E_V4BFmode #define v2si_UP E_V2SImode #define v2sf_UP E_V2SFmode #define di_UP E_DImode #define v16qi_UP E_V16QImode #define v8hi_UP E_V8HImode #define v8hf_UP E_V8HFmode +#define v8bf_UP E_V8BFmode
#define v4si_UP E_V4SImode #define v4sf_UP E_V4SFmode #define v2di_UP E_V2DImode @@ -328,9 +330,10 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define ei_UP E_EImode #define oi_UP E_OImode #define hf_UP E_HFmode +#define bf_UP E_BFmode #define si_UP E_SImode #define void_UP E_VOIDmode - +#define sf_UP E_SFmode #define UP(X) X##_UP typedef struct { @@ -806,6 +809,11 @@ static struct arm_simd_type_info arm_simd_types [] = { /* The user-visible __fp16 type. */ tree arm_fp16_type_node = NULL_TREE; + +/* Back-end node type for brain float (bfloat) types. */ +tree arm_bf16_t
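For readers new to the format: bfloat16 keeps the sign, the full 8-bit exponent and the top 7 mantissa bits of an IEEE-754 binary32 value, so it is the upper half of a float's bit pattern. The sketch below shows that relationship in portable C. It is not code from the patch, and the helper names are hypothetical. The narrowing uses round-to-nearest-even, which is one common software choice; the patch itself deliberately exposes no implicit conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widening is exact: the bfloat16 bits become the top half of a float.  */
static float
bf16_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Narrowing with round-to-nearest-even; NaNs are returned quiet.  */
static uint16_t
float_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  if ((bits & 0x7fffffffu) > 0x7f800000u)   /* NaN input.  */
    return (uint16_t) ((bits >> 16) | 0x0040u);
  /* Add just under half an ulp, plus one if the kept LSB is odd.  */
  uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) ((bits + rounding_bias) >> 16);
}
```

On a tie the result rounds to the even bfloat16 pattern. For example, 1 + 2^-8 rounds down to 0x3F80, while 1 + 3*2^-8 rounds up to 0x3F82.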
[GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension
Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for the bfloat bfdot operation. The functions are declared in arm_neon.h with the armv8.2-a+bf16 target option as required. RTL patterns are defined to generate assembler. Tests are added to verify expected assembly and perform adequate lane checks. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for testsuite effective_target update and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html for back-end Bfloat enablement. Cheers, Stam gcc/ChangeLog: 2019-11-04 Stam Markianos-Wright * config/aarch64/aarch64-simd-builtins.def (aarch64_bfdot, aarch64_bfdot_lane, aarch64_bfdot_laneq): New. * config/aarch64/aarch64-simd.md (aarch64_bfdot, aarch64_bfdot_lane): New. * config/aarch64/arm_neon.h (vbfdot_f32, vbfdotq_f32, vbfdot_lane_f32, vbfdotq_lane_f32, vbfdot_laneq_f32, vbfdotq_laneq_f32): New. * config/aarch64/iterators.md (UNSPEC_BFDOT, VBF, isquadop, Vbfdottype, VBFMLA_W): New. gcc/testsuite/ChangeLog: 2019-11-04 Stam Markianos-Wright * gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c: New. * gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-2.c: New. * gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-3.c: New. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..6c5b61c37bcb340f963861723c6e365e32f6ca95 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -682,3 +682,8 @@ BUILTIN_VSFDF (UNOP, frint32x, 0) BUILTIN_VSFDF (UNOP, frint64z, 0) BUILTIN_VSFDF (UNOP, frint64x, 0) + + /* Implemented by aarch64_bfdot{_lane}{q}.
*/ + VAR2 (TERNOP, bfdot, 0, v2sf, v4sf) + VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf) + VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c4858ab7cffd786066646a5cd95a168311990b76..bdc26c190610580e57e9749804b7729ee4e34793 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7027,3 +7027,37 @@ "xtn\t%0., %1." [(set_attr "type" "neon_shift_imm_narrow_q")] ) + +(define_insn "aarch64_bfdot" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0") + (unspec:VDQSF [(match_operand: 2 + "register_operand" "w") + (match_operand: 3 + "register_operand" "w")] + UNSPEC_BFDOT)))] + "TARGET_BF16_SIMD" + "bfdot\t%0., %2., %3." + [(set_attr "type" "neon_dot")] +) + + +(define_insn "aarch64_bfdot_lane" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0") + (unspec:VDQSF [(match_operand: 2 + "register_operand" "w") + (match_operand: VBF 3 + "register_operand" "w") + (match_operand:SI 4 + "const_int_operand" "n")] + UNSPEC_BFDOT)))] + "TARGET_BF16_SIMD" +{ + int nunits = GET_MODE_NUNITS (mode).to_constant (); + int lane = INTVAL (operands[4]); + operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); + return "bfdot\t%0., %2., %3.2h[%4]"; +} + [(set_attr "type" "neon_dot")] +) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 5996df0a612caff3c881fc15b0aa12b8f91a193b..0357d97cc4143c3a9c56260d9a9cc24138afc049 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -34612,6 +34612,57 @@ vrnd64xq_f64 (float64x2_t __a) #include "arm_bf16.h" +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+bf16") + +__extension__ extern __inline float32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfdot_f32 (float32x2_t __r, 
bfloat16x4_t __a, bfloat16x4_t __b) +{ + return __builtin_aarch64_bfdotv2sf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_aarch64_bfdotv4sf (__r, __a, __b); +} + +__extension__ extern __inline float32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfdot_lane_f32 \ + (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b, const int __index) +{ + return __builtin_aarch64_bfdot_lanev2sf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attrib
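As a rough guide to what `vbfdot_f32`/`vbfdotq_f32` compute: each float32 accumulator lane adds the sum of two bfloat16 products taken from the matching pair of input elements. The model below is a hypothetical sketch that uses ordinary float arithmetic. It does not reproduce BFDOT's architected rounding or denormal behaviour, so it is only exact for small, exactly representable values like those in the usage check. Inputs are passed as raw bfloat16 bit patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 bit pattern -> float: bfloat16 is the upper half of binary32.  */
static float
bf16_bits_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical model: r[i] += a[2i]*b[2i] + a[2i+1]*b[2i+1], with
   nlanes = 2 for vbfdot_f32 and 4 for vbfdotq_f32.  Plain float
   arithmetic; NOT the instruction's exact rounding behaviour.  */
static void
bfdot_ref (float *r, const uint16_t *a, const uint16_t *b, int nlanes)
{
  for (int i = 0; i < nlanes; i++)
    {
      float p0 = bf16_bits_to_float (a[2 * i]) * bf16_bits_to_float (b[2 * i]);
      float p1 = bf16_bits_to_float (a[2 * i + 1])
                 * bf16_bits_to_float (b[2 * i + 1]);
      r[i] += p0 + p1;
    }
}
```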
Re: [GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp
On 12/18/19 4:47 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> On 12/13/19 11:15 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: >>>> Hi all, >>>> >>>> This small patch adds support for the ARM v8.6 extensions +bf16 and >>>> +i8mm to the testsuite. This will be tested through other upcoming >>>> patches, which is why we are not providing any explicit tests here. >>>> >>>> Ok for trunk? >>>> >>>> Also I don't have commit rights, so if someone could commit on my >>>> behalf, that would be great :) >>>> >>>> The functionality here depends on CLI patches: >>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html >>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>> >>>> but this patch applies cleanly without them, too. >>>> >>>> Cheers, >>>> Stam >>>> >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-12-11 Stam Markianos-Wright >>>> >>>>* lib/target-supports.exp >>>>(check_effective_target_arm_v8_2a_i8mm_ok_nocache): New. >>>>(check_effective_target_arm_v8_2a_i8mm_ok): New. >>>>(add_options_for_arm_v8_2a_i8mm): New. >>>>(check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New. >>>>(check_effective_target_arm_v8_2a_bf16_neon_ok): New. >>>>(add_options_for_arm_v8_2a_bf16_neon): New. >>> >>> The new effective-target keywords need to be documented in >>> doc/sourcebuild.texi. >> >> Added in new diff :) >> >>> >>> LGTM otherwise. For: >>> >>>> diff --git a/gcc/testsuite/lib/target-supports.exp >>>> b/gcc/testsuite/lib/target-supports.exp >>>> index 5b4cc02f921..36fb63e9929 100644 >>>> --- a/gcc/testsuite/lib/target-supports.exp >>>> +++ b/gcc/testsuite/lib/target-supports.exp >>>> @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags >>>> } { >>>>return "$flags $et_arm_v8_2a_dotprod_neon_flags" >>>>} >>>> >>>> +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product >>>> +# instructions, 0 otherwise. The test is valid for ARM and for AArch64. 
>>>> +# Record the command line options needed. >>>> + >>>> +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } { >>>> +global et_arm_v8_2a_i8mm_flags >>>> +set et_arm_v8_2a_i8mm_flags "" >>>> + >>>> +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { >>>> +return 0; >>>> +} >>>> + >>>> +# Iterate through sets of options to find the compiler flags that >>>> +# need to be added to the -march option. >>>> +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" >>>> "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } { >>>> +if { [check_no_compiler_messages_nocache \ >>>> + arm_v8_2a_i8mm_ok object { >>>> +#include >>>> +#if !defined (__ARM_FEATURE_MATMUL_INT8) >>>> +#error "__ARM_FEATURE_MATMUL_INT8 not defined" >>>> +#endif >>>> +} "$flags -march=armv8.2-a+i8mm"] } { >>>> +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm" >>>> +return 1 >>>> +} >>>> +} >>> >>> I wondered whether it would be better to add no options if testing >>> with something that already supports i8mm (e.g. -march=armv8.6). >>> That would mean trying: >>> >>> "" "-march=armv8.2-a+i8mm" "-march=armv8.2-a+i8mm -mfloat-abi..." ... >>> >>> instead. But there are arguments both ways, and the above follows >>> existing style, so OK. >> >> Not quite sure if I understanding this right, but I think that's what >> the "" option in foreach flags{} is for? >> >> i.e. currently what I'm seeing is: >> >> +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ >> +/* { dg-add-options arm_v8_2a_i8mm } */ >> >> will pull through the first option that compiles to object file with no >> errors (check_no_compiler_messages_nocache arm_v8_2a
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
On 12/13/19 11:02 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> @@ -573,6 +586,44 @@ >> [(set_attr "type" "neon_dot")] >> ) >> >> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot, >> sudot >> +;; (by element) Dot Product operations. >> +(define_insn "aarch64_dot_lane" >> + [(set (match_operand:VS 0 "register_operand" "=w") >> +(plus:VS (match_operand:VS 1 "register_operand" "0") >> +(unspec:VS [(match_operand: 2 "register_operand" "w") >> +(match_operand:V8QI 3 "register_operand" "") >> +(match_operand:SI 4 "immediate_operand" "i")] >> +DOTPROD_I8MM)))] >> + "TARGET_SIMD && TARGET_I8MM" >> + { >> +int nunits = GET_MODE_NUNITS (V8QImode).to_constant (); >> +int lane = INTVAL (operands[4]); >> +operands[4] >> += gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode); >> +return "dot\\t%0., %2., %3.4b[%4]"; >> + } >> + [(set_attr "type" "neon_dot")] >> +) >> + >> +(define_insn "aarch64_dot_laneq" >> + [(set (match_operand:VS 0 "register_operand" "=w") >> +(plus:VS (match_operand:VS 1 "register_operand" "0") >> +(unspec:VS [(match_operand: 2 "register_operand" "w") >> +(match_operand:V16QI 3 "register_operand" "") > > Using seems a bit redundant when it's always "w" in this context, > but either's fine. Done! > >> +(match_operand:SI 4 "immediate_operand" "i")] >> +DOTPROD_I8MM)))] >> + "TARGET_SIMD && TARGET_I8MM" >> + { >> +int nunits = GET_MODE_NUNITS (V16QImode).to_constant (); >> +int lane = INTVAL (operands[4]); >> +operands[4] >> += gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode); > > Nit: = should be indented two spaces more, and there should be only > one space afterwards. But the statement fits on one line, so probably > better not to have the line break at all. I put it all onto one line. > >> +return "dot\\t%0., %2., %3.4b[%4]"; >> + } >> + [(set_attr "type" "neon_dot")] >> +) > > These two patterns can be merged using :VB for operand 3. Merged them.
I also changed the tests to use the new check-function-bodies according to downstream comments. This helps check that the assembler scans are done in the right order and ensures that the correct assembler was generated from the right function call (as opposed to "somewhere in the output file"). Hope this looks better :D Cheers, Stam > > LGTM otherwise, thanks. > > Richard > diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index c35a1b1f0299ce5af8ca1a3df0209614f7bd0f25..6bd26889f2f26a9f82dd6d40f50125eaeee41740 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -107,6 +107,9 @@ enum aarch64_type_qualifiers /* Lane indices selected in pairs. - must be in range, and flipped for bigendian. */ qualifier_lane_pair_index = 0x800, + /* Lane indices selected in quadtuplets. - must be in range, and flipped for + bigendian. */ + qualifier_lane_quadtup_index = 0x1000, }; typedef struct @@ -173,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, qualifier_immediate }; #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none }; +#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers) static enum aarch64_type_qualifiers @@ -191,6 +198,19 @@ aarch64_types_quadopu_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define TYPES_QUADOPU_LANE (aarch64_types_quadopu_lane_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_quadopssus_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define TYPES_QUADOPSSUS_LANE_QUADTUP \ + (aarch64_types_quadopssus_lane_quadtup_qualifiers) +static enum 
aarch64_type_qualifiers +aarch64_types_quadopsssu_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + =
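The by-element patterns above rewrite operand 4 with `ENDIAN_LANE_N (nunits / 4, lane)`. The index names a group of four bytes, so a V8QI operand has two selectable lanes and a V16QI operand has four, and the numbering is reversed on big-endian targets. The C model below mirrors our understanding of the macro's shape. The function names are hypothetical and the target's endianness is passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Shape of GCC's ENDIAN_LANE_N: lane numbering is reversed on
   big-endian targets.  */
static int
endian_lane_n (int nunits, int n, bool big_endian)
{
  return big_endian ? nunits - 1 - n : n;
}

/* The dot-product lane selects a 4-byte group, so the lane count is
   the byte count divided by 4 (2 for V8QI, 4 for V16QI).  */
static int
dot_lane_index (int byte_nunits, int lane, bool big_endian)
{
  return endian_lane_n (byte_nunits / 4, lane, big_endian);
}
```

This is also why `qualifier_lane_quadtup_index` has to range-check the lane against `nunits / 4` rather than `nunits`.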
[PATCH, committed] Add myself to MAINTAINERS.
Hi all, I have committed the attached patch adding myself to the Write After Approval section of the MAINTAINERS file. Cheers, Stam (commits r279573, r279575) 2019-12-19 Stam Markianos-Wright * MAINTAINERS (write_after_approval): Add myself. diff --git a/MAINTAINERS b/MAINTAINERS index e31fb19760e..3d78697e191 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -499,6 +499,7 @@ Luis Machado Ziga Mahkovec Matthew Malcomson Mikhail Maltsev +Stamatis Markianos-Wright Patrick Marlier Simon Martin Alejandro Martinez
[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]
Hi all, This patch is part 2 of Bfloat16_t enablement in the Aarch64 back-end. This new type is constrained using target hooks TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used through ACLE intrinsics (will be provided in later patches). Regression testing on aarch64-none-elf passed successfully. Ok for trunk? Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-12-16 Stam Markianos-Wright * config/aarch64/aarch64.c (aarch64_invalid_conversion): New function for target hook. (aarch64_invalid_unary_op): Likewise. (aarch64_invalid_binary_op): Likewise. (TARGET_INVALID_CONVERSION): Add back-end define for target hook. (TARGET_INVALID_UNARY_OP): Likewise. (TARGET_INVALID_BINARY_OP): Likewise. gcc/testsuite/ChangeLog: 2019-12-16 Stam Markianos-Wright * gcc.target/aarch64/bfloat16_scalar_typecheck.c: New test. * gcc.target/aarch64/bfloat16_vector_typecheck1.c: New test. * gcc.target/aarch64/bfloat16_vector_typecheck2.c: New test. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f57469b6e23..f40f6432fd4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void) return NULL_TREE; } +/* Return the diagnostic message string if conversion from FROMTYPE to + TOTYPE is not allowed, NULL otherwise. 
*/ + +static const char * +aarch64_invalid_conversion (const_tree fromtype, const_tree totype) +{ + static char templ[100]; + if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode + || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode) + && TYPE_MODE (fromtype) != TYPE_MODE (totype)) + { +snprintf (templ, sizeof (templ), \ + "incompatible types when assigning to type '%s' from type '%s'", + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))), + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype)))); +return N_(templ); + } + /* Conversion allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the unary operation OP is + not permitted on TYPE, NULL otherwise. */ + +static const char * +aarch64_invalid_unary_op (int op, const_tree type) +{ + static char templ[100]; + /* Reject all single-operand operations on BFmode except for &. */ + if (GET_MODE_INNER (TYPE_MODE (type)) == BFmode && op != ADDR_EXPR) + { +snprintf (templ, sizeof (templ), + "operation not permitted on type '%s'", + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type)))); +return N_(templ); + } + /* Operation allowed. */ + return NULL; +} + +/* Return the diagnostic message string if the binary operation OP is + not permitted on TYPE1 and TYPE2, NULL otherwise. */ + +static const char * +aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, + const_tree type2) +{ + static char templ[100]; + /* Reject all 2-operand operations on BFmode. */ + if (GET_MODE_INNER (TYPE_MODE (type1)) == BFmode + || GET_MODE_INNER (TYPE_MODE (type2)) == BFmode) + { +snprintf (templ, sizeof (templ), \ + "operation not permitted on types '%s', '%s'", + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type1))), + IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2)))); +return N_(templ); + } + /* Operation allowed. */ + return NULL; +} + +/* Implement TARGET_ASM_FILE_END for AArch64. This adds the AArch64 GNU NOTE section at the end if needed.
*/ #define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc000 @@ -21911,6 +21973,15 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_MANGLE_TYPE #define TARGET_MANGLE_TYPE aarch64_mangle_type +#undef TARGET_INVALID_CONVERSION +#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion + +#undef TARGET_INVALID_UNARY_OP +#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op + +#undef TARGET_INVALID_BINARY_OP +#define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op + #undef TARGET_VERIFY_TYPE_CONTEXT #define TARGET_VERIFY_TYPE_CONTEXT aarch64_verify_type_context diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c new file mode 100644 index 000..6f6a6af9587 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c @@ -0,0 +1,83 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-option
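The three hooks share one contract: return NULL when the operation is allowed and a diagnostic string when it is not. The decision logic in the patch can be modelled over a toy set of modes. In this hypothetical sketch, `mode_sketch` and `inner_mode` stand in for GCC's machine modes and `GET_MODE_INNER`, and only the rules are reproduced, not the diagnostic formatting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few machine modes.  */
enum mode_sketch { MODE_SF, MODE_HF, MODE_BF, MODE_V4HF, MODE_V4BF };

/* Element ("inner") mode, in the spirit of GET_MODE_INNER.  */
static enum mode_sketch
inner_mode (enum mode_sketch m)
{
  switch (m)
    {
    case MODE_V4HF:
      return MODE_HF;
    case MODE_V4BF:
      return MODE_BF;
    default:
      return m;
    }
}

/* Conversions involving bfloat are rejected unless the modes match.  */
static const char *
invalid_conversion_sketch (enum mode_sketch from, enum mode_sketch to)
{
  if ((inner_mode (from) == MODE_BF || inner_mode (to) == MODE_BF)
      && from != to)
    return "incompatible types";
  return NULL;
}

/* Only taking the address (&) of a bfloat value is permitted.  */
static const char *
invalid_unary_op_sketch (int is_address_of, enum mode_sketch m)
{
  if (inner_mode (m) == MODE_BF && !is_address_of)
    return "operation not permitted on type";
  return NULL;
}

/* Every binary operation with a bfloat operand is rejected.  */
static const char *
invalid_binary_op_sketch (enum mode_sketch m1, enum mode_sketch m2)
{
  if (inner_mode (m1) == MODE_BF || inner_mode (m2) == MODE_BF)
    return "operation not permitted on types";
  return NULL;
}
```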
[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]
Hi all, This patch adds Bfloat type support to the ARM back-end. It also adds a new machine_mode (BFmode) for this type and accompanying Vector modes V4BFmode and V8BFmode. The second patch in this series uses existing target hooks to restrict type use. Regression testing on aarch64-none-elf passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for test suite effective_target update. Ok for trunk? Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-12-16 Stam Markianos-Wright * config.gcc: Add arm_bf16.h. * config/aarch64/aarch64-builtins.c (aarch64_simd_builtin_std_type): Add BFmode. (aarch64_init_simd_builtin_types): Add element types for vector types. (aarch64_init_bf16_types): New function. (aarch64_general_init_builtins): Add arm_init_bf16_types function call. * config/aarch64/aarch64-modes.def: Add BFmode and vector modes. * config/aarch64/aarch64-simd-builtin-types.def: * config/aarch64/aarch64-simd.md: Add BF types to NEON move patterns. * config/aarch64/aarch64.c (aarch64_classify_vector_mode): Add BF modes. (aarch64_gimplify_va_arg_expr): Add BFmode. * config/aarch64/aarch64.h (AARCH64_VALID_SIMD_DREG_MODE): Add V4BF. (AARCH64_VALID_SIMD_QREG_MODE): Add V8BF. * config/aarch64/aarch64.md: New enabled_for_bfmode_scalar, enabled_for_bfmode_vector attributes. Add BFmode to movhf pattern. * config/aarch64/arm_bf16.h: New file. * config/aarch64/arm_neon.h: Add arm_bf16.h and Bfloat vector types. * config/aarch64/iterators.md (HFBF, GPF_TF_F16_MOV, VDMOV, VQMOV, VALL_F16MOV): New. 
gcc/testsuite/ChangeLog: 2019-12-16 Stam Markianos-Wright * gcc.target/aarch64/bfloat16_compile.c: New test. diff --git a/gcc/config.gcc b/gcc/config.gcc index 9802f436e06..b49c110ccaf 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -315,7 +315,7 @@ m32c*-*-*) ;; aarch64*-*-*) cpu_type=aarch64 - extra_headers="arm_fp16.h arm_neon.h arm_acle.h arm_sve.h" + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h" c_target_objs="aarch64-c.o" cxx_target_objs="aarch64-c.o" d_target_objs="aarch64-d.o" diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index c35a1b1f029..3ba2f12166f 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -68,6 +68,9 @@ #define hi_UP E_HImode #define hf_UP E_HFmode #define qi_UP E_QImode +#define bf_UP E_BFmode +#define v4bf_UP E_V4BFmode +#define v8bf_UP E_V8BFmode #define UP(X) X##_UP #define SIMD_MAX_BUILTIN_ARGS 5 @@ -568,6 +571,10 @@ static tree aarch64_simd_intXI_type_node = NULL_TREE; tree aarch64_fp16_type_node = NULL_TREE; tree aarch64_fp16_ptr_type_node = NULL_TREE; +/* Back-end node type for brain float (bfloat) types. */ +tree aarch64_bf16_type_node = NULL_TREE; +tree aarch64_bf16_ptr_type_node = NULL_TREE; + /* Wrapper around add_builtin_function. NAME is the name of the built-in function, TYPE is the function type, and CODE is the function subcode (relative to AARCH64_BUILTIN_GENERAL). */ @@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode, return float_type_node; case E_DFmode: return double_type_node; +case E_BFmode: + return aarch64_bf16_type_node; default: gcc_unreachable (); } @@ -750,6 +759,11 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Float64x1_t].eltype = double_type_node; aarch64_simd_types[Float64x2_t].eltype = double_type_node; + +/* Init Bfloat vector types with underlying uint types.
*/ + aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node; + aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node; + for (i = 0; i < nelts; i++) { tree eltype = aarch64_simd_types[i].eltype; @@ -1059,6 +1073,19 @@ aarch64_init_fp16_types (void) aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node); } +/* Initialize the backend REAL_TYPE type supporting bfloat types. */ +static void +aarch64_init_bf16_types (void) +{ + aarch64_bf16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_bf16_type_node) = 16; + SET_TYPE_MODE (aarch64_bf16_type_node, BFmode); + layout_type (aarch64_bf16_type_node); + + (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, "__bf16"); + aarch64_
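The new `bf_UP`, `v4bf_UP` and `v8bf_UP` defines are consumed through token pasting. `UP (v4bf)` first becomes `v4bf_UP`, which the preprocessor then rescans and replaces with `E_V4BFmode`. The self-contained sketch below reproduces the mechanism. The enum is a hypothetical stand-in for GCC's real `machine_mode` values.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's machine_mode enumerators.  */
enum mode_sketch { E_BFmode, E_V4BFmode, E_V8BFmode };

#define bf_UP   E_BFmode
#define v4bf_UP E_V4BFmode
#define v8bf_UP E_V8BFmode

/* Pasting builds the name X_UP, which is then rescanned and replaced
   by the mode enumerator it is defined as.  */
#define UP(X) X##_UP
```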
[Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: > Hi all, > > This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product > operations (vector/by element) to the ARM back-end. > > These are: > usdot (vector), dot (by element). > > The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and > for ARM they remain optional as of ARMv8.6-a. > > The functions are declared in arm_neon.h, RTL patterns are defined to > generate assembler and tests are added to verify and perform adequate > checks. > > Regression testing on arm-none-eabi passed successfully. > > This patch depends on: > > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html > > for ARM CLI updates, and on: > > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html > > for testsuite effective_target update. > > Ok for trunk? Ping :) > > Cheers, > Stam > > > ACLE documents are at https://developer.arm.com/docs/101028/latest > ISA documents are at https://developer.arm.com/docs/ddi0596/latest > > PS. I don't have commit rights, so if someone could commit on my behalf, > that would be great :) > > > gcc/ChangeLog: > > 2019-11-28 Stam Markianos-Wright > > * config/arm/arm-builtins.c (enum arm_type_qualifiers): > (USTERNOP_QUALIFIERS): New define. > (USMAC_LANE_QUADTUP_QUALIFIERS): New define. > (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. > (arm_expand_builtin_args): > Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. > (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. > * config/arm/arm_neon.h (vusdot_s32): New. > (vusdot_lane_s32): New. > (vusdotq_lane_s32): New. > (vsudot_lane_s32): New. > (vsudotq_lane_s32): New. > * config/arm/arm_neon_builtins.def > (usdot,usdot_lane,sudot_lane): New. > * config/arm/iterators.md (DOTPROD_I8MM): New. > (sup, opsuffix): Add . > * config/arm/neon.md (neon_usdot, dot_lane: New. > * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
> > > gcc/testsuite/ChangeLog: > > 2019-12-12 Stam Markianos-Wright > > * gcc.target/arm/simd/vdot-compile-2-1.c: New test. > * gcc.target/arm/simd/vdot-compile-2-2.c: New test. > * gcc.target/arm/simd/vdot-compile-2-3.c: New test. > * gcc.target/arm/simd/vdot-compile-2-4.c: New test. > >
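For illustration, here is what the vector form (vusdot_s32) computes. The sketch is a scalar C model that assumes the `T (T, unsigned T, T)` operand order of USTERNOP_QUALIFIERS in the patch: accumulator, then unsigned bytes, then signed bytes. `usdot_model` is a hypothetical name and is not part of arm_neon.h or GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative scalar model (not GCC or arm_neon.h code) of the vector
   form of USDOT: every 32-bit accumulator lane i adds the sum of four
   products a[4*i+k] * b[4*i+k], where a holds unsigned bytes and b holds
   signed bytes.  n_bytes is 8 for a 64-bit vector, 16 for a 128-bit one.  */
static void
usdot_model (int32_t *acc, const uint8_t *a, const int8_t *b, size_t n_bytes)
{
  assert (n_bytes % 4 == 0);  /* Four bytes feed each 32-bit lane.  */
  for (size_t i = 0; i < n_bytes / 4; i++)
    {
      int32_t sum = 0;
      for (size_t k = 0; k < 4; k++)
        sum += (int32_t) a[4 * i + k] * (int32_t) b[4 * i + k];
      acc[i] += sum;
    }
}
```

The mixed signedness is the whole point: a byte of 255 in the unsigned operand contributes 255, not -1, while the same bit pattern in the signed operand would contribute -1.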
Re: [GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp
On 12/13/19 11:15 AM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> Hi all, >> >> This small patch adds support for the ARM v8.6 extensions +bf16 and >> +i8mm to the testsuite. This will be tested through other upcoming >> patches, which is why we are not providing any explicit tests here. >> >> Ok for trunk? >> >> Also I don't have commit rights, so if someone could commit on my >> behalf, that would be great :) >> >> The functionality here depends on CLI patches: >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >> >> but this patch applies cleanly without them, too. >> >> Cheers, >> Stam >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-12-11 Stam Markianos-Wright >> >> * lib/target-supports.exp >> (check_effective_target_arm_v8_2a_i8mm_ok_nocache): New. >> (check_effective_target_arm_v8_2a_i8mm_ok): New. >> (add_options_for_arm_v8_2a_i8mm): New. >> (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New. >> (check_effective_target_arm_v8_2a_bf16_neon_ok): New. >> (add_options_for_arm_v8_2a_bf16_neon): New. > > The new effective-target keywords need to be documented in > doc/sourcebuild.texi. Added in new diff :) > > LGTM otherwise. For: > >> diff --git a/gcc/testsuite/lib/target-supports.exp >> b/gcc/testsuite/lib/target-supports.exp >> index 5b4cc02f921..36fb63e9929 100644 >> --- a/gcc/testsuite/lib/target-supports.exp >> +++ b/gcc/testsuite/lib/target-supports.exp >> @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags } >> { >> return "$flags $et_arm_v8_2a_dotprod_neon_flags" >> } >> >> +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product >> +# instructions, 0 otherwise. The test is valid for ARM and for AArch64. >> +# Record the command line options needed. 
>> + >> +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } { >> +global et_arm_v8_2a_i8mm_flags >> +set et_arm_v8_2a_i8mm_flags "" >> + >> +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { >> +return 0; >> +} >> + >> +# Iterate through sets of options to find the compiler flags that >> +# need to be added to the -march option. >> +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" >> "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } { >> +if { [check_no_compiler_messages_nocache \ >> + arm_v8_2a_i8mm_ok object { >> +#include >> +#if !defined (__ARM_FEATURE_MATMUL_INT8) >> +#error "__ARM_FEATURE_MATMUL_INT8 not defined" >> +#endif >> +} "$flags -march=armv8.2-a+i8mm"] } { >> +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm" >> +return 1 >> +} >> +} > > I wondered whether it would be better to add no options if testing > with something that already supports i8mm (e.g. -march=armv8.6). > That would mean trying: > >"" "-march=armv8.2-a+i8mm" "-march=armv8.2-a+i8mm -mfloat-abi..." ... > > instead. But there are arguments both ways, and the above follows > existing style, so OK. Not quite sure if I'm understanding this right, but I think that's what the "" option in foreach flags{} is for? i.e. currently what I'm seeing is: +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ will pull through the first option that compiles to an object file with no errors (check_no_compiler_messages_nocache arm_v8_2a_i8mm_ok object). So in a lot of cases it should just be fine for "" and only pull in -march=armv8.2-a+i8mm. I think that's right? Lmk if I'm not reading it properly!
Cheers, Stam > > Thanks, > Richard > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 85573a49a2b..73408d12cbe 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1877,6 +1877,18 @@ ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS} half-precision floating-point instructions available from ARMv8.2-A and onwards. Some multilibs may be incompatible with these options. +@item arm_v8_2a_bf16_neon_ok +@anchor{arm_v8_2a_bf16_neon_ok} +ARM target supports options to generate instructions from ARMv8.2-A with +the BFloat16 extension (bf16). Some multilibs may be incompatible with these +options. + +@ite
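If I read the Tcl above correctly, the "" entry does not mean "add nothing": every candidate is expanded as "$flags -march=armv8.2-a+i8mm", so even the first one appends an explicit -march. The C model below compares the patch's ordering with the first two entries of the ordering Richard suggests. `compiler_accepts` is a hypothetical stand-in for check_no_compiler_messages_nocache, and it assumes the test source compiles either when the default configuration already has i8mm or when the options contain "+i8mm". It ignores the float-ABI/FPU details that matter on real AArch32 multilibs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical knob: does the compiler's default configuration already
   enable i8mm (e.g. because it was configured for -march=armv8.6-a)?  */
static bool default_has_i8mm = true;

/* Hypothetical stand-in for check_no_compiler_messages_nocache.  */
static bool
compiler_accepts (const char *opts)
{
  return default_has_i8mm || strstr (opts, "+i8mm") != NULL;
}

/* Return the first candidate that compiles, like the foreach loop.  */
static const char *
first_accepted (const char *const *candidates, size_t n)
{
  assert (candidates != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (compiler_accepts (candidates[i]))
      return candidates[i];
  return NULL;
}

/* True if the chosen option string adds an explicit -march.  */
static bool
adds_march (const char *opts)
{
  return opts != NULL && strstr (opts, "-march=") != NULL;
}

/* Candidates after "$flags -march=armv8.2-a+i8mm" substitution.  */
static const char *const patch_style[] = {
  " -march=armv8.2-a+i8mm",
  "-mfloat-abi=hard -mfpu=neon-fp-armv8 -march=armv8.2-a+i8mm",
  "-mfloat-abi=softfp -mfpu=neon-fp-armv8 -march=armv8.2-a+i8mm",
};

/* First two entries of the suggested ordering: try no options at all.  */
static const char *const suggested[] = {
  "",
  "-march=armv8.2-a+i8mm",
};
```

With a compiler that already supports i8mm, the patch's ordering still adds -march=armv8.2-a+i8mm (overriding, say, -march=armv8.6-a), while the suggested ordering adds nothing; with a compiler that lacks i8mm, both end up adding the -march option.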
[GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product operations (vector/by element) to the ARM back-end. These are: usdot (vector), dot (by element). The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and for ARM they remain optional as of ARMv8.6-a. The functions are declared in arm_neon.h, RTL patterns are defined to generate assembler and tests are added to verify and perform adequate checks. Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html for ARM CLI updates, and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for testsuite effective_target update. Ok for trunk? Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-11-28 Stam Markianos-Wright * config/arm/arm-builtins.c (enum arm_type_qualifiers): (USTERNOP_QUALIFIERS): New define. (USMAC_LANE_QUADTUP_QUALIFIERS): New define. (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. (arm_expand_builtin_args): Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. * config/arm/arm_neon.h (vusdot_s32): New. (vusdot_lane_s32): New. (vusdotq_lane_s32): New. (vsudot_lane_s32): New. (vsudotq_lane_s32): New. * config/arm/arm_neon_builtins.def (usdot,usdot_lane,sudot_lane): New. * config/arm/iterators.md (DOTPROD_I8MM): New. (sup, opsuffix): Add . * config/arm/neon.md (neon_usdot, dot_lane: New. * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. gcc/testsuite/ChangeLog: 2019-12-12 Stam Markianos-Wright * gcc.target/arm/simd/vdot-compile-2-1.c: New test. * gcc.target/arm/simd/vdot-compile-2-2.c: New test. * gcc.target/arm/simd/vdot-compile-2-3.c: New test. * gcc.target/arm/simd/vdot-compile-2-4.c: New test. 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..a63c1a978fb1d436065ce9f5f082249c4ebf5ade 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). */ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigned T, lane index).
*/ +static enum arm_type_qualifiers +arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers) + /* T (T, T, immediate). */ static enum arm_type_qualifiers arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -2148,6 +2172,7 @@ typedef enum { ARG_BUILTIN_LANE_INDEX, ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX, ARG_BUILTIN_LANE_PAIR_INDEX, + ARG_BUILTIN_LANE_QUADTUP_INDEX, ARG_BUILTIN_NEON_MEMORY, ARG_BUILTIN_MEMORY, ARG_BUILTIN_STOP @@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode, if (CONST_INT_P (op[argc])) { machine_mode vmode = mode[argc - 1]; - neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp); + neon_lane_bounds (op[argc], 0, +GET_MODE_NUNITS (vmode) / 2, exp); + } + /* If the lane index
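The quadtuplet qualifier mirrors the existing pair qualifier, except that one index names a group of four 8-bit elements instead of two. The sketch below assumes the bound is GET_MODE_NUNITS (vmode) / 4, by analogy with the / 2 used for pairs (the hunk that adds the quadtuplet case is cut off above). It also models the by-element form, where group `lane` of the signed operand is reused for every accumulator lane. Both helpers are hypothetical and not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A quadtuplet lane index selects four consecutive 8-bit elements, so it
   is valid iff 0 <= lane < nunits / 4 (assumed bound, see above).  */
static bool
quadtup_lane_ok (long lane, size_t nunits)
{
  return lane >= 0 && (size_t) lane < nunits / 4;
}

/* Scalar model of USDOT (by element): the four signed bytes of group
   `lane` in b are multiplied against every 4-byte group of a.  */
static void
usdot_lane_model (int32_t *acc, const uint8_t *a, size_t a_bytes,
                  const int8_t *b, size_t b_bytes, long lane)
{
  assert (a_bytes % 4 == 0);
  assert (quadtup_lane_ok (lane, b_bytes));
  const int8_t *grp = b + 4 * lane;
  for (size_t i = 0; i < a_bytes / 4; i++)
    for (size_t k = 0; k < 4; k++)
      acc[i] += (int32_t) a[4 * i + k] * (int32_t) grp[k];
}
```

For an 8-byte indexed vector this allows lanes 0 and 1 only; a 16-byte vector would allow 0 to 3.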
[GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product operations (vector/by element) to AArch64. These are: usdot (vector), dot (by element). The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and are then enabled by default from ARMv8.6-a. The functions are declared in arm_neon.h, RTL patterns are defined to generate assembler and tests are added to verify them and perform adequate checks. Regression testing on aarch64-none-elf passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html for AArch64 CLI updates, and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for the testsuite effective_target update. Ok for trunk? Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-11-28 Stam Markianos-Wright * config/aarch64/aarch64-builtins.c: (enum aarch64_type_qualifiers) New qualifier_lane_quadtup_index, TYPES_TERNOP_SSUS, TYPES_QUADOPSSUS_LANE_QUADTUP, TYPES_QUADOPSSSU_LANE_QUADTUP. (aarch64_simd_expand_args): Add case SIMD_ARG_LANE_QUADTUP_INDEX. (aarch64_simd_expand_builtin): Add qualifier_lane_quadtup_index. * config/aarch64/aarch64-simd-builtins.def (usdot, usdot_lane, usdot_laneq, sudot_lane,sudot_laneq): New. * config/aarch64/aarch64-simd.md (aarch64_usdot): New . (aarch64_dot_lane): New. (aarch64_dot_laneq): New. * config/aarch64/arm_neon.h (vusdot_s32): New. (vusdotq_s32): New. (vusdot_lane_s32): New. (vsudot_lane_s32): New. * config/aarch64/iterators.md (DOTPROD_I8MM): New iterator. (UNSPEC_USDOT, UNSPEC_SUDOT): New unspecs. gcc/testsuite/ChangeLog: 2019-11-28 Stam Markianos-Wright * gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-1.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-3.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-4.c: New test. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index c35a1b1f0299ce5af8ca1a3df0209614f7bd0f25..6bd26889f2f26a9f82dd6d40f50125eaeee41740 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -107,6 +107,9 @@ enum aarch64_type_qualifiers /* Lane indices selected in pairs. - must be in range, and flipped for bigendian. */ qualifier_lane_pair_index = 0x800, + /* Lane indices selected in quadtuplets. - must be in range, and flipped for + bigendian. */ + qualifier_lane_quadtup_index = 0x1000, }; typedef struct @@ -173,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, qualifier_immediate }; #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none }; +#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers) static enum aarch64_type_qualifiers @@ -191,6 +198,19 @@ aarch64_types_quadopu_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define TYPES_QUADOPU_LANE (aarch64_types_quadopu_lane_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_quadopssus_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define TYPES_QUADOPSSUS_LANE_QUADTUP \ + (aarch64_types_quadopssus_lane_quadtup_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_quadopsssu_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define 
TYPES_QUADOPSSSU_LANE_QUADTUP \ + (aarch64_types_quadopsssu_lane_quadtup_qualifiers) + static enum aarch64_type_qualifiers aarch64_types_quadopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, @@ -1260,6 +1280,7 @@ typedef enum SIMD_ARG_LANE_INDEX, SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX, SIMD_ARG_LANE_PAIR_INDEX, + SIMD_ARG_LANE_QUADTUP_INDEX, SIMD_ARG_STOP } builtin_simd_arg; @@ -1349,9 +1370,25 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval, op[opc] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); } - /* Fall through - if the lane index isn't a constant then - the next case will error. */ - /* FALLTHRU */ + /* If the lane index isn't a constant
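The AArch64 comment says quadtuplet indices must be "flipped for bigendian". The sketch below assumes ENDIAN_LANE_N (N, I) behaves like `BYTES_BIG_ENDIAN ? N - 1 - I : I`, and that the quadtuplet case passes nunits / 4 where the pair case in the hunk above passes nunits / 2 (that part of the diff is cut off). `endian_quadtup_lane` is a hypothetical helper with the endianness passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Map an architectural quadtuplet lane to the index GCC would use
   internally, assuming ENDIAN_LANE_N-style reversal on big-endian.  */
static int
endian_quadtup_lane (int nunits, int lane, bool big_endian)
{
  int ngroups = nunits / 4;           /* 4 x 8-bit elements per group.  */
  assert (lane >= 0 && lane < ngroups);
  return big_endian ? ngroups - 1 - lane : lane;
}
```

On little-endian the index passes through unchanged; on big-endian, lane 0 of a 16-byte vector becomes group 3.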
[GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp
Hi all, This small patch adds support for the ARM v8.6 extensions +bf16 and +i8mm to the testsuite. This will be tested through other upcoming patches, which is why we are not providing any explicit tests here. Ok for trunk? Also I don't have commit rights, so if someone could commit on my behalf, that would be great :) The functionality here depends on CLI patches: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html but this patch applies cleanly without them, too. Cheers, Stam gcc/testsuite/ChangeLog: 2019-12-11 Stam Markianos-Wright * lib/target-supports.exp (check_effective_target_arm_v8_2a_i8mm_ok_nocache): New. (check_effective_target_arm_v8_2a_i8mm_ok): New. (add_options_for_arm_v8_2a_i8mm): New. (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New. (check_effective_target_arm_v8_2a_bf16_neon_ok): New. (add_options_for_arm_v8_2a_bf16_neon): New. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 5b4cc02f921..36fb63e9929 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags } { return "$flags $et_arm_v8_2a_dotprod_neon_flags" } +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product +# instructions, 0 otherwise. The test is valid for ARM and for AArch64. +# Record the command line options needed. + +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } { +global et_arm_v8_2a_i8mm_flags +set et_arm_v8_2a_i8mm_flags "" + +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { +return 0; +} + +# Iterate through sets of options to find the compiler flags that +# need to be added to the -march option. 
+foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } { +if { [check_no_compiler_messages_nocache \ + arm_v8_2a_i8mm_ok object { +#include +#if !defined (__ARM_FEATURE_MATMUL_INT8) +#error "__ARM_FEATURE_MATMUL_INT8 not defined" +#endif +} "$flags -march=armv8.2-a+i8mm"] } { +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm" +return 1 +} +} + +return 0; +} + +proc check_effective_target_arm_v8_2a_i8mm_ok { } { +return [check_cached_effective_target arm_v8_2a_i8mm_ok \ +check_effective_target_arm_v8_2a_i8mm_ok_nocache] +} + +proc add_options_for_arm_v8_2a_i8mm { flags } { +if { ! [check_effective_target_arm_v8_2a_i8mm_ok] } { +return "$flags" +} +global et_arm_v8_2a_i8mm_flags +return "$flags $et_arm_v8_2a_i8mm_flags" +} + # Return 1 if the target supports FP16 VFMAL and VFMSL # instructions, 0 otherwise. # Record the command line options needed. @@ -4826,6 +4869,45 @@ proc add_options_for_arm_fp16fml_neon { flags } { return "$flags $et_arm_fp16fml_neon_flags" } +# Return 1 if the target supports BFloat16 SIMD instructions, 0 otherwise. +# The test is valid for ARM and for AArch64. 
+ +proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } { +global et_arm_v8_2a_bf16_neon_flags +set et_arm_v8_2a_bf16_neon_flags "" + +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { +return 0; +} + +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } { +if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok object { +#include +#if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC) +#error "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC not defined" +#endif +} "$flags -march=armv8.2-a+bf16"] } { +set et_arm_v8_2a_bf16_neon_flags "$flags -march=armv8.2-a+bf16" +return 1 +} +} + +return 0; +} + +proc check_effective_target_arm_v8_2a_bf16_neon_ok { } { +return [check_cached_effective_target arm_v8_2a_bf16_neon_ok \ +check_effective_target_arm_v8_2a_bf16_neon_ok_nocache] +} + +proc add_options_for_arm_v8_2a_bf16_neon { flags } { +if { ! [check_effective_target_arm_v8_2a_bf16_neon_ok] } { +return "$flags" +} +global et_arm_v8_2a_bf16_neon_flags +return "$flags $et_arm_v8_2a_bf16_neon_flags" +} + # Return 1 if the target supports executing ARMv8 NEON instructions, 0 # otherwise.
Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
On 12/11/19 3:48 AM, Jeff Law wrote: > On Mon, 2019-12-09 at 13:40 +0000, Stam Markianos-Wright wrote: >> >> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote: >>> >>> On 12/2/19 9:27 PM, Joseph Myers wrote: >>>> On Mon, 2 Dec 2019, Jeff Law wrote: >>>> >>>>>> 2019-11-13 Stam Markianos-Wright < >>>>>> stam.markianos-wri...@arm.com> >>>>>> >>>>>> * real.c (struct arm_bfloat_half_format, >>>>>> encode_arm_bfloat_half, decode_arm_bfloat_half): New. >>>>>> * real.h (arm_bfloat_half_format): New. >>>>>> >>>>>> >>>>> Generally OK. Please consider using "arm_bfloat_half" instead >>>>> of >>>>> "bfloat_half" for the name field in the arm_bfloat_half_format >>>>> structure. I'm not sure if that's really visible externally, >>>>> but it >>> Hi both! Agreed that we want to be conservative. See latest diff >>> attached with the name field change (also pasted below). >> >> .Ping :) > Sorry if I wasn't clear. With the name change I considered this OK for > the trunk. Please install on the trunk. > > If you don't have commit privs let me know. Ahh ok gotcha! Sorry I'm new here, and yes, I don't have commit privileges, yet! Cheers, Stam > > > Jeff >
[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 12/2/19 4:43 PM, Stam Markianos-Wright wrote: > > > On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: >> Pinging with more correct maintainers this time :) >> >> Also would need to backport to gcc7,8,9, but need to get this approved >> first! >> >> Thank you, >> Stam >> >> >> Forwarded Message >> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional >> branches in Thumb2 (PR91816) >> Date: Mon, 21 Oct 2019 10:37:09 +0100 >> From: Stam Markianos-Wright >> To: Ramana Radhakrishnan >> CC: gcc-patches@gcc.gnu.org , nd >> , James Greenhalgh , Richard >> Earnshaw >> >> >> >> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >>>> >>>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >>>> however, on my native AArch32 setup the test times out when run as part >>>> of a big "make check-gcc" regression, but not when run individually. >>>> >>>> 2019-10-11 Stamatis Markianos-Wright >>>> >>>> * config/arm/arm.md: Update b for Thumb2 range checks. >>>> * config/arm/arm.c: New function arm_gen_far_branch. >>>> * config/arm/arm-protos.h: New function arm_gen_far_branch >>>> prototype. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-10-11 Stamatis Markianos-Wright >>>> >>>> * testsuite/gcc.target/arm/pr91816.c: New test. >>> >>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >>>> index f995974f9bb..1dce333d1c3 100644 >>>> --- a/gcc/config/arm/arm-protos.h >>>> +++ b/gcc/config/arm/arm-protos.h >>>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >>>> cpu_arch_option *, >>>> void arm_initialize_isa (sbitmap, const enum isa_feature *); >>>> +const char * arm_gen_far_branch (rtx *, int,const char * , const >>>> char *); >>>> + >>>> + >>> >>> Let's get the nits out of the way. >>> >>> Unnecessary extra new line, need a space between int and const above. >>> >>> >> >> .Fixed! >> >>>> #endif /* !
GCC_ARM_PROTOS_H */ >>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>>> index 39e1a1ef9a2..1a693d2ddca 100644 >>>> --- a/gcc/config/arm/arm.c >>>> +++ b/gcc/config/arm/arm.c >>>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >>>> } >>>> } /* Namespace selftest. */ >>>> + >>>> +/* Generate code to enable conditional branches in functions over 1 >>>> MiB. */ >>>> +const char * >>>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >>>> + const char * branch_format) >>> >>> Not sure if this is some munging from the attachment but check >>> vertical alignment of parameters. >>> >> >> .Fixed! >> >>>> +{ >>>> + rtx_code_label * tmp_label = gen_label_rtx (); >>>> + char label_buf[256]; >>>> + char buffer[128]; >>>> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >>>> + CODE_LABEL_NUMBER (tmp_label)); >>>> + const char *label_ptr = arm_strip_name_encoding (label_buf); >>>> + rtx dest_label = operands[pos_label]; >>>> + operands[pos_label] = tmp_label; >>>> + >>>> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , >>>> label_ptr); >>>> + output_asm_insn (buffer, operands); >>>> + >>>> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >>>> label_ptr); >>>> + operands[pos_label] = dest_label; >>>> + output_asm_insn (buffer, operands); >>>> + return ""; >>>> +} >>>> + >>>> + >>> >>> Unnecessary extra newline. >>> >> >> .Fixed! >> >>>> #undef TARGET_RUN_TARGET_SELFTESTS >>>> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >>>> #endif /* CHECKING_P */ >>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >>>> index f861c72ccfc..634fd0a59da 100644 >>>> --- a/gcc/config/arm/arm.md >>>> +++ b/gcc/config/arm/arm.md >>>> @@ -6686,9 +
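As the quoted code reads, arm_gen_far_branch prints `branch_format` followed by a fresh internal label, then an unconditional `b` to the real destination, then the label itself. Assuming the caller passes a branch on the inverted condition (the arm.md hunk that calls it is cut off here), the emitted text has the shape modelled below. `invert_cond`, `far_branch_text` and the `.Lskip1` label are hypothetical, and only a handful of condition codes are covered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: inverse of a few ARM condition codes.  */
static const char *
invert_cond (const char *cond)
{
  static const char *const pairs[][2] = {
    { "eq", "ne" }, { "ne", "eq" }, { "lt", "ge" }, { "ge", "lt" },
    { "gt", "le" }, { "le", "gt" }, { "cs", "cc" }, { "cc", "cs" },
  };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    if (strcmp (pairs[i][0], cond) == 0)
      return pairs[i][1];
  return NULL;
}

/* Build the far-branch text: when the original condition is false, skip
   over an unconditional B, which has a much longer reach than a
   conditional one.  Returns snprintf's result, or -1 for an unknown
   condition.  */
static int
far_branch_text (char *buf, size_t size, const char *cond,
                 const char *dest, const char *skip_label)
{
  assert (buf != NULL && size > 0);
  const char *inv = invert_cond (cond);
  if (inv == NULL)
    return -1;
  return snprintf (buf, size, "b%s\t%s\n\tb\t%s\n%s:", inv, skip_label,
                   dest, skip_label);
}
```

For a `beq target` that is out of range, this yields `bne .Lskip1`, then `b target`, then `.Lskip1:`.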
Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
On 12/3/19 10:31 AM, Stam Markianos-Wright wrote: > > > On 12/2/19 9:27 PM, Joseph Myers wrote: >> On Mon, 2 Dec 2019, Jeff Law wrote: >> >>>> 2019-11-13 Stam Markianos-Wright >>>> >>>> * real.c (struct arm_bfloat_half_format, >>>> encode_arm_bfloat_half, decode_arm_bfloat_half): New. >>>> * real.h (arm_bfloat_half_format): New. >>>> >>>> >>> Generally OK. Please consider using "arm_bfloat_half" instead of >>> "bfloat_half" for the name field in the arm_bfloat_half_format >>> structure. I'm not sure if that's really visible externally, but it >> > Hi both! Agreed that we want to be conservative. See latest diff > attached with the name field change (also pasted below). .Ping :) > >> Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit >> with Arm and Intel using different rounding modes)? > > Yes it is remarkably similar, but there's really only so much variation > you can have with what is half an f32! > > Cheers, > Stam > > >> > > > diff --git a/gcc/real.h b/gcc/real.h > index 0f660c9c671..2b337bb7f7d 100644 > --- a/gcc/real.h > +++ b/gcc/real.h > @@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format; > extern const struct real_format decimal_quad_format; > extern const struct real_format ieee_half_format; > extern const struct real_format arm_half_format; > +extern const struct real_format arm_bfloat_half_format; > > > /* > == */ > diff --git a/gcc/real.c b/gcc/real.c > index 134240a6be9..07b63b6f27e 100644 > --- a/gcc/real.c > +++ b/gcc/real.c > @@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, > REAL_VALUE_TYPE *r, > } > } > > +/* Encode arm_bfloat types. 
*/ > +static void > +encode_arm_bfloat_half (const struct real_format *fmt, long *buf, > + const REAL_VALUE_TYPE *r) > +{ > + unsigned long image, sig, exp; > + unsigned long sign = r->sign; > + bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0; > + > + image = sign << 15; > + sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f; > + > + switch (r->cl) > + { > + case rvc_zero: > + break; > + > + case rvc_inf: > + if (fmt->has_inf) > + image |= 255 << 7; > + else > + image |= 0x7fff; > + break; > + > + case rvc_nan: > + if (fmt->has_nans) > + { > + if (r->canonical) > + sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0); > + if (r->signalling == fmt->qnan_msb_set) > + sig &= ~(1 << 6); > + else > + sig |= 1 << 6; > + if (sig == 0) > + sig = 1 << 5; > + > + image |= 255 << 7; > + image |= sig; > + } > + else > + image |= 0x7fff; > + break; > + > + case rvc_normal: > + if (denormal) > + exp = 0; > + else > + exp = REAL_EXP (r) + 127 - 1; > + image |= exp << 7; > + image |= sig; > + break; > + > + default: > + gcc_unreachable (); > + } > + > + buf[0] = image; > +} > + > +/* Decode arm_bfloat types. 
*/ > +static void > +decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r, > + const long *buf) > +{ > + unsigned long image = buf[0] & 0x; > + bool sign = (image >> 15) & 1; > + int exp = (image >> 7) & 0xff; > + > + memset (r, 0, sizeof (*r)); > + image <<= HOST_BITS_PER_LONG - 8; > + image &= ~SIG_MSB; > + > + if (exp == 0) > + { > + if (image && fmt->has_denorm) > + { > + r->cl = rvc_normal; > + r->sign = sign; > + SET_REAL_EXP (r, -126); > + r->sig[SIGSZ-1] = image << 1; > + normalize (r); > + } > + else if (fmt->has_signed_zero) > + r->sign = sign; > + } > + else if (exp == 255 && (fmt->has_nans || fmt->has_inf)) > + { > + if (image) > + { > + r->cl = rvc_nan; > + r->sign = sign; > + r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1) > + ^ fmt->qnan_msb_set); > + r->sig[SIGSZ-1] = image; > + } > + else > + { > + r->cl = rvc_inf; > + r->sign = sign; > + } > + } > + else > + { > + r->cl = rvc_normal; &g
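The "+ 127 - 1" in encode_arm_bfloat_half and the "exp - 127 + 1" in decode_arm_bfloat_half follow from how real.c stores a normal value, as I read it: a significand in [1/2, 1) times a power of two, rather than the IEEE 1.m form. For a normal bfloat16 with sign s, 8-bit exponent field E and 7-bit mantissa field m:

$$
\begin{aligned}
v &= (-1)^s \, 2^{E-127} \left(1 + \tfrac{m}{2^7}\right), && 1 \le E \le 254 \\
  &= (-1)^s \, 2^{E-126} \, \sigma, && \sigma = \tfrac{1}{2}\left(1 + \tfrac{m}{2^7}\right) \in \left[\tfrac{1}{2}, 1\right) \\
e_{\text{real.c}} &= E - 126 = E - 127 + 1, && E = e_{\text{real.c}} + 127 - 1
\end{aligned}
$$

So E from 1 to 254 maps to a real.c exponent from -125 to 128, which matches the emin/emax values (-125, 128) given to arm_bfloat_half_format later in the thread. For denormals (E = 0) the value is $(-1)^s \, 2^{-126} \, \tfrac{m}{2^7}$, which is why the decoder sets the exponent to -126, uses the mantissa bits as the significand, and then calls normalize.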
Re: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
On 12/2/19 9:27 PM, Joseph Myers wrote: > On Mon, 2 Dec 2019, Jeff Law wrote: > >>> 2019-11-13 Stam Markianos-Wright >>> >>> * real.c (struct arm_bfloat_half_format, >>> encode_arm_bfloat_half, decode_arm_bfloat_half): New. >>> * real.h (arm_bfloat_half_format): New. >>> >>> >> Generally OK. Please consider using "arm_bfloat_half" instead of >> "bfloat_half" for the name field in the arm_bfloat_half_format >> structure. I'm not sure if that's really visible externally, but it > Hi both! Agreed that we want to be conservative. See latest diff attached with the name field change (also pasted below). > Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit > with Arm and Intel using different rounding modes)? Yes it is remarkably similar, but there's really only so much variation you can have with what is half an f32! Cheers, Stam > diff --git a/gcc/real.h b/gcc/real.h index 0f660c9c671..2b337bb7f7d 100644 --- a/gcc/real.h +++ b/gcc/real.h @@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format; extern const struct real_format decimal_quad_format; extern const struct real_format ieee_half_format; extern const struct real_format arm_half_format; +extern const struct real_format arm_bfloat_half_format; /* == */ diff --git a/gcc/real.c b/gcc/real.c index 134240a6be9..07b63b6f27e 100644 --- a/gcc/real.c +++ b/gcc/real.c @@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, REAL_VALUE_TYPE *r, } } +/* Encode arm_bfloat types. 
*/ +static void +encode_arm_bfloat_half (const struct real_format *fmt, long *buf, + const REAL_VALUE_TYPE *r) +{ + unsigned long image, sig, exp; + unsigned long sign = r->sign; + bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0; + + image = sign << 15; + sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f; + + switch (r->cl) +{ +case rvc_zero: + break; + +case rvc_inf: + if (fmt->has_inf) + image |= 255 << 7; + else + image |= 0x7fff; + break; + +case rvc_nan: + if (fmt->has_nans) + { + if (r->canonical) + sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0); + if (r->signalling == fmt->qnan_msb_set) + sig &= ~(1 << 6); + else + sig |= 1 << 6; + if (sig == 0) + sig = 1 << 5; + + image |= 255 << 7; + image |= sig; + } + else + image |= 0x7fff; + break; + +case rvc_normal: + if (denormal) + exp = 0; + else + exp = REAL_EXP (r) + 127 - 1; + image |= exp << 7; + image |= sig; + break; + +default: + gcc_unreachable (); +} + + buf[0] = image; +} + +/* Decode arm_bfloat types. */ +static void +decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r, + const long *buf) +{ + unsigned long image = buf[0] & 0x; + bool sign = (image >> 15) & 1; + int exp = (image >> 7) & 0xff; + + memset (r, 0, sizeof (*r)); + image <<= HOST_BITS_PER_LONG - 8; + image &= ~SIG_MSB; + + if (exp == 0) +{ + if (image && fmt->has_denorm) + { + r->cl = rvc_normal; + r->sign = sign; + SET_REAL_EXP (r, -126); + r->sig[SIGSZ-1] = image << 1; + normalize (r); + } + else if (fmt->has_signed_zero) + r->sign = sign; +} + else if (exp == 255 && (fmt->has_nans || fmt->has_inf)) +{ + if (image) + { + r->cl = rvc_nan; + r->sign = sign; + r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1) + ^ fmt->qnan_msb_set); + r->sig[SIGSZ-1] = image; + } + else + { + r->cl = rvc_inf; + r->sign = sign; + } +} + else +{ + r->cl = rvc_normal; + r->sign = sign; + SET_REAL_EXP (r, exp - 127 + 1); + r->sig[SIGSZ-1] = image | SIG_MSB; +} +} + /* Half-precision format, as specified in IEEE 
754R. */ const struct real_format ieee_half_format = { @@ -4848,6 +4958,33 @@ const struct real_format arm_half_format = false, "arm_half" }; + +/* ARM Bfloat half-precision format. This format resembles a truncated + (16-bit) version of the 32-bit IEEE 754 single-precision floating-point + format. */ +const struct real_format arm_bfloat_half_format = + { +encode_arm_bfloat_half, +decode_arm_bfloat_half, +2, +8, +8, +-125, +128, +15, +15, +0, +false
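Because the format is the top half of an IEEE binary32 value (1 sign bit, 8 exponent bits, 7 mantissa bits), converting outside the compiler is mostly bit manipulation. The sketch below assumes a 32-bit IEEE float. It picks round-to-nearest-even purely for illustration; it does not model how real.c rounds before calling encode_arm_bfloat_half, nor the Arm/Intel rounding difference mentioned in the thread. NaNs are kept NaN by setting the quiet bit (bit 6), matching the `1 << 6` used by the encoder above. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t), "binary32 float expected");

/* Reinterpret a float's bits without aliasing problems.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* float -> bfloat16 bit pattern, round to nearest, ties to even.  */
static uint16_t
float_to_bf16 (float f)
{
  uint32_t u = float_bits (f);
  if ((u & 0x7fffffffu) > 0x7f800000u)      /* NaN: keep it a quiet NaN.  */
    return (uint16_t) ((u >> 16) | 0x0040u);
  uint32_t lsb = (u >> 16) & 1u;            /* Ties go to the even value.  */
  u += 0x7fffu + lsb;                       /* May carry into the exponent.  */
  return (uint16_t) (u >> 16);
}

/* bfloat16 -> float is exact: the 16 bits become the top half.  */
static float
bf16_to_float (uint16_t h)
{
  uint32_t u = (uint32_t) h << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```

Widening is always exact, so a bfloat16 value survives a round trip through float unchanged; narrowing loses the low 16 mantissa bits, and the rounding carry can overflow a large finite value to infinity, as IEEE rounding requires.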
[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: > Pinging with more correct maintainers this time :) > > Also would need to backport to gcc7,8,9, but need to get this approved > first! > > Thank you, > Stam > > > Forwarded Message > Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional > branches in Thumb2 (PR91816) > Date: Mon, 21 Oct 2019 10:37:09 +0100 > From: Stam Markianos-Wright > To: Ramana Radhakrishnan > CC: gcc-patches@gcc.gnu.org , nd , > James Greenhalgh , Richard Earnshaw > > > > > On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >>> >>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >>> however, on my native AArch32 setup the test times out when run as part >>> of a big "make check-gcc" regression, but not when run individually. >>> >>> 2019-10-11 Stamatis Markianos-Wright >>> >>> * config/arm/arm.md: Update b for Thumb2 range checks. >>> * config/arm/arm.c: New function arm_gen_far_branch. >>> * config/arm/arm-protos.h: New function arm_gen_far_branch >>> prototype. >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2019-10-11 Stamatis Markianos-Wright >>> >>> * testsuite/gcc.target/arm/pr91816.c: New test. >> >>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >>> index f995974f9bb..1dce333d1c3 100644 >>> --- a/gcc/config/arm/arm-protos.h >>> +++ b/gcc/config/arm/arm-protos.h >>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >>> cpu_arch_option *, >>> void arm_initialize_isa (sbitmap, const enum isa_feature *); >>> +const char * arm_gen_far_branch (rtx *, int,const char * , const >>> char *); >>> + >>> + >> >> Let's get the nits out of the way. >> >> Unnecessary extra new line, need a space between int and const above. >> >> > > .Fixed! > >>> #endif /* !
GCC_ARM_PROTOS_H */ >>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>> index 39e1a1ef9a2..1a693d2ddca 100644 >>> --- a/gcc/config/arm/arm.c >>> +++ b/gcc/config/arm/arm.c >>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >>> } >>> } /* Namespace selftest. */ >>> + >>> +/* Generate code to enable conditional branches in functions over 1 >>> MiB. */ >>> +const char * >>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >>> + const char * branch_format) >> >> Not sure if this is some munging from the attachment but check >> vertical alignment of parameters. >> > > .Fixed! > >>> +{ >>> + rtx_code_label * tmp_label = gen_label_rtx (); >>> + char label_buf[256]; >>> + char buffer[128]; >>> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >>> + CODE_LABEL_NUMBER (tmp_label)); >>> + const char *label_ptr = arm_strip_name_encoding (label_buf); >>> + rtx dest_label = operands[pos_label]; >>> + operands[pos_label] = tmp_label; >>> + >>> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , >>> label_ptr); >>> + output_asm_insn (buffer, operands); >>> + >>> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >>> label_ptr); >>> + operands[pos_label] = dest_label; >>> + output_asm_insn (buffer, operands); >>> + return ""; >>> +} >>> + >>> + >> >> Unnecessary extra newline. >> > > .Fixed! > >>> #undef TARGET_RUN_TARGET_SELFTESTS >>> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >>> #endif /* CHECKING_P */ >>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >>> index f861c72ccfc..634fd0a59da 100644 >>> --- a/gcc/config/arm/arm.md >>> +++ b/gcc/config/arm/arm.md >>> @@ -6686,9 +6686,16 @@ >>> ;; And for backward branches we have >>> ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or >>> -4) + 4). >>> ;; >>> +;; In 16-bit Thumb these ranges are: >>> ;; For a 'b' pos_range = 2046, neg_range = -2048 giving >>> (-2040->2048). 
>>> ;; For a 'b<cond>' pos_range = 254, neg_range = -256 giving (-250 >>> ->256). >>> +;; In 32-bit Thumb these ranges are: >>> +;; For a 'b' +/- 16MB is not
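As a worked check, the quoted ranges follow from the offset formulas in the arm.md comment. This assumes the forward-branch formula, which is not shown in this excerpt, is pos_range - 2 + 4; it reproduces every quoted upper bound. For backward branches, the comment's base offset of -2 or -4 and PC offset of 4 are used. The three columns are the 16-bit b<cond>, the 16-bit b, and the 32-bit b<cond>, in that order:

```latex
\begin{aligned}
\text{forward:}\quad  & \mathit{pos\_range} - 2 + 4 \\
 & 254 - 2 + 4 = 256, \quad 2046 - 2 + 4 = 2048, \quad 1048574 - 2 + 4 = 1048576 \\
\text{backward:}\quad & \mathit{neg\_range} - \mathit{neg\_base\_offs} + 4 \\
 & -256 - (-2) + 4 = -250, \quad -2048 - (-4) + 4 = -2040, \quad -1048576 - (-4) + 4 = -1048568
\end{aligned}
```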
Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
On 11/25/19 2:54 PM, Stam Markianos-Wright wrote: > > On 11/15/19 12:02 PM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch adds support for a new real_format for ARM Brain Floating >> Point numbers to the middle end. This is to be used exclusively in the >> ARM back-end. >> >> The encode_arm_bfloat_half and decode_arm_bfloat_half functions are >> provided to satisfy real_format struct requirements, but are never >> intended to be called, which is why they are provided without an >> explicit test. >> >> Details on ARM Bfloat can be found here: >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a >> >> >> >> Regtested on aarch64-none-elf for sanity. >> >> Is this ok for trunk? > > Ping. >> >> Also, I do not have commit rights, so could someone commit this on my >> behalf? > > Ping. > > Thank you :) > >> >> Thank you! >> Stam Markianos-Wright >> >> >> 2019-11-13 Stam Markianos-Wright >> >> * real.c (struct arm_bfloat_half_format, >> encode_arm_bfloat_half, decode_arm_bfloat_half): New. >> * real.h (arm_bfloat_half_format): New. >> >>
Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
On 11/15/19 12:02 PM, Stam Markianos-Wright wrote: > Hi all, > > This patch adds support for a new real_format for ARM Brain Floating > Point numbers to the middle end. This is to be used exclusively in the > ARM back-end. > > The encode_arm_bfloat_half and decode_arm_bfloat_half functions are > provided to satisfy real_format struct requirements, but are never > intended to be called, which is why they are provided without an > explicit test. > > Details on ARM Bfloat can be found here: > https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a > > > > Regtested on aarch64-none-elf for sanity. > > Is this ok for trunk? Ping. > > Also, I do not have commit rights, so could someone commit this on my > behalf? Ping. Thank you :) > > Thank you! > Stam Markianos-Wright > > > 2019-11-13 Stam Markianos-Wright > > * real.c (struct arm_bfloat_half_format, > encode_arm_bfloat_half, decode_arm_bfloat_half): New. > * real.h (arm_bfloat_half_format): New. > >
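The format described above keeps the 8-bit exponent of IEEE single precision but has only 7 fraction bits. The sketch below is a hypothetical standalone helper and is not part of the patch. It follows the same case split as decode_arm_bfloat_half: exponent 0 means zero or subnormal, exponent 255 means infinity or NaN, and everything else is normal. It assumes the usual IEEE convention that a set fraction MSB marks a quiet NaN.

```c
#include <assert.h>
#include <stdint.h>

/* Classes of a raw bfloat16 encoding (illustrative only, not a GCC type).  */
typedef enum
{
  BF16_ZERO,
  BF16_SUBNORMAL,
  BF16_NORMAL,
  BF16_INF,
  BF16_QNAN,
  BF16_SNAN
} bf16_class;

/* Hypothetical helper: classify a bfloat16 bit pattern laid out as
   1 sign bit (15), 8 exponent bits (14..7, bias 127), 7 fraction bits (6..0).  */
static bf16_class
bf16_classify (uint16_t bits)
{
  unsigned exp = (bits >> 7) & 0xff;   /* biased exponent */
  unsigned frac = bits & 0x7f;         /* explicit fraction bits */

  if (exp == 0)
    return frac ? BF16_SUBNORMAL : BF16_ZERO;
  if (exp == 0xff)
    {
      if (frac == 0)
        return BF16_INF;
      /* Assumed convention: fraction MSB (bit 6) set means quiet NaN.  */
      return (frac & 0x40) ? BF16_QNAN : BF16_SNAN;
    }
  return BF16_NORMAL;
}
```

For example, 0x3f80 (1.0) is normal, 0x7f80 is +infinity, 0x7fc0 is a quiet NaN and 0x7f81 is a signalling NaN.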
[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
Pinging with more correct maintainers this time :) Also would need to backport to gcc7,8,9, but need to get this approved first! Thank you, Stam Forwarded Message Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816) Date: Mon, 21 Oct 2019 10:37:09 +0100 From: Stam Markianos-Wright To: Ramana Radhakrishnan CC: gcc-patches@gcc.gnu.org , nd , James Greenhalgh , Richard Earnshaw On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >> however, on my native AArch32 setup the test times out when run as part >> of a big "make check-gcc" regression, but not when run individually. >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * config/arm/arm.md: Update b<cond> for Thumb2 range checks. >> * config/arm/arm.c: New function arm_gen_far_branch. >> * config/arm/arm-protos.h: New function arm_gen_far_branch >> prototype. >> >> gcc/testsuite/ChangeLog: >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * testsuite/gcc.target/arm/pr91816.c: New test. > >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >> index f995974f9bb..1dce333d1c3 100644 >> --- a/gcc/config/arm/arm-protos.h >> +++ b/gcc/config/arm/arm-protos.h >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >> cpu_arch_option *, >> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *); >> + >> + > > Let's get the nits out of the way. > > Unnecessary extra new line, need a space between int and const above. > > .Fixed! >> #endif /* ! GCC_ARM_PROTOS_H */ >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 39e1a1ef9a2..1a693d2ddca 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >> } >> } /* Namespace selftest. */ >> >> + >> +/* Generate code to enable conditional branches in functions over 1 MiB.
*/ >> +const char * >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >> +const char * branch_format) > > Not sure if this is some munging from the attachment but check > vertical alignment of parameters. > .Fixed! >> +{ >> + rtx_code_label * tmp_label = gen_label_rtx (); >> + char label_buf[256]; >> + char buffer[128]; >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >> +CODE_LABEL_NUMBER (tmp_label)); >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >> + rtx dest_label = operands[pos_label]; >> + operands[pos_label] = tmp_label; >> + >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >> + output_asm_insn (buffer, operands); >> + >> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >> label_ptr); >> + operands[pos_label] = dest_label; >> + output_asm_insn (buffer, operands); >> + return ""; >> +} >> + >> + > > Unnecessary extra newline. > .Fixed! >> #undef TARGET_RUN_TARGET_SELFTESTS >> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >> #endif /* CHECKING_P */ >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index f861c72ccfc..634fd0a59da 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -6686,9 +6686,16 @@ >> ;; And for backward branches we have >> ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). >> ;; >> +;; In 16-bit Thumb these ranges are: >> ;; For a 'b' pos_range = 2046, neg_range = -2048 giving >> (-2040->2048). >> ;; For a 'b<cond>' pos_range = 254, neg_range = -256 giving (-250 ->256). >> >> +;; In 32-bit Thumb these ranges are: >> +;; For a 'b' +/- 16MB is not checked for. >> +;; For a 'b<cond>' pos_range = 1048574, neg_range = -1048576 giving >> +;; (-1048568 -> 1048576). >> + >> + > > Unnecessary extra newline. > .Fixed!
>> (define_expand "cbranchsi4" >> [(set (pc) (if_then_else >>(match_operator 0 "expandable_comparison_operator" >> @@ -6947,22 +6954,42 @@ >>(pc)))] >> "TARGET_32BIT" >> "* >> - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) >> -{
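The idea behind arm_gen_far_branch is to emit a b<cond> with the inverted condition that only hops over an unconditional b. The unconditional b carries the real jump, and its range (+/-16MB) is far larger than the +/-1MB of a 32-bit b<cond>. The sketch below builds that three-line sequence as text. It is a hypothetical illustration rather than the GCC function. The helper names, the condition-inversion table and the `.Lbcond<N>` label spelling are all assumptions; in GCC the label comes from ASM_GENERATE_INTERNAL_LABEL and the inversion from the `%D1` operand code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: inverse of an Arm condition-code suffix, or NULL
   if COND has no inverse in this table (for example "al").  */
static const char *
arm_inverse_cond (const char *cond)
{
  static const char *const pairs[][2] = {
    {"eq", "ne"}, {"cs", "cc"}, {"mi", "pl"}, {"vs", "vc"},
    {"hi", "ls"}, {"ge", "lt"}, {"gt", "le"},
  };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    {
      if (strcmp (cond, pairs[i][0]) == 0)
        return pairs[i][1];
      if (strcmp (cond, pairs[i][1]) == 0)
        return pairs[i][0];
    }
  return NULL;
}

/* Hypothetical helper: write a far conditional branch to TARGET into BUF.
   The inverted b<cond> skips the unconditional b when COND is false.
   Returns the snprintf result, or -1 if COND cannot be inverted.  */
static int
format_far_branch (char *buf, size_t size, const char *cond,
                   const char *target, unsigned label_no)
{
  const char *inv = arm_inverse_cond (cond);
  if (inv == NULL)
    return -1;
  return snprintf (buf, size, "b%s\t.Lbcond%u\n\tb\t%s\n.Lbcond%u:",
                   inv, label_no, target, label_no);
}
```

For a beq to a distant label, this produces `bne .Lbcond4`, then `b far_target`, then the `.Lbcond4:` label.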
[GCC][PATCH] Add ARM-specific Bfloat format support to middle-end
Hi all, This patch adds support for a new real_format for ARM Brain Floating Point numbers to the middle end. This is to be used exclusively in the ARM back-end. The encode_arm_bfloat_half and decode_arm_bfloat_half functions are provided to satisfy real_format struct requirements, but are never intended to be called, which is why they are provided without an explicit test. Details on ARM Bfloat can be found here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a Regtested on aarch64-none-elf for sanity. Is this ok for trunk? Also, I do not have commit rights, so could someone commit this on my behalf? Thank you! Stam Markianos-Wright 2019-11-13 Stam Markianos-Wright * real.c (struct arm_bfloat_half_format, encode_arm_bfloat_half, decode_arm_bfloat_half): New. * real.h (arm_bfloat_half_format): New. diff --git a/gcc/real.h b/gcc/real.h index 0f660c9c671..2b337bb7f7d 100644 --- a/gcc/real.h +++ b/gcc/real.h @@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format; extern const struct real_format decimal_quad_format; extern const struct real_format ieee_half_format; extern const struct real_format arm_half_format; +extern const struct real_format arm_bfloat_half_format; /* == */ diff --git a/gcc/real.c b/gcc/real.c index 90067f0087b..671a21241d8 100644 --- a/gcc/real.c +++ b/gcc/real.c @@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, REAL_VALUE_TYPE *r, } } +/* Encode arm_bfloat types. 
*/ +static void +encode_arm_bfloat_half (const struct real_format *fmt, long *buf, + const REAL_VALUE_TYPE *r) +{ + unsigned long image, sig, exp; + unsigned long sign = r->sign; + bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0; + + image = sign << 15; + sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f; + + switch (r->cl) +{ +case rvc_zero: + break; + +case rvc_inf: + if (fmt->has_inf) + image |= 255 << 7; + else + image |= 0x7fff; + break; + +case rvc_nan: + if (fmt->has_nans) + { + if (r->canonical) + sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0); + if (r->signalling == fmt->qnan_msb_set) + sig &= ~(1 << 6); + else + sig |= 1 << 6; + if (sig == 0) + sig = 1 << 5; + + image |= 255 << 7; + image |= sig; + } + else + image |= 0x7fff; + break; + +case rvc_normal: + if (denormal) + exp = 0; + else + exp = REAL_EXP (r) + 127 - 1; + image |= exp << 7; + image |= sig; + break; + +default: + gcc_unreachable (); +} + + buf[0] = image; +} + +/* Decode arm_bfloat types. */ +static void +decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r, + const long *buf) +{ + unsigned long image = buf[0] & 0xffff; + bool sign = (image >> 15) & 1; + int exp = (image >> 7) & 0xff; + + memset (r, 0, sizeof (*r)); + image <<= HOST_BITS_PER_LONG - 8; + image &= ~SIG_MSB; + + if (exp == 0) +{ + if (image && fmt->has_denorm) + { + r->cl = rvc_normal; + r->sign = sign; + SET_REAL_EXP (r, -126); + r->sig[SIGSZ-1] = image << 1; + normalize (r); + } + else if (fmt->has_signed_zero) + r->sign = sign; +} + else if (exp == 255 && (fmt->has_nans || fmt->has_inf)) +{ + if (image) + { + r->cl = rvc_nan; + r->sign = sign; + r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1) + ^ fmt->qnan_msb_set); + r->sig[SIGSZ-1] = image; + } + else + { + r->cl = rvc_inf; + r->sign = sign; + } +} + else +{ + r->cl = rvc_normal; + r->sign = sign; + SET_REAL_EXP (r, exp - 127 + 1); + r->sig[SIGSZ-1] = image | SIG_MSB; +} +} + /* Half-precision format, as specified in IEEE
754R. */ const struct real_format ieee_half_format = { @@ -4848,6 +4958,33 @@ const struct real_format arm_half_format = false, "arm_half" }; + +/* ARM Bfloat half-precision format. This format resembles a truncated + (16-bit) version of the 32-bit IEEE 754 single-precision floating-point + format. */ +const struct real_format arm_bfloat_half_format = + { +encode_arm_bfloat_half, +decode_arm_bfloat_half, +2, +8, +8, +-125, +128, +15, +15, +0, +false, +true, +true, +true, +true, +true, +true, +false, +"bfloat_half" + }; + /* A synthetic "format" for internal arithmetic. It's the size of the internal significand minus the two bits needed for proper rounding.
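Because bfloat16 is laid out as the top 16 bits of an IEEE single, a host-side conversion can be sketched with plain bit operations. This is a hypothetical illustration and not how real.c converts values. It assumes `float` is IEEE binary32 and it truncates instead of rounding. To stop a NaN whose payload sits only in the discarded low bits from turning into infinity, it sets the quiet bit on every NaN, which is a deliberate design choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float -> bfloat16 by truncation.  Keeps the sign,
   the 8 exponent bits and the top 7 fraction bits; no rounding.  */
static uint16_t
float_to_bf16_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);        /* reinterpret without aliasing UB */
  uint16_t hi = (uint16_t) (bits >> 16);
  /* NaN input: force the quiet bit so the result cannot become infinity.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    hi |= 0x0040;
  return hi;
}

/* Hypothetical helper: bfloat16 -> float is exact; append 16 zero bits.  */
static float
bf16_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, 3.14159265f (0x40490fdb) truncates to 0x4049, which converts back to exactly 3.140625.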
Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >> however, on my native AArch32 setup the test times out when run as part >> of a big "make check-gcc" regression, but not when run individually. >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * config/arm/arm.md: Update b<cond> for Thumb2 range checks. >> * config/arm/arm.c: New function arm_gen_far_branch. >> * config/arm/arm-protos.h: New function arm_gen_far_branch >> prototype. >> >> gcc/testsuite/ChangeLog: >> >> 2019-10-11 Stamatis Markianos-Wright >> >> * testsuite/gcc.target/arm/pr91816.c: New test. > >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >> index f995974f9bb..1dce333d1c3 100644 >> --- a/gcc/config/arm/arm-protos.h >> +++ b/gcc/config/arm/arm-protos.h >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >> cpu_arch_option *, >> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *); >> + >> + > > Let's get the nits out of the way. > > Unnecessary extra new line, need a space between int and const above. > > .Fixed! >> #endif /* ! GCC_ARM_PROTOS_H */ >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >> index 39e1a1ef9a2..1a693d2ddca 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >> } >> } /* Namespace selftest. */ >> >> + >> +/* Generate code to enable conditional branches in functions over 1 MiB. */ >> +const char * >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >> +const char * branch_format) > > Not sure if this is some munging from the attachment but check > vertical alignment of parameters. > .Fixed!
>> +{ >> + rtx_code_label * tmp_label = gen_label_rtx (); >> + char label_buf[256]; >> + char buffer[128]; >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >> +CODE_LABEL_NUMBER (tmp_label)); >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >> + rtx dest_label = operands[pos_label]; >> + operands[pos_label] = tmp_label; >> + >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >> + output_asm_insn (buffer, operands); >> + >> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >> label_ptr); >> + operands[pos_label] = dest_label; >> + output_asm_insn (buffer, operands); >> + return ""; >> +} >> + >> + > > Unnecessary extra newline. > .Fixed! >> #undef TARGET_RUN_TARGET_SELFTESTS >> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >> #endif /* CHECKING_P */ >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index f861c72ccfc..634fd0a59da 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -6686,9 +6686,16 @@ >> ;; And for backward branches we have >> ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). >> ;; >> +;; In 16-bit Thumb these ranges are: >> ;; For a 'b' pos_range = 2046, neg_range = -2048 giving >> (-2040->2048). >> ;; For a 'b<cond>' pos_range = 254, neg_range = -256 giving (-250 ->256). >> >> +;; In 32-bit Thumb these ranges are: >> +;; For a 'b' +/- 16MB is not checked for. >> +;; For a 'b<cond>' pos_range = 1048574, neg_range = -1048576 giving >> +;; (-1048568 -> 1048576). >> + >> + > > Unnecessary extra newline. > .Fixed!
>> (define_expand "cbranchsi4" >> [(set (pc) (if_then_else >>(match_operator 0 "expandable_comparison_operator" >> @@ -6947,22 +6954,42 @@ >>(pc)))] >> "TARGET_32BIT" >> "* >> - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) >> -{ >> - arm_ccfsm_state += 2; >> - return \"\"; >> -} >> - return \"b%d1\\t%l0\"; >> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) >> + { >> +arm_ccfsm_state += 2; >> +return \"\"; >> + } >> + switch (get_attr_length (insn)) >> + { >> +// Thumb2 16-bit b{cond} >> +case 2: >> + >> +// Thumb2 32-bit b{cond} >> +case 4: return \"b%d1\\t%l0\";break; >> + >> +// Thumb2 b{cond} out of range. Use unconditional branch. >> +case 8: return arm_gen_far_branch \ >> +(operands, 0, \"Lbcond\", \"b%D1\t\"); >> +break; >> + >> +// A32 b{cond} >> +default: return \"b%d1\\t%l0\"; >> + } > > Please fix indentation here. > .Fixed together with below changes. >> " >> [(set_attr "conds" "use") >> (set_attr "type" "branch") >> (set (attr "length") >> -(if_then_else >> - (and (match_test "TARGET_THUMB2") >> -(and (ge (minus (match_dup 0) (pc)) (const_int -250)) >> - (le (minus (match_dup 0) (pc)) (const_int 256 >> - (const_int 2) >> - (const_int 4)))] >> +(if_then_else (match_test
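The length attribute picks between the cases of the switch above according to the distance to the target. As a hypothetical model, the selection can be written as a plain C function. It assumes the attribute uses exactly the ranges from the arm.md comment and the 2/4/8 cases of the switch; the attribute text itself is cut off in this excerpt. OFFSET stands for target - pc, as in `(minus (match_dup 0) (pc))`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the "length" attribute for a conditional branch.
   OFFSET is target - pc; ranges taken from the arm.md comment.  */
static int
cond_branch_length (bool thumb2, long offset)
{
  if (!thumb2)
    return 4;                                 /* A32 b<cond> */
  if (offset >= -250 && offset <= 256)
    return 2;                                 /* 16-bit Thumb-2 b<cond> */
  if (offset >= -1048568L && offset <= 1048576L)
    return 4;                                 /* 32-bit Thumb-2 b<cond> */
  return 8;                                   /* inverted b<cond> plus b */
}
```

Both bounds are inclusive, mirroring the `ge`/`le` tests in the RTL.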
[PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
Hi all, This is a patch for an issue where the compiler was generating a conditional branch in Thumb2, which was too far for b{cond} to handle. This was originally reported at binutils: https://sourceware.org/bugzilla/show_bug.cgi?id=24991 And then raised for GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816 As can be seen here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihfddaf.html the range of a 32-bit Thumb B{cond} is +/-1MB. This is now checked for in arm.md and an unconditional branch is generated if the jump would be greater than 1MB. A new test has been written that checks this for: beq (if (a)), bne (if (a==1)) Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, however, on my native AArch32 setup the test times out when run as part of a big "make check-gcc" regression, but not when run individually. Patch also regression tested on arm-none-eabi, arm-none-linux-gnueabi with no issues. Also, I don't have commit rights yet, so could someone commit it on my behalf? Thanks, Stam Markianos-Wright gcc/ChangeLog: 2019-10-11 Stamatis Markianos-Wright * config/arm/arm.md: Update b<cond> for Thumb2 range checks. * config/arm/arm.c: New function arm_gen_far_branch. * config/arm/arm-protos.h: New function arm_gen_far_branch prototype. gcc/testsuite/ChangeLog: 2019-10-11 Stamatis Markianos-Wright * testsuite/gcc.target/arm/pr91816.c: New test. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index f995974f9bb..1dce333d1c3 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *, void arm_initialize_isa (sbitmap, const enum isa_feature *); +const char * arm_gen_far_branch (rtx *, int,const char * , const char *); + + #endif /* ! 
GCC_ARM_PROTOS_H */ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 39e1a1ef9a2..1a693d2ddca 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -32139,6 +32139,31 @@ arm_run_selftests (void) } } /* Namespace selftest. */ + +/* Generate code to enable conditional branches in functions over 1 MiB. */ +const char * +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, + const char * branch_format) +{ + rtx_code_label * tmp_label = gen_label_rtx (); + char label_buf[256]; + char buffer[128]; + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ + CODE_LABEL_NUMBER (tmp_label)); + const char *label_ptr = arm_strip_name_encoding (label_buf); + rtx dest_label = operands[pos_label]; + operands[pos_label] = tmp_label; + + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); + output_asm_insn (buffer, operands); + + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr); + operands[pos_label] = dest_label; + output_asm_insn (buffer, operands); + return ""; +} + + #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests #endif /* CHECKING_P */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index f861c72ccfc..634fd0a59da 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -6686,9 +6686,16 @@ ;; And for backward branches we have ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4). ;; +;; In 16-bit Thumb these ranges are: ;; For a 'b' pos_range = 2046, neg_range = -2048 giving (-2040->2048). ;; For a 'b<cond>' pos_range = 254, neg_range = -256 giving (-250 ->256). +;; In 32-bit Thumb these ranges are: +;; For a 'b' +/- 16MB is not checked for. +;; For a 'b<cond>' pos_range = 1048574, neg_range = -1048576 giving +;; (-1048568 -> 1048576).
+ + (define_expand "cbranchsi4" [(set (pc) (if_then_else (match_operator 0 "expandable_comparison_operator" @@ -6947,22 +6954,42 @@ (pc)))] "TARGET_32BIT" "* - if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) -{ - arm_ccfsm_state += 2; - return \"\"; -} - return \"b%d1\\t%l0\"; + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2) + { + arm_ccfsm_state += 2; + return \"\"; + } + switch (get_attr_length (insn)) + { + // Thumb2 16-bit b{cond} + case 2: + + // Thumb2 32-bit b{cond} + case 4: return \"b%d1\\t%l0\";break; + + // Thumb2 b{cond} out of range. Use unconditional branch. + case 8: return arm_gen_far_branch \ + (operands, 0, \"Lbcond\", \"b%D1\t\"); + break; + + // A32 b{cond} + default: return \"b%d1\\t%l0\"; + } " [(set_attr "conds" "use") (set_attr "type" "branch") (set (attr "length") - (if_then_else - (and (match_test "TARGET_THUMB2") - (and (ge (minus (match_dup 0) (pc)) (const_int -250)) -
[GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def
Hi all, This is a minor patch that fixes the entry for the fp16fml feature in GCC's aarch64-option-extensions.def. As can be seen in the Linux sources here https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L69 the correct string is "asimdfhm", not "asimdfml". Cross-compiled and tested on aarch64-none-linux-gnu. Is this ok for trunk? Also, I don't have commit rights, so could someone commit it on my behalf? Thanks, Stam Markianos-Wright The diff is: diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 9919edd43d0..60e8f28fff5 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, \ /* Enabling "fp16fml" also enables "fp" and "fp16". Disabling "fp16fml" just disables "fp16fml". */ AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \ - AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml") + AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm") /* Enabling "sve" also enables "fp16", "fp" and "simd". Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3", "sve2-sm4" gcc/ChangeLog: 2019-09-09 Stamatis Markianos-Wright * config/aarch64/aarch64-option-extensions.def: Updated hwcap string for fp16fml.
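The last field of AARCH64_OPT_EXTENSION names the entry that is expected in the "Features" line of /proc/cpuinfo on GNU/Linux, so the spelling has to match the kernel's token exactly. The sketch below is a hypothetical whole-token check of such a line; it is not GCC's host-detection code. Matching whole tokens matters because a plain substring search would also "find" prefixes such as "asimdf". The sample Features line in the test is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Separators between tokens in a /proc/cpuinfo line.  */
static bool
is_sep (char c)
{
  return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: true if NAME appears as a whole whitespace-separated
   token in FEATURES (for example a "Features\t: fp asimd ..." line).  */
static bool
cpuinfo_has_feature (const char *features, const char *name)
{
  size_t len = strlen (name);
  if (len == 0)
    return false;
  const char *p = features;
  for (;;)
    {
      while (is_sep (*p))
        p++;                                  /* skip to the next token */
      if (*p == '\0')
        return false;
      const char *start = p;
      while (*p != '\0' && !is_sep (*p))
        p++;                                  /* find the end of the token */
      if ((size_t) (p - start) == len && memcmp (start, name, len) == 0)
        return true;
    }
}
```

With the old string, a check for "asimdfml" would never succeed on a kernel that reports "asimdfhm".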