This patch adds dispatch constraints for Neoverse V2 and illustrates the steps
necessary to enable dispatch scheduling for an AArch64 core.

The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG.
Please note that the values used here deviate slightly from the current SWOG
version but reflect the correct numbers. Arm will publish an official
Neoverse V2 SWOG release with the updated values in due course.

These are the steps we followed to implement the dispatch constraints for
Neoverse V2:
1. We used instruction attributes to group instructions into dispatch groups,
   corresponding to operations that utilize a certain pipeline type. For that,
   we added a new attribute (neoversev2_dispatch) with values for the
   different dispatch groups. The values of neoversev2_dispatch are determined
   using expressions of other instruction attributes.
   For example, the SWOG describes a constraint of "Up to 4 uOPs utilizing the
   M pipelines". Thus, one of the values of neoversev2_dispatch is "m", and it
   groups instructions that use the M pipelines, such as integer
   multiplication.
   Note that we made some minor simplifications compared to the information
   in the SWOG, because the existing instruction attributes do not allow a
   fully accurate mapping of instructions to the utilized pipelines. To give
   one example, the instructions IRG and LDG are both tagged with "memtag",
   but IRG uses the M pipelines, while LDG uses the L pipelines.
2. In the Neoverse V2 tuning model, we added an array of dispatch_constraint
   objects and referenced it in the tune_params. The new attribute
   neoversev2_dispatch provided a compact way to define the dispatch
   constraints (one entry of the array is excerpted after this list).
3. We enabled dispatch scheduling for Neoverse V2 by adding the
   AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.

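To make the mechanism concrete, here is one entry of the constraint array
from the diff below, lightly annotated. A dispatch_constraint pairs a name
and a per-cycle slot limit with a callback that returns how many slots of
that kind an insn consumes. This entry models the SWOG rule "Up to 4 uOPs
utilizing the M pipelines":

  dispatch_constraint ("m", 4, [](rtx_insn *insn)
    {
      /* Map the insn to its dispatch group via the new attribute.  */
      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
      /* One M slot for the "m" group and for the combination groups
         m0, m_l, and m0_v, which also utilize the M pipelines.  */
      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M
                  || dispatch_group == NEOVERSEV2_DISPATCH_M0
                  || dispatch_group == NEOVERSEV2_DISPATCH_M_L
                  || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
    })
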
Performance evaluation on a Grace machine using SPEC2017 and GROMACS 2024:
We ran each benchmark 5 times with each compiler (trunk at commit a1fb757
and trunk plus the patch series) and computed the speed-up of the median
values per test (i.e. values >1 mean that the patch series improves
performance):

SPEC2017 FP (-O3 -Wl,-z,muldefs -lm -fallow-argument-mismatch -fpermissive
             -fstack-arrays -flto=auto -Wl,--sort-section=name -march=native
             -mcpu=neoverse-v2 -std=gnu17):
Geom. mean of speed-ups         1.0006
blender                         1.0008
bwaves                          0.9996
cactuBSSN                       1.0007
fotonik3d                       1.0002
imagick                         0.9999
lbm                             1.0016
nab                             1.0012
namd                            1.0002
parest                          1.0004
povray                          1.0029
roms                            1.0000
wrf                             1.0003

SPEC2017 INT (same flags as SPEC2017 FP):
Geom. mean of speed-ups         0.9994
deepsjeng                       0.9991
gcc                             1.0024
leela                           0.9985
mcf                             0.9985
exchange2                       1.0000
omnetpp                         1.0005
perlbench                       0.9975
x264                            1.0032
xalancbmk                       0.9916
xz                              1.0032

GROMACS 2024 (-O3 -Wl,-z,muldefs -lm -flto=auto -Wl,--sort-section=name
              -march=native -mcpu=neoverse-v2):
Geom. mean of speed-ups:                     1.0024
22vs23_cut_arm_neon_asimd_cpu_perf           1.0005
22vs23_cut_arm_sve_cpu_perf                  1.0153
22vs23_fsw_arm_neon_asimd_cpu_perf           1.0107
22vs23_fsw_arm_sve_cpu_perf                  1.0156
22vs23_ljpme-geom_arm_neon_asimd_cpu_perf    1.0081
22vs23_ljpme-geom_arm_sve_cpu_perf           1.0024
22vs23_ljpme-lb_arm_neon_asimd_cpu_perf      1.0068
22vs23_ljpme-lb_arm_sve_cpu_perf             0.9957
22vs23_psh_arm_neon_asimd_cpu_perf           0.9957
22vs23_psh_arm_sve_cpu_perf                  0.9885
22vs23_psw_arm_neon_asimd_cpu_perf           0.9983
22vs23_psw_arm_sve_cpu_perf                  1.0024
22vs23_rf_arm_neon_asimd_cpu_perf            0.9976
22vs23_rf_arm_sve_cpu_perf                   0.9916

The effect of the patch series on compile times was evaluated by
comparing the compile times of insn-emit-1.cc. Speed-up of the median
values over 5 repetitions: 1.0001
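
For reference, a minimal sketch of the arithmetic behind the speed-up
numbers above (a sketch only, assuming the measured values are runtimes
where lower is better; the measurement harness is not part of this patch):

  #include <algorithm>
  #include <cmath>
  #include <vector>

  /* Median of the 5 runs of one configuration.  */
  static double
  median (std::vector<double> v)
  {
    std::sort (v.begin (), v.end ());
    return v[v.size () / 2];
  }

  /* Per-test speed-up: >1 means the patch series improves performance.  */
  static double
  speed_up (const std::vector<double> &trunk, const std::vector<double> &patched)
  {
    return median (trunk) / median (patched);
  }

  /* Summary row: geometric mean of the per-test speed-ups.  */
  static double
  geomean (const std::vector<double> &speed_ups)
  {
    double log_sum = 0.0;
    for (double s : speed_ups)
      log_sum += std::log (s);
    return std::exp (log_sum / speed_ups.size ());
  }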

Any help with further performance evaluation would be greatly appreciated.

The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.

Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

gcc/ChangeLog:

        * config/aarch64/aarch64.md: Include neoversev2.md.
        * config/aarch64/tuning_models/neoversev2.h: Enable dispatch
        scheduling and add dispatch constraints.
        * config/aarch64/neoversev2.md: New file defining the instruction
        attribute neoversev2_dispatch.
---
 gcc/config/aarch64/aarch64.md                 |   3 +
 gcc/config/aarch64/neoversev2.md              | 192 ++++++++++++++++++
 gcc/config/aarch64/tuning_models/neoversev2.h | 102 +++++++++-
 3 files changed, 294 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/aarch64/neoversev2.md

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fc9c819b864..bceaf40ae97 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -672,6 +672,9 @@
 (include "tsv110.md")
 (include "thunderx3t110.md")
 
+;; Dispatch scheduling
+(include "neoversev2.md")
+
 ;; -------------------------------------------------------------------
 ;; Jumps and other miscellaneous insns
 ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/neoversev2.md b/gcc/config/aarch64/neoversev2.md
new file mode 100644
index 00000000000..8dc9b098d09
--- /dev/null
+++ b/gcc/config/aarch64/neoversev2.md
@@ -0,0 +1,192 @@
+;; Instruction attribute for dispatch scheduling for Neoverse V2.
+;; Copyright The GNU Toolchain Authors.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Attribute that groups other instruction attributes into dispatch groups
+;; for Neoverse V2 cores.  Dispatch groups are groups of pipelines for which
+;; the SWOG specifies a dispatch constraint.  For example: Because the SWOG
+;; contains a dispatch constraint for the V02 pipelines, there is an attribute
+;; value "v02" that groups instructions that are processed by the V0 and V2
+;; pipelines.
+;; Values that contain a "_" represent combinations of dispatch groups.
+;; For example, there are dispatch constraints for the M0 and V pipelines.
+;; The value "m0_v" groups instructions that utilize the M0 as well as the
+;; V pipelines, such that both dispatch constraints apply.
+
+(define_attr "neoversev2_dispatch"
+  "none,bs01,bsm,m,m0,v02,v13,v,l01,l,bsm_l,m_l,m0_v,v_v13,v_l,\
+   l01_d,l01_v"
+  (cond [(eq_attr "type" "branch,call")
+        (const_string "bs01")
+        (ior
+          (eq_attr "type" "adc_reg,alu_ext,alu_imm,alu_sreg,alus_ext,\
+           alus_imm,alus_sreg,clz,csel,logic_imm,logic_reg,logics_imm,\
+           logics_reg,mov_imm,rbit,rev,shift_reg")
+          (eq_attr "sve_type" "sve_pred_cnt_scalar"))
+        (const_string "bsm")
+        (ior
+          (eq_attr "type" "alu_ext,alus_ext,bfm,bfx,mul,rotate_imm,\
+           smull,umull")
+          (eq_attr "autodetect_type" "alu_shift_asr_op2,alu_shift_lsl_op2,\
+           alu_shift_lsr_op2")
+          (eq_attr "sve_type" "sve_pred_cnt_ctrl,sve_pred_misc"))
+        (const_string "m")
+        (ior
+          (eq_attr "type" "crc,f_cvti2f,mla,neon_from_gp,neon_from_gp_q,\
+           sdiv,smlal,udiv,umlal")
+          (eq_attr "sve_type" "sve_ffr,sve_pred_logical"))
+        (const_string "m0")
+        (ior
+          (eq_attr "type"
+           "crypto_sha256_slow,crypto_sha3,crypto_sha512,crypto_sm3,\
+            crypto_sm4,f_rintd,f_rints,fccmpd,fccmps,fcmpd,fcmps,fdivd,\
+            fdivs,fsqrtd,fsqrts,neon_fp_cvt_narrow_d_q,\
+            neon_fp_cvt_narrow_s_q,neon_fp_cvt_widen_h,neon_fp_cvt_widen_s,\
+            neon_fp_div_d,neon_fp_div_d_q,neon_fp_div_s,neon_fp_div_s_q,\
+            neon_fp_recpe_d,neon_fp_recpe_d_q,neon_fp_recpe_s,\
+            neon_fp_recpe_s_q,neon_fp_recps_d,neon_fp_recps_d_q,\
+            neon_fp_recps_s,neon_fp_recps_s_q,neon_fp_recpx_d,\
+            neon_fp_recpx_d_q,neon_fp_recpx_s,neon_fp_recpx_s_q,\
+            neon_fp_round_d,neon_fp_round_d_q,neon_fp_round_s,\
+            neon_fp_round_s_q,neon_fp_rsqrte_d,neon_fp_rsqrte_d_q,\
+            neon_fp_rsqrte_s,neon_fp_rsqrte_s_q,neon_fp_rsqrts_d,\
+            neon_fp_rsqrts_d_q,neon_fp_rsqrts_s,neon_fp_rsqrts_s_q,\
+            neon_fp_sqrt_d,neon_fp_sqrt_d_q,neon_fp_sqrt_s,\
+            neon_fp_sqrt_s_q,neon_fp_to_int_d,neon_fp_to_int_d_q,\
+            neon_fp_to_int_s,neon_fp_to_int_s_q,neon_int_to_fp_d,\
+            neon_int_to_fp_d_q,neon_int_to_fp_s,neon_int_to_fp_s_q,\
+            neon_mla_b,neon_mla_b_q,neon_mla_h,neon_mla_h_q,\
+            neon_mla_s,neon_mla_s_q,neon_mla_b_long,neon_mla_h_long,\
+            neon_mla_h_scalar,neon_mla_h_scalar_q,neon_mla_s_long,\
+            neon_mla_s_scalar,neon_mla_s_scalar_q,neon_mla_h_scalar_long,\
+            neon_mla_s_scalar_long,neon_mul_b,neon_mul_b_q,\
+            neon_mul_d_long,neon_mul_h,neon_mul_h_q,neon_mul_h_long,\
+            neon_mul_h_scalar,neon_mul_h_scalar_q,neon_mul_h_scalar_long,\
+            neon_mul_s,neon_mul_s_q,neon_mul_s_long,neon_mul_s_scalar,\
+            neon_mul_s_scalar_q,neon_mul_s_scalar_long,neon_sat_mla_b_long,\
+            neon_sat_mla_h_long,neon_sat_mla_h_scalar_long,\
+            neon_sat_mla_s_long,neon_sat_mla_s_scalar_long,\
+            neon_sat_mul_b,neon_sat_mul_b_q,neon_sat_mul_b_long,\
+            neon_sat_mul_h,neon_sat_mul_h_q,neon_sat_mul_h_long,\
+            neon_sat_mul_h_scalar,neon_sat_mul_h_scalar_q,\
+            neon_sat_mul_h_scalar_long,neon_sat_mul_s,neon_sat_mul_s_q,\
+            neon_sat_mul_s_long,neon_sat_mul_s_scalar,\
+            neon_sat_mul_s_scalar_q,neon_sat_mul_s_scalar_long")
+          (eq_attr "sve_type"
+           "sve_crypto_sha3,sve_fp_cmp,sve_fp_cvt,sve_fp_div,sve_fp_log,\
+            sve_fp_sqrt,sve_int_cvt,sve_int_div,sve_int_dot,sve_int_index,\
+            sve_int_mul,sve_int_recip_est"))
+        (const_string "v02")
+        (ior
+          (eq_attr "type"
+           "neon_arith_acc,neon_arith_acc_q,neon_reduc_add,\
+            neon_reduc_add_long,neon_reduc_add_q,neon_reduc_minmax,\
+            neon_reduc_minmax_q,neon_sat_shift_imm,\
+            neon_sat_shift_imm_narrow_q,neon_sat_shift_imm_q,\
+            neon_sat_shift_reg,neon_sat_shift_reg_q,neon_shift_acc,\
+            neon_shift_acc_q,neon_shift_imm,neon_shift_imm_long,\
+            neon_shift_imm_narrow_q,neon_shift_imm_q,neon_shift_reg,\
+            neon_shift_reg_q")
+          (eq_attr "sve_type"
+           "sve_fp_assoc_add,sve_fp_exp,sve_int_accum,sve_int_bit_perm,\
+            sve_int_extend,sve_int_extract,sve_int_shift"))
+        (const_string "v13")
+        (ior
+          (eq_attr "type" "crypto_pmull,f_cvt,f_cvtf2i,f_minmaxd,f_minmaxs,\
+           faddd,fadds,fconstd,fconsts,fcsel,ffarithd,ffariths,fmacd,fmacs,\
+           fmov,fmuld,fmuls,f_mcr,f_mrc,neon_abd,\
+           neon_abd_long,neon_abd_q,neon_abs,neon_abs_q,neon_add,\
+           neon_add_halve,neon_add_halve_narrow_q,neon_add_halve_q,\
+           neon_add_long,neon_add_q,neon_add_widen,neon_bsl,neon_bsl_q,\
+           neon_cls,neon_cls_q,neon_cnt,neon_cnt_q,neon_compare,\
+           neon_compare_q,neon_compare_zero,neon_compare_zero_q,\
+           neon_dup,neon_dup_q,neon_ext,neon_ext_q,neon_fcadd,neon_fcmla,\
+           neon_fp_abd_d,neon_fp_abd_d_q,neon_fp_abd_s,neon_fp_abd_s_q,\
+           neon_fp_abs_d,neon_fp_abs_d_q,neon_fp_abs_s,neon_fp_abs_s_q,\
+           neon_fp_addsub_d,neon_fp_addsub_d_q,neon_fp_addsub_s,\
+           neon_fp_addsub_s_q,neon_fp_compare_d,neon_fp_compare_d_q,\
+           neon_fp_compare_s,neon_fp_compare_s_q,neon_fp_minmax_d,\
+           neon_fp_minmax_d_q,neon_fp_minmax_s,neon_fp_minmax_s_q,\
+           neon_fp_mla_d,neon_fp_mla_d_q,neon_fp_mla_d_scalar_q,\
+           neon_fp_mla_s,neon_fp_mla_s_q,neon_fp_mla_s_scalar,\
+           neon_fp_mla_s_scalar_q,neon_fp_mul_d,neon_fp_mul_d_q,\
+           neon_fp_mul_d_scalar_q,neon_fp_mul_s,neon_fp_mul_s_q,\
+           neon_fp_mul_s_scalar,neon_fp_mul_s_scalar_q,neon_fp_neg_d,\
+           neon_fp_neg_d_q,neon_fp_neg_s,neon_fp_neg_s_q,neon_fp_reduc_add_d,\
+           neon_fp_reduc_add_d_q,neon_fp_reduc_add_s,neon_fp_reduc_add_s_q,\
+           neon_fp_reduc_minmax_d,neon_fp_reduc_minmax_d_q,\
+           neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_s_q,neon_logic,\
+           neon_logic_q,neon_minmax,neon_minmax_q,neon_move,\
+           neon_move_narrow_q,neon_move_q,neon_neg,neon_neg_q,neon_permute,\
+           neon_permute_q,neon_qabs,neon_qabs_q,neon_qadd,neon_qadd_q,\
+           neon_qneg,neon_qneg_q,neon_qsub,neon_qsub_q,neon_rbit,\
+           neon_rbit_q,neon_rev,neon_rev_q,neon_sub,neon_sub_halve,\
+           neon_sub_halve_narrow_q,neon_sub_halve_q,neon_sub_long,\
+           neon_sub_q,neon_sub_widen,neon_tbl1,neon_tbl1_q,neon_tbl2,\
+           neon_tbl2_q,neon_tbl3,neon_tbl3_q,neon_tbl4,neon_tbl4_q,\
+           neon_to_gp,neon_to_gp_q,neon_tst,neon_tst_q,neon_zip,\
+           neon_zip_q")
+          (eq_attr "sve_type" "sve_fp_arith,sve_fp_misc,sve_fp_mul,\
+           sve_fp_reduc,sve_int_general,sve_int_pmul"))
+        (const_string "v")
+        (eq_attr "sve_type" "sve_store_pred")
+        (const_string "l01")
+        (ior
+          (eq_attr "type" "neon_ldp,neon_ldp_q,neon_load1_1reg,\
+           neon_load1_1reg_q,neon_load1_2reg,neon_load1_2reg_q,\
+           neon_load1_3reg,neon_load1_3reg_q,neon_load1_4reg,\
+           neon_load1_4reg_q")
+          (eq_attr "sve_type" "sve_load_1reg"))
+        (const_string "l")
+        (eq_attr "type" "f_loadd,f_loads")
+        (const_string "bsm_l")
+        (eq_attr "sve_type" "sve_load_pred")
+        (const_string "m_l")
+        (ior
+          (eq_attr "type" "neon_ins,neon_ins_q")
+          (eq_attr "sve_type" "sve_int_cmp_set,sve_int_match,sve_pred_vec"))
+        (const_string "m0_v")
+        (eq_attr "sve_type" "sve_int_reduc")
+        (const_string "v_v13")
+        (ior
+          (eq_attr "type" "neon_load1_all_lanes,neon_load1_one_lane,\
+           neon_load1_one_lane_q,neon_load2_2reg,neon_load2_2reg_q,\
+           neon_load2_all_lanes,neon_load2_all_lanes_q,neon_load2_one_lane,\
+           neon_load3_3reg,neon_load3_3reg_q,neon_load3_all_lanes,\
+           neon_load3_all_lanes_q,neon_load3_one_lane,neon_load4_4reg,\
+           neon_load4_4reg_q,neon_load4_all_lanes,neon_load4_all_lanes_q,\
+           neon_load4_one_lane")
+          (eq_attr "sve_type" "sve_gatherload_32,sve_gatherload_64,\
+           sve_load_2reg,sve_load_3reg,sve_load_4reg"))
+        (const_string "v_l")
+        (eq_attr "type" "load_16,load_4,load_8,store_16,store_4,store_8")
+        (const_string "l01_d")
+        (ior
+          (eq_attr "type" "f_stored,f_stores,neon_stp,neon_stp_q,\
+           neon_store1_1reg,neon_store1_1reg_q,neon_store1_2reg,\
+           neon_store1_2reg_q,neon_store1_3reg,neon_store1_3reg_q,\
+           neon_store1_4reg,neon_store1_4reg_q,neon_store1_one_lane,\
+           neon_store1_one_lane_q,neon_store2_2reg,neon_store2_2reg_q,\
+           neon_store2_one_lane,neon_store2_one_lane_q,neon_store3_3reg,\
+           neon_store3_3reg_q,neon_store3_one_lane,neon_store3_one_lane_q,\
+           neon_store4_4reg,neon_store4_4reg_q,neon_store4_one_lane,\
+           neon_store4_one_lane_q")
+          (eq_attr "sve_type" "sve_scatterstore_32,sve_scatterstore_64,\
+           sve_store_1reg,sve_store_2reg,sve_store_3reg,sve_store_4reg"))
+        (const_string "l01_v")]
+       (const_string "none")))
\ No newline at end of file
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h b/gcc/config/aarch64/tuning_models/neoversev2.h
index faf06d8e7ed..c3749d0c194 100644
--- a/gcc/config/aarch64/tuning_models/neoversev2.h
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -21,6 +21,7 @@
 #define GCC_AARCH64_H_NEOVERSEV2
 
 #include "generic.h"
+#include "../aarch64-sched-dispatch.h"
 
 static const struct cpu_regmove_cost neoversev2_regmove_cost =
 {
@@ -188,6 +189,100 @@ static const struct cpu_vector_cost neoversev2_vector_cost =
   &neoversev2_vec_issue_info /* issue_info  */
 };
 
+/* Neoverse V2 dispatch constraints for instruction scheduling.  */
+static const dispatch_constraint neoversev2_dispatch_constraints[] = {
+  dispatch_constraint ("total", 16, [](rtx_insn *)
+    {
+      return 1;
+    }),
+  dispatch_constraint ("b_s01", 4, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01);
+    }),
+  dispatch_constraint ("m0", 2, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M0
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
+    }),
+  dispatch_constraint ("m", 4, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_M
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M0
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M_L
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
+    }),
+  dispatch_constraint ("b_s_m", 8, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_BS01
+                  || dispatch_group == NEOVERSEV2_DISPATCH_BSM
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M0
+                  || dispatch_group == NEOVERSEV2_DISPATCH_BSM_L
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M_L
+                  || dispatch_group == NEOVERSEV2_DISPATCH_M0_V);
+    }),
+  dispatch_constraint ("v02", 2, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V02);
+    }),
+  dispatch_constraint ("v13", 2, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      return (int)(dispatch_group == NEOVERSEV2_DISPATCH_V13);
+    }),
+  dispatch_constraint ("v", 4, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      switch (dispatch_group) {
+       case NEOVERSEV2_DISPATCH_V02:
+       case NEOVERSEV2_DISPATCH_V13:
+       case NEOVERSEV2_DISPATCH_V:
+       case NEOVERSEV2_DISPATCH_M0_V:
+       case NEOVERSEV2_DISPATCH_V_L:
+       case NEOVERSEV2_DISPATCH_L01_V:
+         return 1;
+       case NEOVERSEV2_DISPATCH_V_V13:
+         return 2;
+       default:
+         return 0;
+      }
+    }),
+  dispatch_constraint ("l01_d", 4, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      switch (dispatch_group) {
+       case NEOVERSEV2_DISPATCH_L01_V:
+       case NEOVERSEV2_DISPATCH_L01:
+         return 1;
+       case NEOVERSEV2_DISPATCH_L01_D:
+         return 2;
+       default:
+         return 0;
+      }
+    }),
+  dispatch_constraint ("l", 6, [](rtx_insn *insn)
+    {
+      auto dispatch_group = get_attr_neoversev2_dispatch (insn);
+      switch (dispatch_group) {
+       case NEOVERSEV2_DISPATCH_L:
+       case NEOVERSEV2_DISPATCH_BSM_L:
+       case NEOVERSEV2_DISPATCH_M_L:
+       case NEOVERSEV2_DISPATCH_V_L:
+       case NEOVERSEV2_DISPATCH_L01_V:
+         return 1;
+       case NEOVERSEV2_DISPATCH_L01_D:
+         return 2;
+       default:
+         return 0;
+      }
+    })
+};
+
 static const struct tune_params neoversev2_tunings =
 {
   &cortexa76_extra_costs,
@@ -221,12 +316,13 @@ static const struct tune_params neoversev2_tunings =
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
    | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
-   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR), /* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_AVOID_LDAPUR
+   | AARCH64_EXTRA_TUNE_DISPATCH_SCHED),       /* tune_flags.  */
   &generic_armv9a_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* stp_policy_model.  */
-  nullptr,     /* dispatch_constraints.  */
-  0            /* num_dispatch_constraints.  */
+  neoversev2_dispatch_constraints,  /* dispatch_constraints.  */
+  ARRAY_SIZE (neoversev2_dispatch_constraints)  /* num_dispatch_constraints.  */
 };
 
 #endif /* GCC_AARCH64_H_NEOVERSEV2.  */
-- 
2.34.1
