Jennifer Schmitz <jschm...@nvidia.com> writes: > Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the reg > operands of the compare and if-then-else expressions. As it is not legal to > have different modes in the operand registers, I only added one check for > each of the expressions. > The updated patch was bootstrapped and tested again. > Best, > Jennifer > > From 8da609be99fece8130cf1429bd938b2a26c6672b Mon Sep 17 00:00:00 2001 > From: Jennifer Schmitz <jschm...@nvidia.com> > Date: Wed, 24 Jul 2024 06:13:59 -0700 > Subject: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2 > > According to the Neoverse V2 Software Optimization Guide (section 4.14), the > instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been > implemented so far. This patch implements and tests the two fusion pairs. > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > There was also no non-noise impact on SPEC CPU2017 benchmark. > OK for mainline? > > Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> > > gcc/ > > * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement > fusion logic. > * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry. > (cmp+cset): Likewise. > * config/aarch64/tuning_models/neoversev2.h: Enable logic in > field fusible_ops. > > gcc/testsuite/ > > * gcc.target/aarch64/fuse_cmp_csel.c: New test. > * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
Thanks for the update. It looks from a quick scan like the main three instructions associated with single-set integer COMPAREs are CMP, CMN and TST. TST could be distinguished from CMP and CMN based on get_attr_type (), although it looks like: (define_insn "*and<mode>_compare0" [(set (reg:CC_Z CC_REGNUM) (compare:CC_Z (match_operand:SHORT 0 "register_operand" "r") (const_int 0)))] "" "tst\\t%<w>0, <short_mask>" [(set_attr "type" "alus_imm")] ) should use logics_imm instead of alus_imm. Alternatively, we could add a new attribute for "compare_type" and use that. That would make the test in aarch_macro_fusion_pair_p slightly simpler, since it could use get_attr_compare_type without having to look at the pattern of prev_set. But there's a danger that we'd forget to add the new attribute for new comparison instructions. I did wonder whether we could simply punt on CC_Zmode, but that's not a reliable test. But I suppose the counter-argument to my questions above is: how bad would it be if we fused CMN and TST? They are at least plausible fusions, so it probably doesn't matter if we include them too. So: > --- > gcc/config/aarch64/aarch64-fusion-pairs.def | 2 ++ > gcc/config/aarch64/aarch64.cc | 22 +++++++++++++ > gcc/config/aarch64/tuning_models/neoversev2.h | 5 ++- > .../gcc.target/aarch64/fuse_cmp_csel.c | 33 +++++++++++++++++++ > .../gcc.target/aarch64/fuse_cmp_cset.c | 31 +++++++++++++++++ > 5 files changed, 92 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c > > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > b/gcc/config/aarch64/aarch64-fusion-pairs.def > index 9a43b0c8065..bf5e85ba8fe 100644 > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > @@ -37,5 +37,7 @@ AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC) > AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH) > AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ) > AARCH64_FUSION_PAIR ("addsub_2reg_const1", ADDSUB_2REG_CONST1) > +AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL) > +AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > > #undef AARCH64_FUSION_PAIR > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 9e51236ce9f..7a0351f7dac 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -27348,6 +27348,28 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn > *curr) > && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) > return true; > > + /* Fuse CMP and CSEL. */ How about adding something like: The test for CMP is not exact, and includes things like CMN and TST as well. However, fusing with those instructions doesn't seem inherently bad. > + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL) > + && prev_set && curr_set > + && GET_CODE (SET_SRC (prev_set)) == COMPARE > + && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE > + && REG_P (XEXP (SET_SRC (curr_set), 1)) > + && REG_P (XEXP (SET_SRC (curr_set), 2)) I think we could use aarch64_reg_or_zero instead of REG_P for the last two lines, which would allow selects involving W/XZR as well. Looks good to me otherwise, but I'd be interested to hear what others think about the CMP match. Thanks, Richard > + && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0))) > + && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (curr_set), 1))) > + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) > + return true; > + > + /* Fuse CMP and CSET. */ > + if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET) > + && prev_set && curr_set > + && GET_CODE (SET_SRC (prev_set)) == COMPARE > + && GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set))) == RTX_COMPARE > + && REG_P (SET_DEST (curr_set)) > + && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0))) > + && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr))) > + return true; > + > /* Fuse flag-setting ALU instructions and conditional branch. */ > if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH) > && any_condjump_p (curr)) > diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h > b/gcc/config/aarch64/tuning_models/neoversev2.h > index f76e4ef358f..ae99fab22d8 100644 > --- a/gcc/config/aarch64/tuning_models/neoversev2.h > +++ b/gcc/config/aarch64/tuning_models/neoversev2.h > @@ -221,7 +221,10 @@ static const struct tune_params neoversev2_tunings = > 2 /* store_pred. */ > }, /* memmov_cost. */ > 5, /* issue_rate */ > - (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ > + (AARCH64_FUSE_AES_AESMC > + | AARCH64_FUSE_CMP_BRANCH > + | AARCH64_FUSE_CMP_CSEL > + | AARCH64_FUSE_CMP_CSET), /* fusible_ops */ > "32:16", /* function_align. */ > "4", /* jump_align. */ > "32:16", /* loop_align. */ > diff --git a/gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c > b/gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c > new file mode 100644 > index 00000000000..85f302bab98 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mcpu=neoverse-v2" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +/* > +** f1: > +** ... > +** cmp w[0-9]+, w[0-9]+ > +** csel w[0-9]+, w[0-9]+, w[0-9]+, le > +** ret > +*/ > +int f1 (int a, int b, int c) > +{ > + int cmp = a > b; > + int add1 = c + 3; > + int add2 = c + 8; > + return cmp ? add1 : add2; > +} > + > +/* > +** f2: > +** ... > +** cmp x[0-9]+, x[0-9]+ > +** csel x[0-9]+, x[0-9]+, x[0-9]+, le > +** ret > +*/ > +long long f2 (long long a, long long b, long long c) > +{ > + long long cmp = a > b; > + long long add1 = c + 3; > + long long add2 = c + 8; > + return cmp ? add1 : add2; > +} > diff --git a/gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c > b/gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c > new file mode 100644 > index 00000000000..04f1ce2773b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mcpu=neoverse-v2" } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +/* > +** f1: > +** cmp w[0-9]+, w[0-9]+ > +** cset w[0-9]+, gt > +** ... > +*/ > +int g; > +int f1 (int a, int b) > +{ > + int cmp = a > b; > + g = cmp + 1; > + return cmp; > +} > + > +/* > +** f2: > +** cmp x[0-9]+, x[0-9]+ > +** cset x[0-9]+, gt > +** ... > +*/ > +long long h; > +long long f2 (long long a, long long b) > +{ > + long long cmp = a > b; > + h = cmp + 1; > + return cmp; > +}