Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

Richard Sandiford Thu, 28 Sep 2023 06:42:58 -0700

Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> Hey Richard,
>
> Thanks for taking the time to review this, but it was committed
> yesterday after being reviewed by Kyrill and Tamar.
>
> Discussions:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
>
> Committed version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html


Sorry about that.  I had v3 being filtered differently and so it went
into a different inbox.

Richard

>
> Manos.
>
> On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
>> Thanks for the patch and sorry for the slow review.
>>
>> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
>> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
>> > to provide the requested behaviour for handling ldp and stp:
>> >
>> >   /* Allow the tuning structure to disable LDP instruction formation
>> >      from combining instructions (e.g., in peephole2).
>> >      TODO: Implement fine-grained tuning control for LDP and STP:
>> >            1. control policies for load and store separately;
>> >            2. support the following policies:
>> >               - default (use what is in the tuning structure)
>> >               - always
>> >               - never
>> >               - aligned (only if the compiler can prove that the
>> >                 load will be aligned to 2 * element_size)  */
>> >
>> > It provides two new and concrete command-line options -mldp-policy and
>> > -mstp-policy to give the ability to control load and store policies
>> > separately, as stated in part 1 of the TODO.
>> >
>> > The accepted values for both options are:
>> > - default: Use the ldp/stp policy defined in the corresponding tuning
>> >   structure.
>> > - always: Emit ldp/stp regardless of alignment.
>> > - never: Do not emit ldp/stp.
>> > - aligned: In order to emit ldp/stp, first check if the load/store will
>> >   be aligned to 2 * element_size.
>> >
>> > gcc/ChangeLog:
>> >         * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> >       appropriate enums for the policies.
>> >         * config/aarch64/aarch64-tuning-flags.def
>> >       (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>> >       options.
>> >         * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>> >       function to parse ldp-policy option.
>> >         (aarch64_parse_stp_policy): New function to parse stp-policy
>> >       option.
>> >         (aarch64_override_options_internal): Call parsing functions.
>> >         (aarch64_operands_ok_for_ldpstp): Add option-value check and
>> >       alignment check and remove superseded ones.
>> >         (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check
>> >       and alignment check and remove superseded ones.
>> >         * config/aarch64/aarch64.opt: Add options.
>> >
>> > gcc/testsuite/ChangeLog:
>> >         * gcc.target/aarch64/ldp_aligned.c: New test.
>> >         * gcc.target/aarch64/ldp_always.c: New test.
>> >         * gcc.target/aarch64/ldp_never.c: New test.
>> >         * gcc.target/aarch64/stp_aligned.c: New test.
>> >         * gcc.target/aarch64/stp_always.c: New test.
>> >         * gcc.target/aarch64/stp_never.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
>> > ---
>> > Changes in v2:
>> >         - Fixed committed ldp tests to correctly trigger
>> >           and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
>> >         - Added "-mcpu=generic" to committed tests to guarantee generic
>> >           target code generation and not cause the regressions of v1.
>> >
>> >  gcc/config/aarch64/aarch64-protos.h           |  24 ++
>> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>> >  gcc/config/aarch64/aarch64.cc                 | 229 ++++++++++++++----
>> >  gcc/config/aarch64/aarch64.opt                |   8 +
>> >  .../gcc.target/aarch64/ldp_aligned.c          |  66 +++++
>> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +++++
>> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +++++
>> >  .../gcc.target/aarch64/stp_aligned.c          |  60 +++++
>> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +++++
>> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +++++
>> >  10 files changed, 586 insertions(+), 61 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> > index 70303d6fd95..be1d73490ed 100644
>> > --- a/gcc/config/aarch64/aarch64-protos.h
>> > +++ b/gcc/config/aarch64/aarch64-protos.h
>> > @@ -568,6 +568,30 @@ struct tune_params
>> >    /* Place prefetch struct pointer at the end to enable type checking
>> >       errors when tune_params misses elements (e.g., from erroneous merges).  */
>> >    const struct cpu_prefetch_tune *prefetch;
>> > +/* An enum specifying how to handle load pairs using a fine-grained policy:
>> > +   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
>> > +   to at least double the alignment of the type.
>> > +   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
>> > +   - LDP_POLICY_NEVER: Do not emit ldp.  */
>> > +
>> > +  enum aarch64_ldp_policy_model
>> > +  {
>> > +    LDP_POLICY_ALIGNED,
>> > +    LDP_POLICY_ALWAYS,
>> > +    LDP_POLICY_NEVER
>> > +  } ldp_policy_model;
>> > +/* An enum specifying how to handle store pairs using a fine-grained policy:
>> > +   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
>> > +   to at least double the alignment of the type.
>> > +   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
>> > +   - STP_POLICY_NEVER: Do not emit stp.  */
>> > +
>> > +  enum aarch64_stp_policy_model
>> > +  {
>> > +    STP_POLICY_ALIGNED,
>> > +    STP_POLICY_ALWAYS,
>> > +    STP_POLICY_NEVER
>> > +  } stp_policy_model;
>> >  };
>>
>> Generally the patch looks really good.  But I think we can use a single
>> enum type for both LDP and STP, with the values having the prefix
>> AARCH64_LDP_STP_POLICY.  That means that we only need one parser,
>> and that:
>>
>> >  /* Classifies an address.
>> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
>> > index 52112ba7c48..774568e9106 100644
>> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>> > @@ -30,11 +30,6 @@
>> >
>> >  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>> >
>> > -/* Don't create non-8 byte aligned load/store pair.  That is if the
>> > -two load/stores are not at least 8 byte aligned don't create load/store
>> > -pairs.   */
>> > -AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
>> > -
>> >  /* Some of the optional shift to some arthematic instructions are
>> >     considered cheap.  Logical shift left <=4 with or without a
>> >     zero extend are considered cheap.  Sign extend; non logical shift left
>> > @@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
>> >  /* Disallow load/store pair instructions on Q-registers.  */
>> >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
>> >
>> > -/* Disallow load-pair instructions to be formed in combine/peephole.  */
>> > -AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
>> > -
>> >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
>> >
>> >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index eba5d4a7e04..43d88c68647 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -1356,7 +1356,9 @@ static const struct tune_params generic_tunings =
>> >       Neoverse V1.  It does not have a noticeable effect on A64FX and should
>> >       have at most a very minor effect on SVE2 cores.  */
>> >    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params cortexa35_tunings =
>> > @@ -1390,7 +1392,9 @@ static const struct tune_params cortexa35_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params cortexa53_tunings =
>> > @@ -1424,7 +1428,9 @@ static const struct tune_params cortexa53_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params cortexa57_tunings =
>> > @@ -1458,7 +1464,9 @@ static const struct tune_params cortexa57_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),      /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params cortexa72_tunings =
>> > @@ -1492,7 +1500,9 @@ static const struct tune_params cortexa72_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params cortexa73_tunings =
>> > @@ -1526,7 +1536,9 @@ static const struct tune_params cortexa73_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >
>> > @@ -1561,7 +1573,9 @@ static const struct tune_params exynosm1_tunings =
>> >    48,        /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &exynosm1_prefetch_tune
>> > +  &exynosm1_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params thunderxt88_tunings =
>> > @@ -1593,8 +1607,10 @@ static const struct tune_params thunderxt88_tunings =
>> >    2, /* min_div_recip_mul_df.  */
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>> > -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),  /* tune_flags.  */
>> > -  &thunderxt88_prefetch_tune
>> > +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > +  &thunderxt88_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALIGNED,   /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALIGNED    /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params thunderx_tunings =
>> > @@ -1626,9 +1642,10 @@ static const struct tune_params thunderx_tunings =
>> >    2, /* min_div_recip_mul_df.  */
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>> > -  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
>> > -   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
>> > -  &thunderx_prefetch_tune
>> > +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
>> > +  &thunderx_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALIGNED,   /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALIGNED    /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params tsv110_tunings =
>> > @@ -1662,7 +1679,9 @@ static const struct tune_params tsv110_tunings =
>> >    0,    /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
>> > -  &tsv110_prefetch_tune
>> > +  &tsv110_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params xgene1_tunings =
>> > @@ -1695,7 +1714,9 @@ static const struct tune_params xgene1_tunings =
>> >    17,        /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),     /* tune_flags.  */
>> > -  &xgene1_prefetch_tune
>> > +  &xgene1_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params emag_tunings =
>> > @@ -1728,7 +1749,9 @@ static const struct tune_params emag_tunings =
>> >    17,        /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),     /* tune_flags.  */
>> > -  &xgene1_prefetch_tune
>> > +  &xgene1_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params qdf24xx_tunings =
>> > @@ -1762,7 +1785,9 @@ static const struct tune_params qdf24xx_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
>> > -  &qdf24xx_prefetch_tune
>> > +  &qdf24xx_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
>> > @@ -1798,7 +1823,9 @@ static const struct tune_params saphira_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE),         /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params thunderx2t99_tunings =
>> > @@ -1832,7 +1859,9 @@ static const struct tune_params thunderx2t99_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &thunderx2t99_prefetch_tune
>> > +  &thunderx2t99_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params thunderx3t110_tunings =
>> > @@ -1866,7 +1895,9 @@ static const struct tune_params thunderx3t110_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &thunderx3t110_prefetch_tune
>> > +  &thunderx3t110_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params neoversen1_tunings =
>> > @@ -1899,7 +1930,9 @@ static const struct tune_params neoversen1_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params ampere1_tunings =
>> > @@ -1935,8 +1968,10 @@ static const struct tune_params ampere1_tunings =
>> >    2, /* min_div_recip_mul_df.  */
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> > -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),       /* tune_flags.  */
>> > -  &ampere1_prefetch_tune
>> > +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > +  &ampere1_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALIGNED,   /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALIGNED    /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params ampere1a_tunings =
>> > @@ -1973,8 +2008,10 @@ static const struct tune_params ampere1a_tunings =
>> >    2, /* min_div_recip_mul_df.  */
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> > -  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),       /* tune_flags.  */
>> > -  &ampere1_prefetch_tune
>> > +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > +  &ampere1_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALIGNED,   /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALIGNED    /* stp_policy_model.  */
>> >  };
>> >
>> >  static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
>> > @@ -2155,7 +2192,9 @@ static const struct tune_params neoversev1_tunings =
>> >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>> >     | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const sve_vec_cost neoverse512tvb_sve_vector_cost =
>> > @@ -2292,7 +2331,9 @@ static const struct tune_params neoverse512tvb_tunings =
>> >    (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>> >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
>> > @@ -2482,7 +2523,9 @@ static const struct tune_params neoversen2_tunings =
>> >     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>> >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
>> > @@ -2672,7 +2715,9 @@ static const struct tune_params neoversev2_tunings =
>> >     | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
>> >     | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> >     | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
>> > -  &generic_prefetch_tune
>> > +  &generic_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  static const struct tune_params a64fx_tunings =
>> > @@ -2705,7 +2750,9 @@ static const struct tune_params a64fx_tunings =
>> >    0, /* max_case_values.  */
>> >    tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
>> >    (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
>> > -  &a64fx_prefetch_tune
>> > +  &a64fx_prefetch_tune,
>> > +  tune_params::LDP_POLICY_ALWAYS,    /* ldp_policy_model.  */
>> > +  tune_params::STP_POLICY_ALWAYS     /* stp_policy_model.  */
>> >  };
>> >
>> >  /* Support for fine-grained override of the tuning structures.  */
>> > @@ -17726,6 +17773,50 @@ aarch64_parse_tune (const char *to_parse, const struct processor **res)
>> >    return AARCH_PARSE_INVALID_ARG;
>> >  }
>> >
>> > +/* Validate a command-line -mldp-policy option.  Parse the policy
>> > +   specified in STR and throw errors if appropriate.  */
>> > +
>> > +static bool
>> > +aarch64_parse_ldp_policy (const char *str, struct tune_params* tune)
>> > +{
>> > +  /* Check the value of the option to be one of the accepted.  */
>> > +  if (strcmp (str, "always") == 0)
>> > +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS;
>> > +  else if (strcmp (str, "never") == 0)
>> > +    tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER;
>> > +  else if (strcmp (str, "aligned") == 0)
>> > +    tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED;
>> > +  else if (strcmp (str, "default") != 0)
>> > +    {
>> > +      error ("unknown value %qs for %<-mldp-policy%>", str);
>> > +      return false;
>> > +    }
>> > +
>> > +  return true;
>> > +}
>> > +
>> > +/* Validate a command-line -mstp-policy option.  Parse the policy
>> > +   specified in STR and throw errors if appropriate.  */
>> > +
>> > +static bool
>> > +aarch64_parse_stp_policy (const char *str, struct tune_params* tune)
>> > +{
>> > +  /* Check the value of the option to be one of the accepted.  */
>> > +  if (strcmp (str, "always") == 0)
>> > +      tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS;
>> > +  else if (strcmp (str, "never") == 0)
>> > +      tune->stp_policy_model = tune_params::STP_POLICY_NEVER;
>> > +  else if (strcmp (str, "aligned") == 0)
>> > +      tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED;
>> > +  else if (strcmp (str, "default") != 0)
>> > +    {
>> > +      error ("unknown value %qs for %<-mstp-policy%>", str);
>> > +      return false;
>> > +    }
>> > +
>> > +  return true;
>> > +}
>> > +
>> >  /* Parse TOKEN, which has length LENGTH to see if it is an option
>> >     described in FLAG.  If it is, return the index bit for that fusion type.
>> >     If not, error (printing OPTION_NAME) and return zero.  */
>> > @@ -18074,6 +18165,14 @@ aarch64_override_options_internal (struct gcc_options *opts)
>> >      aarch64_parse_override_string (opts->x_aarch64_override_tune_string,
>> >                                  &aarch64_tune_params);
>> >
>> > +  if (opts->x_aarch64_ldp_policy_string)
>> > +    aarch64_parse_ldp_policy (opts->x_aarch64_ldp_policy_string,
>> > +                           &aarch64_tune_params);
>> > +
>> > +  if (opts->x_aarch64_stp_policy_string)
>> > +    aarch64_parse_stp_policy (opts->x_aarch64_stp_policy_string,
>> > +                           &aarch64_tune_params);
>> > +
>> >    /* This target defaults to strict volatile bitfields.  */
>> >    if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
>> >      opts->x_flag_strict_volatile_bitfields = 1;
>> > @@ -26382,18 +26481,14 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>> >    enum reg_class rclass_1, rclass_2;
>> >    rtx mem_1, mem_2, reg_1, reg_2;
>> >
>> > -  /* Allow the tuning structure to disable LDP instruction formation
>> > -     from combining instructions (e.g., in peephole2).
>> > -     TODO: Implement fine-grained tuning control for LDP and STP:
>> > -        1. control policies for load and store separately;
>> > -        2. support the following policies:
>> > -           - default (use what is in the tuning structure)
>> > -           - always
>> > -           - never
>> > -           - aligned (only if the compiler can prove that the
>> > -             load will be aligned to 2 * element_size)  */
>> > -  if (load && (aarch64_tune_params.extra_tuning_flags
>> > -            & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
>> > +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
>> > +  if (load
>> > +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_NEVER)
>> > +    return false;
>> > +
>> > +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
>> > +  if (!load
>> > +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_NEVER)
>> >      return false;
>>
>> ...here we could do something like:
>>
>>   auto policy = (load
>>                  ? aarch64_tune_params.ldp_policy_model
>>                  : aarch64_tune_params.stp_policy_model);
>>
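A minimal sketch of how the single-enum suggestion could look, written as
standalone C++ rather than as a patch against aarch64.cc.  The names
aarch64_ldp_stp_policy, tune_params_sketch, parse_ldp_stp_policy and
policy_allows_pair are hypothetical, and the two alignment arguments stand
in for MEM_ALIGN (mem) and GET_MODE_ALIGNMENT (mode), both in bits:

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical shared policy type for both LDP and STP.  */
enum aarch64_ldp_stp_policy
{
  AARCH64_LDP_STP_POLICY_ALIGNED,
  AARCH64_LDP_STP_POLICY_ALWAYS,
  AARCH64_LDP_STP_POLICY_NEVER
};

/* Stand-in for the two new tune_params fields (illustration only).  */
struct tune_params_sketch
{
  aarch64_ldp_stp_policy ldp_policy_model;
  aarch64_ldp_stp_policy stp_policy_model;
};

/* One parser for both options.  "default" leaves *POLICY untouched;
   an unknown string returns false (the real code would call error).  */
static bool
parse_ldp_stp_policy (const char *str, aarch64_ldp_stp_policy *policy)
{
  if (std::strcmp (str, "always") == 0)
    *policy = AARCH64_LDP_STP_POLICY_ALWAYS;
  else if (std::strcmp (str, "never") == 0)
    *policy = AARCH64_LDP_STP_POLICY_NEVER;
  else if (std::strcmp (str, "aligned") == 0)
    *policy = AARCH64_LDP_STP_POLICY_ALIGNED;
  else if (std::strcmp (str, "default") != 0)
    return false;
  return true;
}

/* Pick the policy once, then apply the never/aligned checks to it.
   MEM_ALIGN_BITS and MODE_ALIGN_BITS are assumed to be in bits.  */
static bool
policy_allows_pair (const tune_params_sketch &tune, bool load,
		    bool optimize_size, unsigned mem_align_bits,
		    unsigned mode_align_bits)
{
  auto policy = (load
		 ? tune.ldp_policy_model
		 : tune.stp_policy_model);

  if (policy == AARCH64_LDP_STP_POLICY_NEVER)
    return false;

  if (policy == AARCH64_LDP_STP_POLICY_ALIGNED
      && !optimize_size
      && mem_align_bits < 2 * mode_align_bits)
    return false;

  return true;
}
```

With this shape the load and store paths share one set of checks, so the
duplicated LDP/STP blocks in each of the two functions would collapse into
one.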
>> Also:
>>
>> >
>> >    if (load)
>> > @@ -26420,13 +26515,22 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>> >    if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>> >      return false;
>> >
>> > -  /* If we have SImode and slow unaligned ldp,
>> > -     check the alignment to be at least 8 byte. */
>> > -  if (mode == SImode
>> > -      && (aarch64_tune_params.extra_tuning_flags
>> > -          & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
>> > +  /* If we have LDP_POLICY_ALIGNED,
>> > +     do not emit the load pair unless the alignment is checked to be
>> > +     at least double the alignment of the type.  */
>> > +  if (load
>> > +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_ALIGNED
>> >        && !optimize_size
>> > -      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
>> > +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
>> > +    return false;
>> > +
>> > +  /* If we have STP_POLICY_ALIGNED,
>> > +     do not emit the store pair unless the alignment is checked to be
>> > +     at least double the alignment of the type.  */
>> > +  if (!load
>> > +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_ALIGNED
>> > +      && !optimize_size
>> > +      && MEM_ALIGN (mem_1) < 2 * GET_MODE_ALIGNMENT (mode))
>> >      return false;
>> >
>> >    /* Check if the addresses are in the form of [base+offset].  */
>> > @@ -26556,6 +26660,16 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>> >    HOST_WIDE_INT offvals[num_insns], msize;
>> >    rtx mem[num_insns], reg[num_insns], base[num_insns], offset[num_insns];
>> >
>> > +  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
>> > +  if (load
>> > +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_NEVER)
>> > +    return false;
>> > +
>> > +  /* If we have STP_POLICY_NEVER, reject the store pair.  */
>> > +  if (!load
>> > +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_NEVER)
>> > +    return false;
>> > +
>> >    if (load)
>> >      {
>> >        for (int i = 0; i < num_insns; i++)
>> > @@ -26645,13 +26759,22 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>> >    if (offvals[0] % msize != offvals[2] % msize)
>> >      return false;
>> >
>> > -  /* If we have SImode and slow unaligned ldp,
>> > -     check the alignment to be at least 8 byte. */
>> > -  if (mode == SImode
>> > -      && (aarch64_tune_params.extra_tuning_flags
>> > -       & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
>> > +  /* If we have LDP_POLICY_ALIGNED,
>> > +     do not emit the load pair unless the alignment is checked to be
>> > +     at least double the alignment of the type.  */
>> > +  if (load
>> > +      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_ALIGNED
>> > +      && !optimize_size
>> > +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
>> > +    return false;
>> > +
>> > +  /* If we have STP_POLICY_ALIGNED,
>> > +     do not emit the store pair unless the alignment is checked to be
>> > +     at least double the alignment of the type.  */
>> > +  if (!load
>> > +      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_ALIGNED
>> >        && !optimize_size
>> > -      && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
>> > +      && MEM_ALIGN (mem[0]) < 2 * GET_MODE_ALIGNMENT (mode))
>> >      return false;
>> >
>> >    return true;
>> > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>> > index 4a0580435a8..e5302947ce7 100644
>> > --- a/gcc/config/aarch64/aarch64.opt
>> > +++ b/gcc/config/aarch64/aarch64.opt
>> > @@ -205,6 +205,14 @@ msign-return-address=
>> >  Target WarnRemoved RejectNegative Joined Enum(aarch_ra_sign_scope_t) Var(aarch_ra_sign_scope) Init(AARCH_FUNCTION_NONE) Save
>> >  Select return address signing scope.
>> >
>> > +mldp-policy=
>> > +Target RejectNegative Joined Var(aarch64_ldp_policy_string) Save
>> > +Fine-grained policy for load pairs.
>> > +
>> > +mstp-policy=
>> > +Target RejectNegative Joined Var(aarch64_stp_policy_string) Save
>> > +Fine-grained policy for store pairs.
>> > +
>> >  Enum
>> >  Name(aarch_ra_sign_scope_t) Type(enum aarch_function_type)
>> >  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> > new file mode 100644
>> > index 00000000000..6e29b265168
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> > @@ -0,0 +1,66 @@
>> > +/* { dg-options "-O2 -mldp-policy=aligned -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define LDP_TEST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[0]; \
>> > +    a_1 = arr[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[0]; \
>> > +    a_1 = a[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[100]; \
>> > +    a_1 = arr[101]; \
>> > +    a_2 = arr[102]; \
>> > +    a_3 = arr[103]; \
>> > +    a_4 = arr[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[100]; \
>> > +    a_1 = a[101]; \
>> > +    a_2 = a[102]; \
>> > +    a_3 = a[103]; \
>> > +    a_4 = a[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +LDP_TEST_ALIGNED(int32_t);
>> > +LDP_TEST_ALIGNED(int64_t);
>> > +LDP_TEST_ALIGNED(v4si);
>> > +LDP_TEST_UNALIGNED(int32_t);
>> > +LDP_TEST_UNALIGNED(int64_t);
>> > +LDP_TEST_UNALIGNED(v4si);
>> > +LDP_TEST_ADJUST_ALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_ALIGNED(int64_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
>>
>> It might be better to split this into two tests, one for the aligned
>> accesses and one for the unaligned accesses.  Same for the store version.
>> (Splitting isn't necessary or useful for =always and =never though.)
>>
>> Thanks,
>> Richard
>>
>> > +
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> > new file mode 100644
>> > index 00000000000..d2c4cf343e9
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> > @@ -0,0 +1,66 @@
>> > +/* { dg-options "-O2 -mldp-policy=always -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define LDP_TEST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[0]; \
>> > +    a_1 = arr[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[0]; \
>> > +    a_1 = a[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[100]; \
>> > +    a_1 = arr[101]; \
>> > +    a_2 = arr[102]; \
>> > +    a_3 = arr[103]; \
>> > +    a_4 = arr[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[100]; \
>> > +    a_1 = a[101]; \
>> > +    a_2 = a[102]; \
>> > +    a_3 = a[103]; \
>> > +    a_4 = a[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +LDP_TEST_ALIGNED(int32_t);
>> > +LDP_TEST_ALIGNED(int64_t);
>> > +LDP_TEST_ALIGNED(v4si);
>> > +LDP_TEST_UNALIGNED(int32_t);
>> > +LDP_TEST_UNALIGNED(int64_t);
>> > +LDP_TEST_UNALIGNED(v4si);
>> > +LDP_TEST_ADJUST_ALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_ALIGNED(int64_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
>> > +
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> > new file mode 100644
>> > index 00000000000..f8a45ee18be
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> > @@ -0,0 +1,66 @@
>> > +/* { dg-options "-O2 -mldp-policy=never -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define LDP_TEST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[0]; \
>> > +    a_1 = arr[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[0]; \
>> > +    a_1 = a[1]; \
>> > +    return a_0 + a_1; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    a_0 = arr[100]; \
>> > +    a_1 = arr[101]; \
>> > +    a_2 = arr[102]; \
>> > +    a_3 = arr[103]; \
>> > +    a_4 = arr[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
>> > +    TYPE a_0, a_1, a_2, a_3, a_4; \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a_0 = a[100]; \
>> > +    a_1 = a[101]; \
>> > +    a_2 = a[102]; \
>> > +    a_3 = a[103]; \
>> > +    a_4 = a[110]; \
>> > +    return a_0 + a_1 + a_2 + a_3 + a_4; \
>> > +}
>> > +
>> > +LDP_TEST_ALIGNED(int32_t);
>> > +LDP_TEST_ALIGNED(int64_t);
>> > +LDP_TEST_ALIGNED(v4si);
>> > +LDP_TEST_UNALIGNED(int32_t);
>> > +LDP_TEST_UNALIGNED(int64_t);
>> > +LDP_TEST_UNALIGNED(v4si);
>> > +LDP_TEST_ADJUST_ALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_ALIGNED(int64_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +LDP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
>> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
>> > +
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> > new file mode 100644
>> > index 00000000000..ae47b42efc4
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> > @@ -0,0 +1,60 @@
>> > +/* { dg-options "-O2 -mstp-policy=aligned -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define STP_TEST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[0] = x; \
>> > +    arr[1] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[0] = x; \
>> > +    a[1] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[100] = x; \
>> > +    arr[101] = x; \
>> > +    arr[102] = x; \
>> > +    arr[103] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[100] = x; \
>> > +    a[101] = x; \
>> > +    a[102] = x; \
>> > +    a[103] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +STP_TEST_ALIGNED(int32_t);
>> > +STP_TEST_ALIGNED(int64_t);
>> > +STP_TEST_ALIGNED(v4si);
>> > +STP_TEST_UNALIGNED(int32_t);
>> > +STP_TEST_UNALIGNED(int64_t);
>> > +STP_TEST_UNALIGNED(v4si);
>> > +STP_TEST_ADJUST_ALIGNED(int32_t);
>> > +STP_TEST_ADJUST_ALIGNED(int64_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
>> > +
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c b/gcc/testsuite/gcc.target/aarch64/stp_always.c
>> > new file mode 100644
>> > index 00000000000..c1c51f9ae88
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
>> > @@ -0,0 +1,60 @@
>> > +/* { dg-options "-O2 -mstp-policy=always -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define STP_TEST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[0] = x; \
>> > +    arr[1] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[0] = x; \
>> > +    a[1] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[100] = x; \
>> > +    arr[101] = x; \
>> > +    arr[102] = x; \
>> > +    arr[103] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[100] = x; \
>> > +    a[101] = x; \
>> > +    a[102] = x; \
>> > +    a[103] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +STP_TEST_ALIGNED(int32_t);
>> > +STP_TEST_ALIGNED(int64_t);
>> > +STP_TEST_ALIGNED(v4si);
>> > +STP_TEST_UNALIGNED(int32_t);
>> > +STP_TEST_UNALIGNED(int64_t);
>> > +STP_TEST_UNALIGNED(v4si);
>> > +STP_TEST_ADJUST_ALIGNED(int32_t);
>> > +STP_TEST_ADJUST_ALIGNED(int64_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
>> > +
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c b/gcc/testsuite/gcc.target/aarch64/stp_never.c
>> > new file mode 100644
>> > index 00000000000..c28fcafa0ed
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
>> > @@ -0,0 +1,60 @@
>> > +/* { dg-options "-O2 -mstp-policy=never -mcpu=generic" } */
>> > +
>> > +#include <stdlib.h>
>> > +#include <stdint.h>
>> > +
>> > +typedef int v4si __attribute__ ((vector_size (16)));
>> > +
>> > +#define STP_TEST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[0] = x; \
>> > +    arr[1] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[0] = x; \
>> > +    a[1] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_ALIGNED(TYPE) \
>> > +TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    arr[100] = x; \
>> > +    arr[101] = x; \
>> > +    arr[102] = x; \
>> > +    arr[103] = x; \
>> > +    return arr; \
>> > +}
>> > +
>> > +#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
>> > +TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
>> > +    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
>> > +    TYPE *a = arr+1; \
>> > +    a[100] = x; \
>> > +    a[101] = x; \
>> > +    a[102] = x; \
>> > +    a[103] = x; \
>> > +    return a; \
>> > +}
>> > +
>> > +STP_TEST_ALIGNED(int32_t);
>> > +STP_TEST_ALIGNED(int64_t);
>> > +STP_TEST_ALIGNED(v4si);
>> > +STP_TEST_UNALIGNED(int32_t);
>> > +STP_TEST_UNALIGNED(int64_t);
>> > +STP_TEST_UNALIGNED(v4si);
>> > +STP_TEST_ADJUST_ALIGNED(int32_t);
>> > +STP_TEST_ADJUST_ALIGNED(int64_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int32_t);
>> > +STP_TEST_ADJUST_UNALIGNED(int64_t);
>> > +
>> > +/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
>> > +/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
>> > +
>>
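
For reference, here is a minimal sketch of the pointer arithmetic that the
quoted test macros rely on, and so of the distinction that a split into
aligned and unaligned tests would follow.  Only the mask expression is
copied from the patch.  The names ALIGN_DOWN, PAIR_ALIGNED, DEFINE_CHECKS,
demo_buf and check_demo are hypothetical and exist only for this
illustration.  It assumes a target where size_t and uintptr_t have the
same width.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask as in the quoted test macros: round PTR down to a multiple
   of 2 * 8 * _Alignof (TYPE) bytes.  */
#define ALIGN_DOWN(TYPE, PTR) \
  ((TYPE *) ((uintptr_t) (PTR) & ~(2 * 8 * _Alignof (TYPE) - 1)))

/* Nonzero if P is a multiple of 2 * sizeof (TYPE), i.e. the alignment
   that the TODO comment names for the "aligned" policy.  */
#define PAIR_ALIGNED(TYPE, P) (((uintptr_t) (P) % (2 * sizeof (TYPE))) == 0)

/* Hypothetical helpers: report whether the masked base pointer, and the
   pointer one element past it, are pair-aligned.  */
#define DEFINE_CHECKS(TYPE) \
int base_is_pair_aligned_##TYPE (char *ptr) \
{ \
  TYPE *arr = ALIGN_DOWN (TYPE, ptr); \
  return PAIR_ALIGNED (TYPE, arr); \
} \
int next_is_pair_aligned_##TYPE (char *ptr) \
{ \
  TYPE *a = ALIGN_DOWN (TYPE, ptr) + 1; \
  return PAIR_ALIGNED (TYPE, a); \
}

DEFINE_CHECKS (int32_t)
DEFINE_CHECKS (int64_t)

/* Over-aligned scratch buffer so that the rounded-down pointers used
   below stay inside it.  */
_Alignas (128) char demo_buf[256];

/* Exercise both helpers on a deliberately misaligned input pointer.  */
void check_demo (void)
{
  char *p = demo_buf + 128 + 5;

  /* The masked base is pair-aligned (the "aligned" cases).  */
  assert (base_is_pair_aligned_int32_t (p));
  assert (base_is_pair_aligned_int64_t (p));

  /* One element further on it no longer is (the "unaligned" cases).  */
  assert (!next_is_pair_aligned_int32_t (p));
  assert (!next_is_pair_aligned_int64_t (p));
}
```

Calling check_demo () from a small main is enough to run the checks.  The
sketch only models the address arithmetic; it says nothing about which
instructions the compiler actually emits under each policy.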
