Hi Richard,

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, November 20, 2023 12:16 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] aarch64: Add an early RA for strided registers
> 
> [Yeah, I just missed the stage1 deadline, sorry.  But this is gated
>  behind several other things, so it seems a bit academic whether it
>  was posted yesterday or today.]
> 
> This pass adds a simple register allocator for FP & SIMD registers.
> Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
> instructions, which require a very specific grouping structure,
> and so would be difficult to exploit with general allocation.
> 
> The allocator is very simple.  It gives up on anything that would
> require spilling, or that it might not handle well for other reasons.
> 
> The allocator needs to track liveness at the level of individual FPRs.
> Doing that fixes a lot of the PRs relating to redundant moves caused by
> structure loads and stores.  That particular problem is now fixed more
> generally by Lehua's RA patches.
> 
> However, the patch runs before scheduling, so it has a chance to bag
> a spill-free allocation of vector code before the scheduler moves
> things around.  It could therefore still be useful for non-SME code
> (e.g. for hand-scheduled ACLE code).
> 
> The pass is controlled by a tristate switch:
> 
> - -mearly-ra=all: run on all functions
> - -mearly-ra=strided: run on functions that have access to strided registers
> - -mearly-ra=none: don't run on any function
> 
> The patch makes -mearly-ra=strided the default at -O2 and above.

Can you please add a statement about that default to the invoke.texi
documentation?
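
For reference, something along these lines is the kind of wording I had
in mind (just a sketch following the usual invoke.texi conventions; the
exact phrasing and placement are of course up to you):

```texi
@opindex mearly-ra
@item -mearly-ra=@var{scope}
Control when the early register allocation pass for strided registers
runs.  @samp{all} runs the pass on all functions, @samp{strided} runs
it only on functions that have access to strided multi-register
instructions, and @samp{none} disables it entirely.  The default is
@samp{strided} at @option{-O2} and higher, and @samp{none} otherwise.
```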

> However, I've tested it with -mearly-ra=all too.
> 
> As said previously, the pass is very naive.  There's much more that we
> could do, such as handling invariants better.  The main focus is on not
> committing to a bad allocation, rather than on handling as much as
> possible.
> 

At first glance it seemed a bit unconventional to add an
aarch64-specific pass for such an RA task, but I tend to agree that the
specific allocation requirements of these instructions warrant some
target-specific logic.  We're already quite happy to have
target-specific passes that adjust all manner of instruction selection
and scheduling decisions, and I don't see the strided register
allocation decisions as qualitatively different.

> Tested on aarch64-linux-gnu.

I'm okay with the patch, but it includes a small hunk in ira-costs.cc,
so I guess you'll want an authority on that code to okay it.

One further small comment inline below.

Thanks,
Kyrill

> 
> Richard
> 
> 
> gcc/
>       * ira-costs.cc (scan_one_insn): Restrict handling of parameter
>       loads to pseudo registers.
>       * config.gcc: Add aarch64-early-ra.o for AArch64 targets.
>       * config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
>       * config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
>       * config/aarch64/aarch64.opt (mearly_ra): New option.
>       * doc/invoke.texi: Document it.
>       * common/config/aarch64/aarch64-common.cc
>       (aarch_option_optimization_table): Use -mearly-ra=strided by
>       default for -O2 and above.
>       * config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New
>       pass.
>       * config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
>       (make_pass_aarch64_early_ra): Declare.
>       * config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
>       Add a stride_type attribute.
>       (@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
>       (@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
>       * config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
>       (svldnt1_impl::expand, svst1_impl::expand, svstnt1_impl::expand):
>       Handle new way of defining multi-register loads and stores.
>       * config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
>       (@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
>       (@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
>       * config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
>       (@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
>       (@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
>       (@aarch64_<ST1_COUNT:optab><mode>): Likewise.
>       (@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
>       (@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
>       * config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
>       function.
>       * config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
>       (UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
>       (UNSPEC_STNT1_SVE_COUNT): Likewise.
>       (stride_type): New attribute.
>       * config/aarch64/constraints.md (Uwd, Uwt): New constraints.
>       * config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
>       (UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
>       (optab): Handle them.
>       (LD1_COUNT, ST1_COUNT): New iterators.
>       * config/aarch64/aarch64-early-ra.cc: New file.
> 
> gcc/testsuite/
>       * gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
>       output test.
>       * gcc.target/aarch64/sme/strided_1.c: New test.
> ---

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index bd6c236f884..423f406a00d 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -525,6 +521,11 @@ (define_attr "predicated" "yes,no" (const_string "no"))
>  ;; may chose to hold the tracking state encoded in SP.
>  (define_attr "speculation_barrier" "true,false" (const_string "false"))
> 
> +(define_attr "stride_type"
> +  "none,ld1_consecutive,ld1_strided,st1_consecutive,st1_strided,
> +   luti_consecutive,luti_strided"
> +  (const_string "none"))

I think it'd be good to have a comment on this attribute to let future
developers know that these values are used by the new RA pass, and
whether they are mandatory for correctness or only serve optimization
purposes.
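
Not something to hold the patch up over, but a rough sketch of the kind
of comment I mean (the wording, and the answer to the
correctness-vs-optimization question, are yours to fill in):

```lisp
;; Used by the aarch64-early-ra pass to identify LD1, ST1 and LUTI2/4
;; instructions that operate on multiple registers, and to distinguish
;; the consecutive-register forms from the strided ones.  <State here
;; whether correct values are required for correctness or only for
;; allocation quality.>
(define_attr "stride_type"
  "none,ld1_consecutive,ld1_strided,st1_consecutive,st1_strided,
   luti_consecutive,luti_strided"
  (const_string "none"))
```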
