Soumya AR <soum...@nvidia.com> writes:
> One additional change with this patch is that I had to update ldapr-sext.c 
> too.
>
> During the combine pass, cases of UNSPECV_LDAP (with an offset) + sign_extend
> transform to LDAPUR + SEXT, and later to LDAPURS (with address folding).
>
> The aarch64 tests run with -moverride=tune=none, which clears all tuning flags
> including the AVOID_LDAPUR flag, enabling LDAPUR. (Since the generic arch 
> tuning
> adjustments are applied before the tune override.)
>
> This breaks ldapr-sext.c, in all instances where an offset is used. (Even 
> though
> there is no explicit offset in the testcase, since it uses global variables,
> there is an implicit offset due to the alignment padding in bss.)
>
> As a result, the following code:
>
> add   x0, x0, 2
> ldapursh      x0, [x0]
>
> becomes:
>
> ldapursh      x0, [x0, 2]
>
> I can just add -moverride=tune=avoid_ldapur to ldapr-sext.c to maintain the
> original behavior but since we're extending LDAPURS to fold offsets anyway,
> I think it makes sense to check that alongside as well.
> Let me know if that works.

Yeah, sounds good to me.

> [...]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index abbb97768f5..4ee539e1dcd 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18760,6 +18760,8 @@ aarch64_adjust_generic_arch_tuning (struct 
> tune_params &current_tune)
>    if (TARGET_SVE2)
>      current_tune.extra_tuning_flags
>        &= ~AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS;
> +  if (!AARCH64_HAVE_ISA(V8_8A))
> +      aarch64_tune_params.extra_tuning_flags |= 
> AARCH64_EXTRA_TUNE_AVOID_LDAPUR;

Sorry for the formatting nit, but the last line above should be
indented by 2 columns fewer.

> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 36b0dbd1f57..57a8e410c20 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -679,13 +679,16 @@
>  )
>  
>  (define_insn "aarch64_atomic_load<mode>_rcpc"
> -  [(set (match_operand:ALLI 0 "register_operand" "=r")
> +  [(set (match_operand:ALLI 0 "register_operand")
>      (unspec_volatile:ALLI
> -      [(match_operand:ALLI 1 "aarch64_sync_memory_operand" "Q")
> +      [(match_operand:ALLI 1 "aarch64_rcpc_memory_operand")
>         (match_operand:SI 2 "const_int_operand")]                     ;; model
>        UNSPECV_LDAP))]
>    "TARGET_RCPC"
> -  "ldapr<atomic_sfx>\t%<w>0, %1"
> +  {@ [ cons: =0 , 1   ; attrs: enable_ldapur ]
> +     [ r       , Q   ; any                   ] ldapr<atomic_sfx>\t%<w>0, %1
> +     [ r       , Ust ; yes                   ] ldapur<atomic_sfx>\t%<w>0, %1
> +  }
>  )
>  
>  (define_insn "aarch64_atomic_load<mode>"
> @@ -705,21 +708,24 @@
>  )
>  
>  (define_insn "*aarch64_atomic_load<ALLX:mode>_rcpc_zext"
> -  [(set (match_operand:SD_HSDI 0 "register_operand" "=r")
> +  [(set (match_operand:SD_HSDI 0 "register_operand")
>      (zero_extend:SD_HSDI
>        (unspec_volatile:ALLX
> -        [(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
> +        [(match_operand:ALLX 1 "aarch64_rcpc_memory_operand")
>           (match_operand:SI 2 "const_int_operand")]                   ;; model
>         UNSPECV_LDAP)))]
>    "TARGET_RCPC && (<SD_HSDI:sizen> > <ALLX:sizen>)"
> -  "ldapr<ALLX:atomic_sfx>\t%w0, %1"
> +  {@ [ cons: =0 , 1   ; attrs: enable_ldapur ]
> +     [ r       , Q   ; any                  ] ldapr<ALLX:atomic_sfx>\t%w0, %1
> +     [ r       , Ust ; yes                  ] ldapur<ALLX:atomic_sfx>\t%w0, 
> %1
> +  }
>  )

In both of the patterns above, it would be good to keep the "[", ",", ";" and
"]" lined up.

OK with those changes, thanks.

Richard

Reply via email to