On 18/06/2024 19:14, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>         (peephole2 to rewrite DI/DF store): New.
>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>         (thumb_movdf_insn): Likewise.
> 
>       * config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> 
> Signed-off-by: Siarhei Volkau <lis8...@gmail.com>
> ---
>  gcc/config/arm/iterators.md                   |  3 +++
>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..ed4b706773a 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>        gcc_assert (TARGET_HAVE_MOVT);
>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>      case 4:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +     return \"ldmia\\t%m1, {%0, %H0}\";

See below for why I don't think this is a case we need to consider here.

>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 5:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>      case 1:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +     return \"ldmia\\t%m1, {%0, %H0}\";
>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 2:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>     (set_attr "conds" "clob")
>     (set_attr "type" "multiple")]
>  )
> -
> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> +  "TARGET_THUMB1
> +   && (peep2_reg_dead_p (1, operands[1])
> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"

I don't understand this second condition (partial overlap of the base address 
with the value loaded), what are you guarding against here?  The instruction 
specification says that there is no writeback if the base register has any 
overlap with any of the loaded registers, so really we should reject the 
peephole if that's true (we'd have invalid RTL as well as there would end up 
being two writes to the same register).

> +  [(set (match_dup 0)
> +     (mem:DIDF (post_inc:SI (match_dup 1))))]
> +  ""

This is not enough, unfortunately.  MEM() objects carry attributes about the 
memory accessed (alias sets, known alignment, etc) and these will not be 
propagated correctly if you rewrite the pattern this way.  The correct solution 
is to match the entire mem as operand1, then use change_address to rewrite 
that.  Something like:

     operands[1] = change_address (operands[1], VOIDmode,
                                   gen_rtx_POST_INC (SImode,
                                                     XEXP (operands[1], 0)));

> +)
> +
> +(define_peephole2
> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> +     (match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && peep2_reg_dead_p (1, operands[1])"
> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> +     (match_dup 0))]

There's a similar address rewriting issue here as well, but we also have to 
guard against the (unlikely) case where the base register is part of the value 
stored if it isn't the lowest numbered register in the list.  In that case the 
value stored would be undefined.  That is:

  stmia r0!, {r0, r1}

is ok, but

  stmia r1!, {r0, r1}

is not.

> +  ""
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c 
> b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index 00000000000..167fa9ec876
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mthumb -Os" }  */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> +
> +void copy_df(double *dst, const double *src)
> +{
> +    *dst = *src;
> +}
> +
> +void copy_di(unsigned long long *dst, const unsigned long long *src)
> +{
> +    *dst = *src;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */

R.

Reply via email to