On 31/10/16 13:42, Richard Earnshaw (lists) wrote:
On 31/10/16 11:54, Kyrill Tkachov wrote:
On 24/10/16 17:15, Andrew Pinski wrote:
On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
Hi all,

When storing a 64-bit immediate that has equal bottom and top halves we
currently synthesize the repeating 32-bit pattern twice and perform a
single X-store. With this patch we synthesize the 32-bit pattern once
into a W register and store that twice using an STP. This reduces
codesize bloat from synthesising the same constant multiple times, at
the expense of converting a store to a store-pair.
It will only trigger if we can save two or more instructions, so it will
only transform:
          mov     x1, 49370
          movk    x1, 0xc0da, lsl 32
          str     x1, [x0]

into:

          mov     w1, 49370
          stp     w1, w1, [x0]

when optimising for size (-Os), whereas it will always transform a
4-insn synthesis sequence into a two-insn sequence + STP (see comments
in the patch).
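
For illustration, a minimal C snippet (hypothetical, not necessarily one
of the new testcases) that should exercise the transformation above:

  /* Hypothetical example: a 64-bit store whose low and high 32-bit halves
     are both 0xc0da (49370), matching the assembly shown above.  */
  void
  store_repeating (unsigned long long *p)
  {
    *p = 0x0000c0da0000c0daULL;
  }

At -Os this should now produce the single mov into w1 followed by
stp w1, w1 rather than the three-insn mov/movk/str sequence.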

This patch triggers already but will trigger more with the store merging
pass that I'm working on, since that will generate more of these
repeating 64-bit constants.
This helps improve codegen on 456.hmmer, where store merging can
sometimes create very complex repeating constants and target-specific
expand needs to break them down.
Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
might cause an ICE with -mcpu=thunderx; I can't remember if the check
for slow unaligned store pair word is with the pattern or not.
I can't get it to ICE with -mcpu=thunderx.
The restriction is just on the STP-forming code in the sched-fusion
hooks AFAIK.

Basically the rule is:
1) If 4-byte aligned, then it is better to do two str.
2) If 8-byte aligned, then doing stp is good.
3) Otherwise it is better to do two str.
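
A standalone restatement of that rule in C (a sketch for clarity, not
code from GCC or from this patch) could be:

  #include <stdbool.h>

  /* Sketch of the rule above: fusing two adjacent 32-bit stores into an
     STP only pays off when the location is known to be at least 8-byte
     aligned; with 4-byte or unknown alignment, two STRs are preferred.  */
  static bool
  prefer_stp_p (unsigned int alignment_in_bytes)
  {
    return alignment_in_bytes >= 8;
  }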
Ok, then I'll make the function just emit two stores and depend on the
sched-fusion machinery to fuse them into an STP when appropriate, since
that has the logic that takes thunderx into account.
If the mode is DImode (i.e. the pattern is 'movdi'), then surely we must
have a 64-bit aligned store.

I don't think that's guaranteed. At least the Standard Names documentation
doesn't mention it. I've gone through an example with gdb where store merging
produces an unaligned store and the gen_movdi expander is called with a source
of (const_int 8589934593 [0x200000001]) and a destination of:
(mem:DI (reg/v/f:DI 73 [ a ]) [1 MEM[(int *)a_2(D)]+0 S8 A32])

i.e. a 32-bit aligned DImode MEM.
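
A source fragment along these lines (a hypothetical reconstruction, not
necessarily the exact reproducer) gives store merging that kind of RTL:

  /* On little-endian AArch64, store merging can combine these two int
     stores into a single 64-bit store of 0x200000001 (8589934593) through
     a pointer that is only guaranteed to be 32-bit aligned.  */
  void
  merge_stores (int *a)
  {
    a[0] = 1;
    a[1] = 2;
  }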

Thanks,
Kyrill

R.

Thanks for the info.
Kyrill

Thanks,
Andrew


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * config/aarch64/aarch64.md (mov<mode>): Call
      aarch64_split_dimode_const_store on DImode constant stores.
      * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
      New prototype.
      * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
      function.

2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * gcc.target/aarch64/store_repeating_constant_1.c: New test.
      * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
