On Fri, Dec 21, 2018 at 06:30:49AM -0600, Kyrill Tkachov wrote:
> Hi all,
>
> Our movmem expansion currently emits TImode loads and stores when copying
> 128-bit chunks.
> This generates X-register LDP/STP sequences as these are the most preferred
> registers for that mode.
>
> For the purpose of copying memory, however, we want to prefer Q-registers.
> This uses one fewer register, so helping with register pressure.
> It also allows merging of 256-bit and larger copies into Q-reg LDP/STP,
> further helping code size.
>
> The implementation of that is easy: we just use a 128-bit vector mode
> (V4SImode in this patch)
> rather than a TImode.
>
> With this patch the testcase:
> #define N 8
> int src[N], dst[N];
>
> void
> foo (void)
> {
>   __builtin_memcpy (dst, src, N * sizeof (int));
> }
>
> generates:
> foo:
>         adrp    x1, src
>         add     x1, x1, :lo12:src
>         adrp    x0, dst
>         add     x0, x0, :lo12:dst
>         ldp     q1, q0, [x1]
>         stp     q1, q0, [x0]
>         ret
>
> instead of:
> foo:
>         adrp    x1, src
>         add     x1, x1, :lo12:src
>         adrp    x0, dst
>         add     x0, x0, :lo12:dst
>         ldp     x2, x3, [x1]
>         stp     x2, x3, [x0]
>         ldp     x2, x3, [x1, 16]
>         stp     x2, x3, [x0, 16]
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> I hope this is a small enough change for GCC 9.
> One could argue that it is finishing up the work done this cycle to support
> Q-register LDP/STPs
>
> I've seen this give about 1.8% on 541.leela_r on Cortex-A57 with other
> changes in SPEC2017 in the noise
> but there is reduction in code size everywhere (due to more LDP/STP-Q pairs
> being formed)
>
> Ok for trunk?

I'm surprised by the logic.

If we want to use 256-bit copies, shouldn't we be explicit about that in the
movmem code, rather than using 128-bit copies that get merged?

Why do TImode loads require two X registers? Shouldn't we just fix TImode
loads to use Q registers if that is preferable?

I'm not opposed to the principle of using LDP-Q in our movmem, but is this
the best way to make that happen?

Thanks,
James

> 2018-12-21  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/aarch64/aarch64.c (aarch64_expand_movmem): Use V4SImode for
>     128-bit moves.
>
> 2018-12-21  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * gcc.target/aarch64/movmem-q-reg_1.c: New test.
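
For anyone following the thread, here is a rough source-level sketch of what "copying in 128-bit chunks with a vector mode" amounts to. It is only an illustration and not the code in aarch64_expand_movmem: the helper name copy_128bit_chunks is made up, the v4si typedef relies on the GCC/Clang vector_size extension as a stand-in for V4SImode, and whether the compiler actually selects Q registers or forms LDP/STP pairs for these moves depends on the target and options.

```c
#include <stddef.h>
#include <string.h>

/* 128-bit vector of four ints (GCC/Clang vector extension).  This is a
   source-level stand-in for a V4SImode chunk, assumed for illustration.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: copy LEN bytes from SRC to DST in 16-byte chunks,
   then copy any remaining tail bytes.  Returns the number of 16-byte
   chunks that were copied.  */
size_t
copy_128bit_chunks (void *dst, const void *src, size_t len)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t chunks = 0;

  while (len >= sizeof (v4si))
    {
      v4si tmp;
      memcpy (&tmp, s, sizeof tmp);  /* One 128-bit load.  */
      memcpy (d, &tmp, sizeof tmp);  /* One 128-bit store.  */
      s += sizeof tmp;
      d += sizeof tmp;
      len -= sizeof tmp;
      chunks++;
    }

  /* Fewer than 16 bytes left: copy the tail as is.  */
  if (len > 0)
    memcpy (d, s, len);

  return chunks;
}
```

With the testcase from the patch (eight ints, so 32 bytes where int is 4 bytes), this loop runs twice, which corresponds to the two 128-bit moves that the patch hopes to see merged into a single Q-register LDP/STP pair.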