On Thu, Nov 10, 2016 at 1:04 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html
>
> Andrew, do you have any objections to this version?

Not really.

Thanks,
Andrew

> Thanks,
> Kyrill
>
> On 01/11/16 15:21, Kyrill Tkachov wrote:
>>
>>
>> On 31/10/16 11:54, Kyrill Tkachov wrote:
>>>
>>>
>>> On 24/10/16 17:15, Andrew Pinski wrote:
>>>>
>>>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>>>> <kyrylo.tkac...@foss.arm.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> When storing a 64-bit immediate that has equal bottom and top halves
>>>>> we currently synthesize the repeating 32-bit pattern twice and perform
>>>>> a single X-store.
>>>>> With this patch we synthesize the 32-bit pattern once into a W register
>>>>> and store that twice using an STP. This reduces codesize bloat from
>>>>> synthesising the same constant multiple times at the expense of
>>>>> converting a store to a store-pair.
>>>>> It will only trigger if we can save two or more instructions, so it
>>>>> will only transform:
>>>>>
>>>>>     mov x1, 49370
>>>>>     movk x1, 0xc0da, lsl 32
>>>>>     str x1, [x0]
>>>>>
>>>>> into:
>>>>>
>>>>>     mov w1, 49370
>>>>>     stp w1, w1, [x0]
>>>>>
>>>>> when optimising for -Os, whereas it will always transform a 4-insn
>>>>> synthesis sequence into a two-insn sequence + STP (see comments in the
>>>>> patch).
>>>>>
>>>>> This patch triggers already but will trigger more with the store
>>>>> merging pass that I'm working on since that will generate more of these
>>>>> repeating 64-bit constants.
>>>>> This helps improve codegen on 456.hmmer where store merging can
>>>>> sometimes create very complex repeating constants and target-specific
>>>>> expand needs to break them down.
>>>>
>>>>
>>>> Doing STP might be worse on ThunderX 1 than the mov/movk. Or this
>>>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>>>> for slow unaligned store pair word is with the pattern or not.
>>>
>>>
>>> I can't get it to ICE with -mcpu=thunderx.
>>> The restriction is just on the STP forming code in the sched-fusion hooks
>>> AFAIK.
>>>
>>>> Basically the rule is
>>>> 1) if 4 byte aligned, then it is better to do two str.
>>>> 2) If 8 byte aligned, then doing stp is good
>>>> 3) Otherwise it is better to do two str.
>>>
>>>
>>> Ok, then I'll make the function just emit two stores and depend on the
>>> sched-fusion machinery to fuse them into an STP when appropriate since
>>> that has the logic that takes thunderx into account.
>>>
>>
>> Here it is.
>> I've confirmed that it emits two STRs for 4 byte aligned stores when
>> -mtune=thunderx and still generates STP for the other tunings, though now
>> sched-fusion is responsible for merging them, which is ok by me.
>>
>> Bootstrapped and tested on aarch64.
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-11-01 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>
>>     * config/aarch64/aarch64.md (mov<mode>): Call
>>     aarch64_split_dimode_const_store on DImode constant stores.
>>     * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>>     New prototype.
>>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>     function.
>>
>> 2016-11-01 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>
>>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>>
>>>>> Ok for trunk?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-10-24 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>>>>
>>>>>     * config/aarch64/aarch64.md (mov<mode>): Call
>>>>>     aarch64_split_dimode_const_store on DImode constant stores.
>>>>>     * config/aarch64/aarch64-protos.h
>>>>>     (aarch64_split_dimode_const_store):
>>>>>     New prototype.
>>>>>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>>>     function.
>>>>>
>>>>> 2016-10-24 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>>>>>
>>>>>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>>>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>>
>>>
>>
>
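
For readers following the thread, here is a minimal C sketch of the two ideas discussed above: the repeating-halves condition that makes the split store valid, and the ThunderX alignment rule as Andrew states it. It is not the code from the patch. All function names below (has_repeating_halves, store_as_one_dword, store_as_two_words, split_store_matches, thunderx_prefers_stp) are hypothetical and chosen only for illustration; the actual logic lives in aarch64_split_dimode_const_store and the sched-fusion hooks, neither of which is reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): true when the low and high
   32-bit halves of VALUE are identical, e.g. 0x0000c0da0000c0da.  */
static bool
has_repeating_halves (uint64_t value)
{
  return (uint32_t) value == (uint32_t) (value >> 32);
}

/* Write VALUE with one 64-bit store, analogous to "str x1, [x0]".  */
static void
store_as_one_dword (unsigned char *dst, uint64_t value)
{
  memcpy (dst, &value, sizeof value);
}

/* Write the low half twice, analogous to "stp w1, w1, [x0]" or to two
   "str w1" instructions.  This matches the 64-bit store only when the
   two halves of VALUE repeat.  */
static void
store_as_two_words (unsigned char *dst, uint64_t value)
{
  uint32_t half = (uint32_t) value;
  memcpy (dst, &half, sizeof half);
  memcpy (dst + sizeof half, &half, sizeof half);
}

/* True when both strategies leave the same 8 bytes in memory.  */
static bool
split_store_matches (uint64_t value)
{
  unsigned char one[8], two[8];
  store_as_one_dword (one, value);
  store_as_two_words (two, value);
  return memcmp (one, two, sizeof one) == 0;
}

/* The rule of thumb quoted in the thread for ThunderX: STP is good for
   8-byte-aligned addresses, otherwise prefer two STRs.  This is an
   assumption-level paraphrase, not GCC's tuning code.  */
static bool
thunderx_prefers_stp (uintptr_t address)
{
  return address % 8 == 0;
}
```

With the constant from the example, has_repeating_halves (0x0000c0da0000c0daULL) holds and the two store strategies write identical bytes regardless of endianness, since both halves are the same. For a constant whose halves differ, the split store would write the wrong value, which is why the transformation has to be gated on that check.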