On Thu, Nov 10, 2016 at 1:04 AM, Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00040.html
>
> Andrew, do you have any objections to this version?

Not really.

Thanks,
Andrew

> Thanks,
> Kyrill
>
> On 01/11/16 15:21, Kyrill Tkachov wrote:
>>
>>
>> On 31/10/16 11:54, Kyrill Tkachov wrote:
>>>
>>>
>>> On 24/10/16 17:15, Andrew Pinski wrote:
>>>>
>>>> On Mon, Oct 24, 2016 at 7:27 AM, Kyrill Tkachov
>>>> <kyrylo.tkac...@foss.arm.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> When storing a 64-bit immediate that has equal bottom and top halves we
>>>>> currently synthesize the repeating 32-bit pattern twice and perform a
>>>>> single X-store. With this patch we synthesize the 32-bit pattern once
>>>>> into a W register and store that twice using an STP. This reduces
>>>>> codesize bloat from synthesising the same constant multiple times at the
>>>>> expense of converting a store to a store-pair.
>>>>> It will only trigger if we can save two or more instructions, so it will
>>>>> only transform:
>>>>>          mov     x1, 49370
>>>>>          movk    x1, 0xc0da, lsl 32
>>>>>          str     x1, [x0]
>>>>>
>>>>> into:
>>>>>
>>>>>          mov     w1, 49370
>>>>>          stp     w1, w1, [x0]
>>>>>
>>>>> when optimising for -Os, whereas it will always transform a 4-insn
>>>>> synthesis sequence into a two-insn sequence + STP (see comments in the
>>>>> patch).
>>>>>
>>>>> This patch triggers already but will trigger more with the store merging
>>>>> pass that I'm working on since that will generate more of these repeating
>>>>> 64-bit constants.
>>>>> This helps improve codegen on 456.hmmer where store merging can sometimes
>>>>> create very complex repeating constants and target-specific expand needs
>>>>> to break them down.
>>>>
>>>>
>>>> Doing STP might be worse on ThunderX 1 than the mov/movk.  Or this
>>>> might cause an ICE with -mcpu=thunderx; I can't remember if the check
>>>> for slow unaligned store pair word is with the pattern or not.
>>>
>>>
>>> I can't get it to ICE with -mcpu=thunderx.
>>> The restriction is just on the STP forming code in the sched-fusion hooks
>>> AFAIK.
>>>
>>>> Basically the rule is
>>>> 1) if 4 byte aligned, then it is better to do two str.
>>>> 2) If 8 byte aligned, then doing stp is good
>>>> 3) Otherwise it is better to do two str.
>>>
>>>
>>> Ok, then I'll make the function just emit two stores and depend on the
>>> sched-fusion machinery to fuse them into an STP when appropriate since
>>> that has the logic that takes thunderx into account.
>>>
>>
>> Here it is.
>> I've confirmed that it emits two STRs for 4 byte aligned stores when
>> -mtune=thunderx and still generates STP for the other tunings, though now
>> sched-fusion is responsible for merging them, which is ok by me.
>>
>> Bootstrapped and tested on aarch64.
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-11-01  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>
>>     * config/aarch64/aarch64.md (mov<mode>): Call
>>     aarch64_split_dimode_const_store on DImode constant stores.
>>     * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>>     New prototype.
>>     * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>     function.
>>
>> 2016-11-01  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>
>>     * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>     * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>>
>>>>> Ok for trunk?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>>>
>>>>>      * config/aarch64/aarch64.md (mov<mode>): Call
>>>>>      aarch64_split_dimode_const_store on DImode constant stores.
>>>>>      * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
>>>>>      New prototype.
>>>>>      * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
>>>>>      function.
>>>>>
>>>>> 2016-10-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>>>
>>>>>      * gcc.target/aarch64/store_repeating_constant_1.c: New test.
>>>>>      * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
>>>
>>>
>>
>
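
For readers following the thread, the instruction counts quoted in the
original patch description can be reproduced with a small C sketch. This is
not the code from the patch: the function names are hypothetical, and the
cost model is a simplification that assumes one mov/movk per non-zero 16-bit
chunk (it ignores movn, bitmask immediates and the zero register).

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when the low and high 32-bit halves of VAL are identical. */
bool has_repeating_halves(uint64_t val)
{
    return (uint32_t)val == (uint32_t)(val >> 32);
}

/* Count the non-zero 16-bit chunks among the low NCHUNKS chunks of VAL.
   Assumption: each such chunk costs one mov or movk instruction.  */
int count_nonzero_chunks(uint64_t val, int nchunks)
{
    int n = 0;
    for (int i = 0; i < nchunks; i++)
        if ((val >> (16 * i)) & 0xffff)
            n++;
    return n;
}

/* Instructions saved by synthesising only the 32-bit half and storing it
   twice, compared with synthesising all 64 bits and storing once.
   Only meaningful when has_repeating_halves (VAL) is true.  */
int insns_saved(uint64_t val)
{
    int x_cost = count_nonzero_chunks(val, 4) + 1; /* mov/movk... + str */
    int w_cost = count_nonzero_chunks(val, 2) + 1; /* mov/movk... + stp */
    return x_cost - w_cost;
}
```

Under these assumptions the constant from the example, 0x0000c0da0000c0da,
goes from three instructions to two, while a constant such as
0x1234567812345678 goes from five to three.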

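Andrew's three alignment rules for ThunderX reduce to a single test on the
store address. The sketch below only restates that rule of thumb in C; the
helper name is hypothetical and this is not how the sched-fusion hooks in GCC
are written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper restating the rule of thumb from the thread:
   an STP of two W registers is preferred only when the address is
   8-byte aligned; for 4-byte aligned or any other alignment, two
   separate STRs are preferred.  */
bool prefer_stp_on_thunderx(uintptr_t addr)
{
    return (addr % 8) == 0;
}
```

Rules 1 and 3 both end in "two str", so only the 8-byte aligned case needs to
be singled out.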