On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh
<james.greenha...@arm.com> wrote:
> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
>> Hi,
>>   Currently for the following function:
>> int f(int a, int b)
>> {
>>   return a + (b <<7);
>> }
>>
>> GCC produces:
>> add     w0, w0, w1, lsl 7
>> But for ThunderX 1, it is better if the instruction was split allowing
>> better scheduling to happen in most cases, the latency is the same.  I
>> get a small improvement in coremarks, ~1%.
>>
>> Currently the code does not take into account Arith_shift even though
>> the comment:
>>   /* Strip any extend, leave shifts behind as we will
>>     cost them through mult_cost.  */
>> Say it does not strip out the shift, aarch64_strip_extend does and has
>> always has since the back-end was added to GCC.
>>
>> Once I fixed the code around aarch64_strip_extend, I got a regression
>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
>> extends) are considered free so I needed to add a new tuning flag.
>>
>> Note I will get an even more improvement for ThunderX 2 CN99XX, but I
>> have not measured it yet as I have not made the change to
>> aarch64-cost-tables.h yet as I am waiting for approval of the renaming
>> patch first before submitting any of the cost table changes.  Also I
>> noticed this problem with this tuning first and then looked back at
>> what I needed to do for ThunderX 1.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu without any
>> regressions (both with and without --with-cpu=thunderx).
>
> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
> or free seems better. I have some other minor points below.


Ok, that seems like a good idea.  I used easy since that was the
wording our hardware folks had came up with.  I am changing the
comments to make clearer when this flag should be used.
I should a new patch out by the end of today.

Thanks,
Andrew


>
>> Index: config/aarch64/aarch64-tuning-flags.def
>> ===================================================================
>> --- config/aarch64/aarch64-tuning-flags.def   (revision 243974)
>> +++ config/aarch64/aarch64-tuning-flags.def   (working copy)
>> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte
>>  pairs.   */
>>  AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
>>
>> +/* Logical shift left <=4 with/without zero extend are considered easy
>> +   extended, also zero extends without the shift. */
>
>
> I'm struggling to parse this comment. "also zero extends without the shift"
> is what is getting me. I'm also not certain I follow when I should set this
> flag. If all shifts are cheap/free on my platform, should I set this flag?
>
>> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND)
>> +
>>  #undef AARCH64_EXTRA_TUNING_OPTION
>
>
>> +
>> +/* Return true iff X is an easy shift without a sign extend. */
>> +
>
> Again I don't like calling <= 4 "easy", it feels imprecise.
>
> Thanks,
> James
>

Reply via email to