On Fri, Dec 30, 2016 at 10:05 PM, Andrew Pinski <pins...@gmail.com> wrote:
> Hi,
>   Currently for the following function:
> int f(int a, int b)
> {
>   return a + (b <<7);
> }
>
> GCC produces:
> add     w0, w0, w1, lsl 7
> But for ThunderX 1, it is better if the instruction is split, which
> allows better scheduling in most cases; the latency is the same.  I
> get a small improvement in CoreMark, ~1%.
>
> Currently the code does not take Arith_shift into account.  Although
> the comment:
>   /* Strip any extend, leave shifts behind as we will
>     cost them through mult_cost.  */
> says it does not strip out the shift, aarch64_strip_extend does, and
> always has since the back-end was added to GCC.
>
> Once I fixed the code around aarch64_strip_extend, I got a regression
> for ThunderX 1, because some shifts/extends (left shifts <=4 and/or
> zero extends) are considered free there, so I needed to add a new
> tuning flag.
>
> Note I should get an even larger improvement for ThunderX 2 CN99XX,
> but I have not measured it yet because I have not made the change to
> aarch64-cost-tables.h; I am waiting for approval of the renaming
> patch before submitting any of the cost table changes.  Also, I
> noticed this problem with this tuning first and then looked back at
> what I needed to do for ThunderX 1.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu without any
> regressions (both with and without --with-cpu=thunderx).
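
For anyone skimming the thread, here is a minimal sketch of the two shapes
being discussed for f.  The C below only illustrates that the fused and
split forms compute the same value; the compiler is free to emit either
sequence for either function, and the assembly in the comments is the
expected shape, not captured output.  f_split and check_same are
hypothetical names used for illustration and are not part of the patch.

```c
#include <assert.h>

/* The function from the message.  With the current costs GCC folds the
   shift into the add:
       add     w0, w0, w1, lsl 7  */
int f(int a, int b)
{
  return a + (b << 7);
}

/* The same computation with the shift written as its own step.  This is
   the split shape that is preferable on ThunderX 1:
       lsl     w1, w1, 7
       add     w0, w0, w1
   (hypothetical helper, for illustration only).  */
int f_split(int a, int b)
{
  int shifted = b << 7;
  return a + shifted;
}

/* Hypothetical helper: assert that both forms agree for one input.
   Callers should pass a non-negative b small enough that b << 7 does
   not overflow.  */
void check_same(int a, int b)
{
  assert(f(a, b) == f_split(a, b));
}
```

Both forms have the same result and, per the description above, the same
latency on ThunderX 1; the difference is only in how much freedom the
scheduler has.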

Ping?  This has not been reviewed for over 5 months now :(.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> ChangeLog:
> * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> Increment Arith_shift and Arith_shift_reg by 1.
> * config/aarch64/aarch64-tuning-flags.def (easy_shift_extend): New tuning 
> flag.
> * config/aarch64/aarch64.c (thunderx_tunings): Enable
> AARCH64_EXTRA_TUNE_EASY_SHIFT_EXTEND.
> (aarch64_strip_extend): Add new argument and test for it.
> (aarch64_easy_mult_shift_p): New function.
> (aarch64_rtx_mult_cost): Call aarch64_easy_mult_shift_p and don't add
> a cost if it is true.
> Update calls to aarch64_strip_extend.
> (aarch64_rtx_costs): Update calls to aarch64_strip_extend.
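
To make the intent of the new flag easier to follow, here is a toy,
standalone model of the costing decision.  It is not the code from the
patch: the names (toy_extend, toy_easy_shift_extend_p, toy_add_cost) and
the cost units are hypothetical, and reading "left shifts <=4 and/or
zero extends" as "shift amount at most 4, with no sign extension" is an
assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Kind of extension applied to the second operand (hypothetical).  */
enum toy_extend { TOY_NO_EXTEND, TOY_ZERO_EXTEND, TOY_SIGN_EXTEND };

/* Hypothetical stand-in for the "easy" test: a left shift of at most 4,
   optionally combined with a zero extend, is treated as free.  */
bool toy_easy_shift_extend_p(enum toy_extend ext, int shift)
{
  if (shift < 0 || shift > 4)
    return false;
  return ext != TOY_SIGN_EXTEND;
}

/* Toy cost of an add whose second operand is shifted and/or extended.
   A plain add costs 1 unit.  A shift/extend adds 1 more unit unless the
   tuning flag is set and the operation counts as easy.  */
int toy_add_cost(bool tune_easy_shift_extend, enum toy_extend ext, int shift)
{
  int cost = 1;

  /* No shift and no extend: nothing extra to pay for.  */
  if (ext == TOY_NO_EXTEND && shift == 0)
    return cost;

  if (!(tune_easy_shift_extend && toy_easy_shift_extend_p(ext, shift)))
    cost += 1;

  return cost;
}
```

In this model the a + (b << 7) example costs 2 units even with the flag
set, so splitting it is not penalised, while a shift by 3 stays at 1
unit when the flag is set.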
