On Fri, Dec 30, 2016 at 10:05 PM, Andrew Pinski <pins...@gmail.com> wrote:
> Hi,
>   Currently, for the following function:
> int f(int a, int b)
> {
>   return a + (b << 7);
> }
>
> GCC produces:
>         add     w0, w0, w1, lsl 7
> But for ThunderX 1 it is better to split the instruction, which allows
> better scheduling in most cases; the latency is the same. I get a small
> improvement in CoreMark, ~1%.
>
> Currently the code does not take Arith_shift into account, even though
> the comment:
>     /* Strip any extend, leave shifts behind as we will
>        cost them through mult_cost.  */
> says it does not strip out the shift. aarch64_strip_extend does strip
> it, and always has since the back-end was added to GCC.
>
> Once I fixed the code around aarch64_strip_extend, I got a regression
> on ThunderX 1, as some shifts/extends (left shifts <= 4 and/or zero
> extends) are considered free, so I needed to add a new tuning flag.
>
> Note I will get an even larger improvement on ThunderX 2 CN99XX, but I
> have not measured it yet: I have not made the change to
> aarch64-cost-tables.h, as I am waiting for approval of the renaming
> patch before submitting any of the cost table changes. Also, I
> noticed this problem with this tuning first and then looked back at
> what I needed to do for ThunderX 1.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu without any
> regressions (both with and without --with-cpu=thunderx).
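To make the "split" concrete, here is a small C sketch of the two shapes. It is only an illustration: writing the shift as a separate statement does not by itself change what GCC emits (that is decided by the cost model the patch touches). The helper names `f_split` and `split_matches` are hypothetical, and the instruction sequences in the comments are a plausible split, not output captured from GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* The function from the message. Per the message, GCC currently folds
   the shift into the add: "add w0, w0, w1, lsl 7". */
int f(int a, int b)
{
  return a + (b << 7);
}

/* Hypothetical: the same computation with the shift as its own step,
   mirroring a split sequence such as
     lsl  w1, w1, 7
     add  w0, w0, w1
   The shift does not depend on a, so a scheduler could issue it early. */
int f_split(int a, int b)
{
  int shifted = b << 7;   /* independent shift */
  return a + shifted;     /* plain register-register add */
}

/* Hypothetical checker: true if both forms agree. Left-shifting a
   negative int is undefined in C, and b << 7 must fit in an int, so b
   is restricted to small non-negative values (a must also be small
   enough that the add does not overflow). */
bool split_matches(int a, int b)
{
  assert(b >= 0 && b < (1 << 20));
  return f(a, b) == f_split(a, b);
}
```

Both functions compute the same value; the point of the patch is only which instruction sequence the cost model steers GCC towards on ThunderX.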
Ping? This has not been reviewed for over 5 months now :(.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> ChangeLog:
> * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> Increment Arith_shift and Arith_shift_reg by 1.
> * config/aarch64/aarch64-tuning-flags.def (easy_shift_extend): New tuning
> flag.
> * config/aarch64/aarch64.c (thunderx_tunings): Enable
> AARCH64_EXTRA_TUNE_EASY_SHIFT_EXTEND.
> (aarch64_strip_extend): Add new argument and test for it.
> (aarch64_easy_mult_shift_p): New function.
> (aarch64_rtx_mult_cost): Call aarch64_easy_mult_shift_p and don't add
> a cost if it is true.
> Update calls to aarch64_strip_extend.
> (aarch64_rtx_costs): Update calls to aarch64_strip_extend.
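The costing decision described in the ChangeLog can be modelled roughly as follows. This is a standalone C sketch, not GCC's RTL cost code: every name here (`operand_kind`, `shifted_operand`, `model_extra_costs`, `model_easy_shift_extend_p`, `model_shifted_add_cost`) is hypothetical, and the cost numbers are arbitrary units. The "easy" rule follows the message (left shifts <= 4 and/or zero extends); treating sign extends as never easy is an assumption, since the message does not mention them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the second operand of an ALU op. */
enum operand_kind { OP_LSL, OP_ZERO_EXTEND, OP_SIGN_EXTEND };

struct shifted_operand {
  enum operand_kind kind;
  int lsl_amount;            /* left-shift amount, 0 if none */
};

/* Hypothetical per-CPU extra cost; the patch bumps ThunderX's
   Arith_shift entries by 1, which this single field stands in for. */
struct model_extra_costs {
  int arith_shift;
};

/* Assumed rule from the message: left shifts of at most 4, optionally
   combined with a zero extend, are "easy" (free). Sign extends are
   assumed not to be easy. */
bool model_easy_shift_extend_p(struct shifted_operand op)
{
  assert(op.lsl_amount >= 0 && op.lsl_amount < 32);
  switch (op.kind)
    {
    case OP_LSL:
    case OP_ZERO_EXTEND:
      return op.lsl_amount <= 4;
    case OP_SIGN_EXTEND:
      return false;
    }
  return false;
}

/* Cost of an add with a shifted/extended operand: when the tuning flag
   is set and the operand is easy, no extra cost is added; otherwise the
   CPU's arith-shift extra cost is charged. */
int model_shifted_add_cost(int base_cost, struct model_extra_costs costs,
                           bool easy_flag, struct shifted_operand op)
{
  if (easy_flag && model_easy_shift_extend_p(op))
    return base_cost;
  return base_cost + costs.arith_shift;
}
```

Under this model, with the flag enabled, `lsl 2` keeps the combined add as cheap as a plain add, while `lsl 7` (as in `f` above) is charged extra, which is presumably what lets the split form win on ThunderX 1.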