On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh <james.greenha...@arm.com> wrote: > On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote: >> Hi, >> Currently for the following function: >> int f(int a, int b) >> { >> return a + (b <<7); >> } >> >> GCC produces: >> add w0, w0, w1, lsl 7 >> But for ThunderX 1, it is better if the instruction was split allowing >> better scheduling to happen in most cases, the latency is the same. I >> get a small improvement in coremarks, ~1%. >> >> Currently the code does not take into account Arith_shift even though >> the comment: >> /* Strip any extend, leave shifts behind as we will >> cost them through mult_cost. */ >> Say it does not strip out the shift, aarch64_strip_extend does and has >> always has since the back-end was added to GCC. >> >> Once I fixed the code around aarch64_strip_extend, I got a regression >> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero >> extends) are considered free so I needed to add a new tuning flag. >> >> Note I will get an even more improvement for ThunderX 2 CN99XX, but I >> have not measured it yet as I have not made the change to >> aarch64-cost-tables.h yet as I am waiting for approval of the renaming >> patch first before submitting any of the cost table changes. Also I >> noticed this problem with this tuning first and then looked back at >> what I needed to do for ThunderX 1. >> >> OK? Bootstrapped and tested on aarch64-linux-gnu without any >> regressions (both with and without --with-cpu=thunderx). > > This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap > or free seems better. I have some other minor points below.
Ok, that seems like a good idea. I used easy since that was the wording our hardware folks had came up with. I am changing the comments to make clearer when this flag should be used. I should a new patch out by the end of today. Thanks, Andrew > >> Index: config/aarch64/aarch64-tuning-flags.def >> =================================================================== >> --- config/aarch64/aarch64-tuning-flags.def (revision 243974) >> +++ config/aarch64/aarch64-tuning-flags.def (working copy) >> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte >> pairs. */ >> AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) >> >> +/* Logical shift left <=4 with/without zero extend are considered easy >> + extended, also zero extends without the shift. */ > > > I'm struggling to parse this comment. "also zero extends without the shift" > is what is getting me. I'm also not certain I follow when I should set this > flag. If all shifts are cheap/free on my platform, should I set this flag? > >> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND) >> + >> #undef AARCH64_EXTRA_TUNING_OPTION > > >> + >> +/* Return true iff X is an easy shift without a sign extend. */ >> + > > Again I don't like calling <= 4 "easy", it feels imprecise. > > Thanks, > James >