Hi James,

> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: Monday, January 11, 2016 5:24 PM
> To: gcc-patches@gcc.gnu.org
> Cc: n...@arm.com; marcus.shawcr...@arm.com;
> richard.earns...@arm.com; Kumar, Venkataramanan;
> philipp.toms...@theobroma-systems.com; pins...@gmail.com;
> kyrylo.tkac...@arm.com; e.mene...@samsung.com
> Subject: [Patch AArch64] Use software sqrt expansion always for
> -mlow-precision-recip-sqrt
>
>
> Hi,
>
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this; first is consistency across -mcpu targets, second is enabling more
> -mcpu targets to use the flag for peak tuning.
>
> I don't much like that the precision we use for
> -mlow-precision-recip-sqrt differs between cores (and possibly compiler
> revisions). Yes, we're under -ffast-math but I take this flag to mean the
> user explicitly wants the low-precision expansion, and we should not
> diverge from that based on an internal decision as to what is optimal for
> performance in the high-precision case. I'd prefer to keep things as
> predictable as possible, and here that means always emitting the
> low-precision expansion when asked.
>
> Judging by the comments in the thread proposing the reciprocal square root
> optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any
> cores which do not explicitly ask for it. Currently that is Cortex-A57 and
> xgene, though I will be proposing a patch to remove Cortex-A57 from that
> list shortly.
>
> Which gives my second motivation for this patch.
> -mlow-precision-recip-sqrt is intended as a tuning flag for situations
> where performance is more important than precision, but the current logic
> requires setting an internal flag which also changes the performance
> characteristics where high-precision is needed. This conflates two
> decisions the target might want to make, and reduces the applicability of
> an option targets might want to enable for performance. In particular, I'd
> still like to see -mlow-precision-recip-sqrt continue to emit the cheaper,
> low-precision sequence for floats under Cortex-A57.
>
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
>
> I've also put this through bootstrap and test on aarch64-none-linux-gnu
> with no issues.
>
> OK?
>
> Thanks,
> James
>

Yes, I like enabling this optimization for all CPU targets via
-mlow-precision-recip-sqrt.

If my understanding is correct, for Cortex-A57 we now need only
-mlow-precision-recip-sqrt to emit the software sqrt expansion?

In the code below:

---snip---
void
aarch64_emit_swrsqrt (rtx dst, rtx src)
{
............
............
  int iterations = double_mode ? 3 : 2;

  if (flag_mrecip_low_precision_sqrt)
    iterations--;
---snip---

Now, in the Cortex-A57 case, we will always do 2 and 1 steps for double
and float respectively, and 3 and 2 will never be used. Should we make
2 and 1 the default? Or does any target still need to use 3 and 2?

PS: I remember that reducing the iterations benefited gromacs but caused
some VE in other FP benchmarks.

Regards,
Venkat.

> ---
> 2015-12-10  James Greenhalgh  <james.greenha...@arm.com>
>
> 	* config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> 	reciprocal sqrt for -mlow-precision-recip-sqrt.
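
To make the iteration-count question above concrete, here is a minimal
standalone sketch of how the number of refinement steps affects accuracy.
It is not the GCC implementation: the function names, the caller-supplied
starting estimate (standing in for the hardware estimate instruction), and
the use of double arithmetic for both modes are all assumptions made for
illustration. Only the 3/2 step counts and the decrement come from the
snippet quoted above.

```c
#include <assert.h>

/* Step-count selection mirroring the quoted snippet: 3 steps for double,
   2 for float, one fewer when low precision is requested.  The function
   name and parameters are hypothetical.  */
int
swrsqrt_iterations (int double_mode, int low_precision)
{
  int iterations = double_mode ? 3 : 2;

  if (low_precision)
    iterations--;
  return iterations;
}

/* One Newton-Raphson refinement step for 1/sqrt(d):
   x' = x * (3 - d * x * x) / 2.  Each step roughly squares the
   relative error of the current estimate.  */
double
rsqrt_step (double d, double x)
{
  return x * (3.0 - d * x * x) / 2.0;
}

/* Refine a coarse estimate of 1/sqrt(d).  ESTIMATE is supplied by the
   caller and stands in for a hardware estimate (an assumption of this
   sketch); all arithmetic is done in double for simplicity.  */
double
sw_rsqrt (double d, double estimate, int double_mode, int low_precision)
{
  int iterations = swrsqrt_iterations (double_mode, low_precision);
  double x = estimate;

  assert (d > 0.0 && estimate > 0.0);
  while (iterations-- > 0)
    x = rsqrt_step (d, x);
  return x;
}

/* Accuracy measure that needs no libm: |d * x * x - 1|, which is about
   twice the relative error of x as an approximation of 1/sqrt(d).  */
double
residual (double d, double x)
{
  double r = d * x * x - 1.0;

  return r < 0.0 ? -r : r;
}
```

For example, calling sw_rsqrt (2.0, 0.7, 0, 1) runs a single step and
sw_rsqrt (2.0, 0.7, 0, 0) runs two; comparing residual () on the two
results shows how much accuracy the dropped step costs for that input,
which is the trade-off behind the default-versus-low-precision question.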