On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
>
> On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
> > On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
> > > diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
> > > index f7796da10b2..9e9ce58cb53 100644
> > > --- a/gcc/config/loongarch/predicates.md
> > > +++ b/gcc/config/loongarch/predicates.md
> > > @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
> > >    (ior (match_operand 0 "const_1_operand")
> > >         (match_operand 0 "register_operand")))
> > >
> > > +(define_predicate "reg_or_vecotr_1_operand"
> >
> > "vector" instead of "vecotr".
> >
> > > +  (ior (match_operand 0 "const_vector_1_operand")
> > > +       (match_operand 0 "register_operand")))
> > > +@opindex mrecip
> > > +@item -mrecip
> > > +This option enables use of the reciprocal estimate and reciprocal square
> > > +root estimate instructions with additional Newton-Raphson steps to increase
> > > +precision instead of doing a divide or square root and divide for
> > > +floating-point arguments.
> > > +These instructions are generated only when @option{-funsafe-math-optimizations}
> > > +is enabled together with @option{-ffinite-math-only} and
> > > +@option{-fno-trapping-math}.
> > > +Note that while the throughput of the sequence is higher than the throughput of
> > > +the non-reciprocal instruction, the precision of the sequence can be decreased
> > > +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
> > > +
> > > +@opindex mrecip=opt
> >
> > We should document that using these options requires the target CPU to
> > support the frecipe/frsqrte instructions.
> >
> I am currently improving this patch by adding option -mfrecipe to ensure
> that the target CPU supports approximate instructions.
You just need to add a line into gcc/config/loongarch/genopts/isa-evolution.in:

2 25 frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions

Then the -mfrecipe option will be added and can be tested with
TARGET_FRECIPE in GCC code.  -march=native will also detect it properly
because the cpucfg info is included.

Then just add OPTION_MASK_ISA_FRECIPE into ISA_BASE_LA64V110_FEATURES in
loongarch-cpu.cc.

And could we have a __builtin for scalar frecipe/frsqrte too?  Then, if the
approximation is not OK for the entire program but the programmer knows
it's OK for some operations in a hot path, (s)he can write
__builtin_loongarch_frecipe_d (x) for an acceleration.

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
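A minimal sketch of how such a scalar builtin might be used only in a hot path, while the rest of the program is built without -mrecip. The name __builtin_loongarch_frecipe_d is only the one proposed above; whether it exists, and its exact signature, is an assumption, and HYPOTHETICAL_HAVE_FRECIPE_BUILTIN is a made-up guard macro. Without the guard the code falls back to an exact division, so it compiles on any target.

```c
#include <stddef.h>

/* Reciprocal used only in the hot path below. */
static inline double recip_hot(double d) {
#if defined(__loongarch__) && defined(HYPOTHETICAL_HAVE_FRECIPE_BUILTIN)
  /* HYPOTHETICAL builtin: hardware reciprocal estimate ... */
  double y = __builtin_loongarch_frecipe_d(d);
  /* ... refined with two Newton-Raphson steps, y' = y * (2 - d*y). */
  y = y * (2.0 - d * y);
  y = y * (2.0 - d * y);
  return y;
#else
  return 1.0 / d; /* exact fallback when the builtin is unavailable */
#endif
}

/* Hot loop where the programmer has decided an approximate reciprocal
   is acceptable: out[i] = num[i] / den[i]. */
void scale_by_inverse(double *out, const double *num, const double *den,
                      size_t n) {
  for (size_t i = 0; i < n; i++)
    out[i] = num[i] * recip_hot(den[i]);
}
```

Keeping the approximation behind one small inline helper means only this loop trades precision for throughput; everything else keeps IEEE division semantics.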