On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
> 
> On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
> > On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
> > > diff --git a/gcc/config/loongarch/predicates.md
> > > b/gcc/config/loongarch/predicates.md
> > > index f7796da10b2..9e9ce58cb53 100644
> > > --- a/gcc/config/loongarch/predicates.md
> > > +++ b/gcc/config/loongarch/predicates.md
> > > @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
> > >     (ior (match_operand 0 "const_1_operand")
> > >          (match_operand 0 "register_operand")))
> > >   
> > > +(define_predicate "reg_or_vecotr_1_operand"
> > "vector" instead of "vecotr".
> > 
> > > +  (ior (match_operand 0 "const_vector_1_operand")
> > > +       (match_operand 0 "register_operand")))
> > > +@opindex mrecip
> > > +@item -mrecip
> > > +This option enables use of the reciprocal estimate and reciprocal square
> > > +root estimate instructions with additional Newton-Raphson steps to increase
> > > +precision instead of doing a divide or square root and divide for
> > > +floating-point arguments.
> > > +These instructions are generated only when @option{-funsafe-math-optimizations}
> > > +is enabled together with @option{-ffinite-math-only} and
> > > +@option{-fno-trapping-math}.
> > > +Note that while the throughput of the sequence is higher than the throughput of
> > > +the non-reciprocal instruction, the precision of the sequence can be decreased
> > > +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
> > > +
> > > +@opindex mrecip=opt
> > We should document that using these options requires the target CPU to
> > support the frecipe/frsqrte instructions.
> > 
> I am currently improving this patch by adding an -mfrecipe option to ensure
> that the target CPU supports the approximate instructions.

You just need to add a line to
gcc/config/loongarch/genopts/isa-evolution.in:

2       25      frecipe         Support frecipe.{s/d} and frsqrte.{s/d} instructions

Then the -mfrecipe option will be added, and it can be tested with
TARGET_FRECIPE in GCC code.  -march=native will also detect it properly
because the cpucfg info is included.  Then just add
OPTION_MASK_ISA_FRECIPE to ISA_BASE_LA64V110_FEATURES in
loongarch-cpu.cc.


And could we have a __builtin for scalar frecipe/frsqrte too?  Then if
the approximation is not OK for the entire program, but the programmer
knows it's OK for some operations in a hot path, they can write
__builtin_loongarch_frecipe_d (x) to speed up just those operations.
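
To make the hot-path idea concrete, here is a minimal portable C sketch of
an estimate followed by Newton-Raphson refinement.  Everything in it is an
assumption for illustration: estimate_recip, refine_recip and fast_recip
are hypothetical names, and estimate_recip only emulates a low-precision
estimate in software.  It is not the frecipe instruction, and the builtin
named above is only a proposal at this point.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: a
   single-precision divide with the low 9 mantissa bits cleared, which
   leaves roughly 14 significant bits.  Intended for positive, normal
   inputs that fit in a float.  With the proposed builtin, this call
   would presumably become __builtin_loongarch_frecipe_d (a).  */
static double estimate_recip(double a)
{
    float est = 1.0f / (float)a;
    uint32_t bits;

    memcpy(&bits, &est, sizeof bits);
    bits &= ~UINT32_C(0x1FF);          /* throw away precision on purpose */
    memcpy(&est, &bits, sizeof est);
    return (double)est;
}

/* One Newton-Raphson step for f(x) = 1/x - a:  x' = x * (2 - a * x).
   Each step roughly doubles the number of correct bits.  */
static double refine_recip(double a, double x)
{
    return x * (2.0 - a * x);
}

/* Estimate plus two refinement steps: about 14 -> 28 -> 53 bits, up to
   a few ulp of rounding error.  */
static double fast_recip(double a)
{
    double x = estimate_recip(a);

    x = refine_recip(a, x);
    x = refine_recip(a, x);
    return x;
}
```

A hot loop would call fast_recip (x) in place of 1.0 / x only where a
result that may be off by a few ulp is acceptable, and the rest of the
program would keep the exact divide.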

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
