Peter, Kewen:

On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
> on 2023/5/24 23:20, Carl Love wrote:
> > On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> > > on 2023/5/24 06:30, Peter Bergner wrote:
> > > > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > > > on 2023/5/23 01:31, Carl Love wrote:
> > > > > > The builtins were requested for use in GLibC.  As of
> > > > > > version
> > > > > > 2.31 they
> > > > > > were added as inline asm.  They requested a builtin so the
> > > > > > asm
> > > > > > could be
> > > > > > removed.
> > > > > 
> > > > > So IMHO we also want the similar support for mffscrn, that is
> > > > > to
> > > > > make
> > > > > use of mffscrn and mffscrni on Power9 and later, but falls
> > > > > back
> > > > > to 
> > > > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > > > 
> > > > So __builtin_set_fpscr_rn everything we want (sets the RN bits)
> > > > and
> > > > uses mffscrn/mffscrni on P9 and later and uses older insns on
> > > > pre-
> > > > P9.
> > > > The only problem is we don't return the current FPSCR bits, as
> > > > the
> > > > bif
> > > > is defined to return void.
> > > 
> > > Yes.
> > > 
> > > > Crazy idea, but could we extend the built-in
> > > > with an overload that returns the FPSCR bits?  
> > > 
> > > So you agree that we should make this proposed new bif handle
> > > pre-P9
> > > just
> > > like some other existing bifs. :)  I think extending it is good
> > > and
> > > doable,
> > > but the only concern here is the bif name
> > > "__builtin_set_fpscr_rn",
> > > which
> > > matches the existing behavior (only set rounding) but doesn't
> > > match
> > > the
> > > proposed extending behavior (set rounding and get some env bits
> > > back).
> > > Maybe it's not a big deal if the documentation clarify it well.
> > 
> > Extending the builtin to pre Power 9 is straight forward and I
> > agree
> > would make good sense to do.
> > 
> > I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
> > the
> > new functionality.  Peter suggests overloading the builtin to
> > either
> > return void or returns FPSCR bits.  It is my understanding that the
> > return value for a given builtin had to be the same, i.e. you can't
> > overload the return value. Maybe you can with Bill's new
> > infrastructure?  I recall having problems trying to overload the
> > return
> > value in the past and Bill said you couldn't do it.  I play with
> > this
> > and see if I can overload the return value.
> 
> Your understanding on that we fail to overload this for just
> different
> return types is correct.  But previously I interpreted the extending
> proposal as to extend
> 
>   void __builtin_set_fpscr_rn (int);
> 
> to 
> 
>   void __builtin_set_fpscr_rn (int, double*);
> 
> The related address taken and store here can be optimized out
> normally.

I don't think that is correct.   The current definition of the builtin
is:

     void __builtin_set_fpscr_rn (int);

The proposal by Peter was to change the return type to double, i.e.

     double __builtin_set_fpscr_rn (int);

Peter also said the following:

   The built-in machinery can see that the usage is expecting a return
   value or not and for the pre-P9 code, can skip generating the ending
   mffs if we don't want the return value.

Which I don't think we want.  The mffscrn and mffscrni instructions
return the contents of the control bits in the FPSCR, that is, bits
29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
into the corresponding bits in register FRT. All other bits in register
FRT are set to 0.  

The instructions also updates the current RN field of the FPSCR with
the new RN supplied the second argument of the instruction.  So, the
instructions update the RN field just like the __builtin_set_fpscr_rn. 
So, we can use the existing __builtin_set_fpscr_rn to update the RN for
all ISAs, we just need to have __builtin_set_fpscr_rn always return a
double with the desired fields from the FPSCR (the current RN).  This
will then emulate the behavior of the mffscrn and mffscrni
instructions.  The current uses of __builtin_set_fpscr_rn will just
ignore the return value which is not a problem.  The return value can
be stored in the places were the user is currently using the inline asm
for the mffscrn and mffscrni instructions.

The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
mffscrni on Power 9 and throwing away the result from the instruction. 
We just need to change __builtin_set_fpscr_rn to return the value
instead.  For the pre Power 9 code, the builtin will need to read the
full FPSCR, mask of the desired fields and return the fields.

So, there is no need for the builtin to have to determine if the user
is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
always be updated by the __builtin_set_fpscr_rn builtin and the
existing fields of the FPSCR will always be returned by the builtin.

Please let me know if you agree.  I think I have this sorted out
correctly.  Thanks.

                        Carl 

Reply via email to