On Thu, Apr 06, 2023 at 03:37:59PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> > The Altivec instructions fmaddfp and fnmsubfp have different rounding 
> > behaviors
> 
> Those are not existing instructions.  You mean "vmaddfp" etc.

Yes, sorry about that.  I guess I was thinking about the scalar instructions.

> > than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> > these instructions seems to break Eigen.
> 
> Those instructions use round-to-nearest-tiea-to-even, like all other
> VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
> almost all programs expect that rounding mode anyway, so this is not a
> problem in practice.  What happened on Eigen is that the Linux kernel
> starts every new process with VSCR[NJ]=1, breaking pretty much
> everything that wants floating point for non-toy purposes.  (There
> currently is a bug on LE that sets the wrong bit, hiding the problem in
> that configuration, but it is intended there as well).
> 
> > GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX 
> > systems
> > as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> > advantage  of the Altivec instructions is that they are 4 operand 
> > instructions
> > (i.e. the target register does not have to overlap with one of the input
> > registers).  The advantage is it can eliminate an extra move instruction.  
> > The
> > disadvantage is it does round the same was as the VSX instructions.
> 
> And it gets the VSCR[NJ] setting applied.  Yup.
> 
> > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> > instructions as alternatives in the VSX instruction insn support, and in the
> > Altivec insns it adds a test to prevent the insn from being used if VSX is
> > available.  I also added a test to the regression test suite.
> 
> Please leave the latter out, it does not belong in this patch.  If you
> want a patch to do that deal with *all* VMX FP insns?  There also are
> add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
> the only ones that use the NJ bit or the RN bits, but please check.

After I posted the patch I refreshed my memory of the VECTOR_UNIT_ALTIVEC_P
macro and it is not true if VSX code generation is enabled.  So I dropped the
changes to altivec.md.

In addition, as far as I know, the only AltiVec (VMX) floating point
instructions generated when VSX is used are the vmaddfp and vnmsubfp
instructions.  In the case of add and subtract, xvaddsp and xvsubsp is more
general than vaddfp or vsubfp since it can access all VSX registers.  VMX does
not have a stand-alone multiply (it generates FMA with a zero register) and it
does not have a division operation.  And VMX does not have xvmsub{a,m}sp nor
xvnadd{a,m}sp variations of the FMA instructions.

> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
> >  
> >  ;; Fused multiply add.
> >  
> > +;; If we are using VSX instructions, do not generate the vmaddfp 
> > instruction
> > +;; since is has different rounding behavior than the xvmaddsp instruction.
> > +
> 
> No blank lines please.

Ok.

> >  (define_insn "*altivec_fmav4sf4"
> >    [(set (match_operand:V4SF 0 "register_operand" "=v")
> >     (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> >               (match_operand:V4SF 2 "register_operand" "v")
> >               (match_operand:V4SF 3 "register_operand" "v")))]
> > -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> > +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
> 
> This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
> macro instead?

As I said that part of the code is not in the next patch.

> > -;; Fused vector multiply/add instructions. Support the classical Altivec
> > -;; versions of fma, which allows the target to be a separate register from 
> > the
> > -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> > -;; multiply.
> > +;; Fused vector multiply/add instructions. Do not use the classical Altivec

> (Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
> like it either, VMX is a much nicer and more regular name).

When the name might be more regular, but in terms of the instruction set, it
does have holes that I mentioned above (no multiply that is not a FMA, two of
the four FMA variants are not provided).

> > +;; versions of fma.  Those instructions allows the target to be a separate
> > +;; register from the 3 inputs, but they have different rounding behaviors.
> >  
> >  (define_insn "*vsx_fmav4sf4"
> > -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> >     (fma:V4SF
> > -     (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> > -     (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> > -     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> > +     (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> > +     (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> > +     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
> >    "VECTOR_UNIT_VSX_P (V4SFmode)"
> >    "@
> >     xvmaddasp %x0,%x1,%x2
> > -   xvmaddmsp %x0,%x1,%x3
> > -   vmaddfp %0,%1,%2,%3"
> > +   xvmaddmsp %x0,%x1,%x3"
> >    [(set_attr "type" "vecfloat")])
> 
> So this part looks okay, and it alone is safe for GCC 13 as well.

Well as we were discussing on a private channel, it is desirable to generate
vmaddfp and vnmsubfp if -Ofast is used, so the next patch incorporates that
change.  It will consider generating those instructions if -Ofast is used, and
if not, it will only generate VSX instructions.

> >  (define_insn "*vsx_nfmsv4sf4"
> > -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> >     (neg:V4SF
> >      (fma:V4SF
> > -      (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> > -      (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> > +      (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> > +      (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> >        (neg:V4SF
> > -        (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
> > +        (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
> >    "VECTOR_UNIT_VSX_P (V4SFmode)"
> >    "@
> >     xvnmsubasp %x0,%x1,%x2
> > -   xvnmsubmsp %x0,%x1,%x3
> > -   vnmsubfp %0,%1,%2,%3"
> > +   xvnmsubmsp %x0,%x1,%x3"
> >    [(set_attr "type" "vecfloat")])
> 
> Well, together with this of course :-)
> 
> Could you please do that?

I will submit the patch separately.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Reply via email to