Hi! On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote: > The Altivec instructions fmaddfp and fnmsubfp have different rounding > behaviors
Those are not existing instructions. You mean "vmaddfp" etc. > than the VSX xvmaddsp and xvnmsubsp instructions. In particular, generating > these instructions seems to break Eigen. Those instructions use round-to-nearest-tiea-to-even, like all other VMX FP insns. A proper patch has to deal with all VMX FP insns. But, almost all programs expect that rounding mode anyway, so this is not a problem in practice. What happened on Eigen is that the Linux kernel starts every new process with VSCR[NJ]=1, breaking pretty much everything that wants floating point for non-toy purposes. (There currently is a bug on LE that sets the wrong bit, hiding the problem in that configuration, but it is intended there as well). > GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems > as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions. The > advantage of the Altivec instructions is that they are 4 operand instructions > (i.e. the target register does not have to overlap with one of the input > registers). The advantage is it can eliminate an extra move instruction. The > disadvantage is it does round the same was as the VSX instructions. And it gets the VSCR[NJ] setting applied. Yup. > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp > instructions as alternatives in the VSX instruction insn support, and in the > Altivec insns it adds a test to prevent the insn from being used if VSX is > available. I also added a test to the regression test suite. Please leave the latter out, it does not belong in this patch. If you want a patch to do that deal with *all* VMX FP insns? There also are add, sub, mul, etc. Well I think those (as well as madd and nmsub) are the only ones that use the NJ bit or the RN bits, but please check. > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4" > > ;; Fused multiply add. > > +;; If we are using VSX instructions, do not generate the vmaddfp instruction > +;; since is has different rounding behavior than the xvmaddsp instruction. > + No blank lines please. > (define_insn "*altivec_fmav4sf4" > [(set (match_operand:V4SF 0 "register_operand" "=v") > (fma:V4SF (match_operand:V4SF 1 "register_operand" "v") > (match_operand:V4SF 2 "register_operand" "v") > (match_operand:V4SF 3 "register_operand" "v")))] > - "VECTOR_UNIT_ALTIVEC_P (V4SFmode)" > + "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX" This is very error-prone. Maybe add a test to the VECTOR_UNIT_ALTIVEC macro instead? > -;; Fused vector multiply/add instructions. Support the classical Altivec > -;; versions of fma, which allows the target to be a separate register from > the > -;; 3 inputs. Under VSX, the target must be either the addend or the first > -;; multiply. > +;; Fused vector multiply/add instructions. Do not use the classical Altivec (Two spaces after dot, and AltiVec is spelled with a capital V. I don't like it either, VMX is a much nicer and more regular name). > +;; versions of fma. Those instructions allows the target to be a separate > +;; register from the 3 inputs, but they have different rounding behaviors. > > (define_insn "*vsx_fmav4sf4" > - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v") > + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa") > (fma:V4SF > - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v") > - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v") > - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))] > + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa") > + (match_operand:V4SF 2 "vsx_register_operand" "wa,0") > + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))] > "VECTOR_UNIT_VSX_P (V4SFmode)" > "@ > xvmaddasp %x0,%x1,%x2 > - xvmaddmsp %x0,%x1,%x3 > - vmaddfp %0,%1,%2,%3" > + xvmaddmsp %x0,%x1,%x3" > [(set_attr "type" "vecfloat")]) So this part looks okay, and it alone is safe for GCC 13 as well. > (define_insn "*vsx_nfmsv4sf4" > - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v") > + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa") > (neg:V4SF > (fma:V4SF > - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v") > - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v") > + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa") > + (match_operand:V4SF 2 "vsx_register_operand" "wa,0") > (neg:V4SF > - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))] > + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))] > "VECTOR_UNIT_VSX_P (V4SFmode)" > "@ > xvnmsubasp %x0,%x1,%x2 > - xvnmsubmsp %x0,%x1,%x3 > - vnmsubfp %0,%1,%2,%3" > + xvnmsubmsp %x0,%x1,%x3" > [(set_attr "type" "vecfloat")]) Well, together with this of course :-) Could you please do that? Segher