Hi!

On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> The Altivec instructions fmaddfp and fnmsubfp have different rounding 
> behaviors

Those are not existing instructions.  You mean "vmaddfp" etc.

> than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> these instructions seems to break Eigen.

Those instructions use round-to-nearest-tiea-to-even, like all other
VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
almost all programs expect that rounding mode anyway, so this is not a
problem in practice.  What happened on Eigen is that the Linux kernel
starts every new process with VSCR[NJ]=1, breaking pretty much
everything that wants floating point for non-toy purposes.  (There
currently is a bug on LE that sets the wrong bit, hiding the problem in
that configuration, but it is intended there as well).

> GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
> as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> advantage  of the Altivec instructions is that they are 4 operand instructions
> (i.e. the target register does not have to overlap with one of the input
> registers).  The advantage is it can eliminate an extra move instruction.  The
> disadvantage is it does round the same was as the VSX instructions.

And it gets the VSCR[NJ] setting applied.  Yup.

> This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> instructions as alternatives in the VSX instruction insn support, and in the
> Altivec insns it adds a test to prevent the insn from being used if VSX is
> available.  I also added a test to the regression test suite.

Please leave the latter out, it does not belong in this patch.  If you
want a patch to do that deal with *all* VMX FP insns?  There also are
add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
the only ones that use the NJ bit or the RN bits, but please check.

> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
>  
>  ;; Fused multiply add.
>  
> +;; If we are using VSX instructions, do not generate the vmaddfp instruction
> +;; since is has different rounding behavior than the xvmaddsp instruction.
> +

No blank lines please.

>  (define_insn "*altivec_fmav4sf4"
>    [(set (match_operand:V4SF 0 "register_operand" "=v")
>       (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
>                 (match_operand:V4SF 2 "register_operand" "v")
>                 (match_operand:V4SF 3 "register_operand" "v")))]
> -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"

This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
macro instead?

> -;; Fused vector multiply/add instructions. Support the classical Altivec
> -;; versions of fma, which allows the target to be a separate register from 
> the
> -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> -;; multiply.
> +;; Fused vector multiply/add instructions. Do not use the classical Altivec

(Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
like it either, VMX is a much nicer and more regular name).

> +;; versions of fma.  Those instructions allows the target to be a separate
> +;; register from the 3 inputs, but they have different rounding behaviors.
>  
>  (define_insn "*vsx_fmav4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>       (fma:V4SF
> -       (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -       (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> -       (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> +       (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +       (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> +       (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
>    "VECTOR_UNIT_VSX_P (V4SFmode)"
>    "@
>     xvmaddasp %x0,%x1,%x2
> -   xvmaddmsp %x0,%x1,%x3
> -   vmaddfp %0,%1,%2,%3"
> +   xvmaddmsp %x0,%x1,%x3"
>    [(set_attr "type" "vecfloat")])

So this part looks okay, and it alone is safe for GCC 13 as well.

>  (define_insn "*vsx_nfmsv4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>       (neg:V4SF
>        (fma:V4SF
> -        (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -        (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> +        (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +        (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
>          (neg:V4SF
> -          (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
> +          (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
>    "VECTOR_UNIT_VSX_P (V4SFmode)"
>    "@
>     xvnmsubasp %x0,%x1,%x2
> -   xvnmsubmsp %x0,%x1,%x3
> -   vnmsubfp %0,%1,%2,%3"
> +   xvnmsubmsp %x0,%x1,%x3"
>    [(set_attr "type" "vecfloat")])

Well, together with this of course :-)

Could you please do that?


Segher

Reply via email to