[PATCH 1/2] IBM Z: Store long doubles in vector registers when possible

Ilya Leoshkevich via Gcc-patches Mon, 09 Nov 2020 11:56:20 -0800

On z14+, there are instructions for working with 128-bit floats (long
doubles) in vector registers.  It's beneficial to use them instead of
instructions that operate on floating point register pairs, because
this allows storing 4 times more data in registers at a time, relieving
register pressure.  The raw performance of the new instructions is
almost the same as that of the old ones.
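
As a rough illustration of the register-pressure argument (a sketch only,
not part of the patch; the helper names are made up for this example):
z/Architecture has 16 FPRs and 32 VRs, and a 128-bit float needs two
64-bit FPRs but only one 128-bit VR.

```c
/* Register file sizes on z/Architecture.  */
enum { NUM_FPRS = 16, NUM_VRS = 32 };

/* Hypothetical helper: how many 128-bit floats fit into FPR pairs.
   Each value occupies two 64-bit FPRs.  */
static int
fpr_pair_capacity (void)
{
  return NUM_FPRS / 2;
}

/* Hypothetical helper: how many 128-bit floats fit into VRs.
   Each value occupies a single 128-bit VR.  */
static int
vr_capacity (void)
{
  return NUM_VRS;
}
```

With these numbers, the VR file holds 32 long doubles versus 8 in FPR
pairs, which is where the factor of 4 comes from.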


Implement by storing TFmode values in vector registers on z14+.  Since
not all operations are available with the new instructions, keep the
old ones available using the new FPRX2 mode, and convert between it and
TFmode when necessary (such expanders are called "forwarders" below).
Change the existing TFmode expanders to call either new- or old-style
ones depending on whether we are on z14+ or older machines
("dispatcher" expanders).
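
The dispatching idea can be modeled outside of GCC roughly as follows.
This is a simplified standalone sketch, not the actual implementation:
dispatch_suffix and dispatch_name are hypothetical helpers, and the
target_vxe parameter stands in for GCC's TARGET_VXE.  The real
dispatchers use the EXPAND_TF and EXPAND_MOVTF macros from s390.h.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: pick the suffix of the expander variant a
   TFmode dispatcher would call.  TARGET_VXE (z14+) selects the
   vector register variant, anything older the FPR pair variant.  */
static const char *
dispatch_suffix (bool target_vxe)
{
  return target_vxe ? "_vr" : "_fpr";
}

/* Hypothetical helper: build the name of the variant that a
   dispatcher for ICODE (e.g. "addtf3") forwards to.  */
static void
dispatch_name (char *buf, size_t len, const char *icode, bool target_vxe)
{
  snprintf (buf, len, "%s%s", icode, dispatch_suffix (target_vxe));
}
```

For example, dispatching "addtf3" yields "addtf3_vr" when target_vxe is
true and "addtf3_fpr" otherwise, mirroring the gen_##icode##_vr and
gen_##icode##_fpr calls in the macros.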

gcc/ChangeLog:

2020-11-03  Ilya Leoshkevich  <i...@linux.ibm.com>

        * config/s390/s390-modes.def (FPRX2): New mode.
        * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
        * config/s390/s390.c (s390_fma_allowed_p): Likewise.
        (s390_build_signbit_mask): Support 128-bit masks.
        (print_operand): Support printing the second word of a TFmode
        operand as vector register.
        (constant_modes): Add FPRX2mode.
        (s390_class_max_nregs): Return 1 for TFmode on z14+.
        (s390_is_fpr128): New function.
        (s390_is_vr128): Likewise.
        (s390_can_change_mode_class): Use s390_is_fpr128 and
        s390_is_vr128 in order to determine whether mode refers to a FPR
        pair or to a VR.
        (s390_emit_compare): Force TFmode operands into registers on
        z14+.
        * config/s390/s390.h (HAVE_TF): New macro.
        (EXPAND_MOVTF): New macro.
        (EXPAND_TF): Likewise.
        * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
        alias.
        (ALL): Add FPRX2.
        (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
        (FP): Likewise.
        (FP_ANYTF): New mode iterator.
        (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
        (TD_TF): Likewise.
        (xde): Add FPRX2.
        (nBFP): Likewise.
        (nDFP): Likewise.
        (DSF): Likewise.
        (DFDI): Likewise.
        (SFSI): Likewise.
        (DF): Likewise.
        (SF): Likewise.
        (fT0): Likewise.
        (bt): Likewise.
        (_d): Likewise.
        (HALF_TMODE): Likewise.
        (tf_fpr): New mode_attr.
        (type): New mode_attr.
        (*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
        (*cmp<mode>_ccs_0_fastmath): Likewise.
        (*cmptf_ccs): New pattern for wfcxb.
        (*cmptf_ccsfps): New pattern for wfkxb.
        (mov<mode>): Rename to mov<mode><tf_fpr>.
        (signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
        (isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
        (*TDC_insn_<mode>): Use type instead of mode with fsimp.
        (fixuns_trunc<FP:mode><GPR:mode>2): Rename to
        fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
        (fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
        (floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
        instead of mode with itof.
        (floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
        instead of mode with itof.
        (*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
        itof.
        (floatuns<GPR:mode><FP:mode>2): Rename to
        floatuns<GPR:mode><FP:mode>2<tf_fpr>.
        (trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
        of mode with fsimp.
        (extend<DSF:mode><BFP:mode>2): Rename to
        extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
        (<FPINT:fpint_name><BFP:mode>2): Rename to
        <FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
        mode with fsimp.
        (rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
        type instead of mode with fsimp.
        (<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
        fsimp.
        (rint<DFP:mode>2): Likewise.
        (trunc<BFP:mode><DFP_ALL:mode>2): Rename to
        trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
        (trunc<DFP_ALL:mode><BFP:mode>2): Rename to
        trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
        (extend<BFP:mode><DFP_ALL:mode>2): Rename to
        extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
        (extend<DFP_ALL:mode><BFP:mode>2): Rename to
        extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
        (add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
        mode with fsimp.
        (*add<mode>3_cc): Use type instead of mode with fsimp.
        (*add<mode>3_cconly): Likewise.
        (sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
        mode with fsimp.
        (*sub<mode>3_cc): Use type instead of mode with fsimp.
        (*sub<mode>3_cconly): Likewise.
        (mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
        mode with fsimp.
        (fma<mode>4): Restrict using s390_fma_allowed_p.
        (fms<mode>4): Restrict using s390_fma_allowed_p.
        (div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
        mode with fdiv.
        (neg<mode>2): Rename to neg<mode>2<tf_fpr>.
        (*neg<mode>2_cc): Use type instead of mode with fsimp.
        (*neg<mode>2_cconly): Likewise.
        (*neg<mode>2_nocc): Likewise.
        (*neg<mode>2): Likewise.
        (abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
        mode with fdiv.
        (*abs<mode>2_cc): Use type instead of mode with fsimp.
        (*abs<mode>2_cconly): Likewise.
        (*abs<mode>2_nocc): Likewise.
        (*abs<mode>2): Likewise.
        (*negabs<mode>2_cc): Likewise.
        (*negabs<mode>2_cconly): Likewise.
        (*negabs<mode>2_nocc): Likewise.
        (*negabs<mode>2): Likewise.
        (sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
        of mode with fsqrt.
        (cbranch<mode>4): Use FP_ANYTF instead of FP.
        (copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
        instead of mode with fsimp.
        * config/s390/s390.opt (flag_vx_long_double_fma): New
        undocumented option.
        * config/s390/vector.md (V_HW): Add TF for z14+.
        (V_HW2): Likewise.
        (VFT): Likewise.
        (VF_HW): Likewise.
        (V_128): Likewise.
        (tf_vr): New mode_attr.
        (tointvec): Add TF.
        (mov<mode>): Rename to mov<mode><tf_vr>.
        (movtf): New dispatcher.
        (*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
        z13-.
        (*vec_tf_to_v1tf_vr): New pattern for z14+.
        (*fprx2_to_tf): Likewise.
        (*mov_tf_to_fprx2_0): Likewise.
        (*mov_tf_to_fprx2_1): Likewise.
        (add<mode>3): Rename to add<mode>3<tf_vr>.
        (addtf3): New dispatcher.
        (sub<mode>3): Rename to sub<mode>3<tf_vr>.
        (subtf3): New dispatcher.
        (mul<mode>3): Rename to mul<mode>3<tf_vr>.
        (multf3): New dispatcher.
        (div<mode>3): Rename to div<mode>3<tf_vr>.
        (divtf3): New dispatcher.
        (sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
        (sqrttf2): New dispatcher.
        (fma<mode>4): Restrict using s390_fma_allowed_p.
        (fms<mode>4): Likewise.
        (neg_fma<mode>4): Likewise.
        (neg_fms<mode>4): Likewise.
        (neg<mode>2): Rename to neg<mode>2<tf_vr>.
        (negtf2): New dispatcher.
        (abs<mode>2): Rename to abs<mode>2<tf_vr>.
        (abstf2): New dispatcher.
        (float<mode>tf2_vr): New forwarder.
        (float<mode>tf2): New dispatcher.
        (floatuns<mode>tf2_vr): New forwarder.
        (floatuns<mode>tf2): New dispatcher.
        (fix_trunctf<mode>2_vr): New forwarder.
        (fix_trunctf<mode>2): New dispatcher.
        (fixuns_trunctf<mode>2_vr): New forwarder.
        (fixuns_trunctf<mode>2): New dispatcher.
        (<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
        (<FPINT:fpint_name>tf2): New forwarder.
        (rint<mode>2<tf_vr>): New pattern.
        (rinttf2): New forwarder.
        (*trunctfdf2_vr): New pattern.
        (trunctfdf2_vr): New forwarder.
        (trunctfdf2): New dispatcher.
        (trunctfsf2_vr): New forwarder.
        (trunctfsf2): New dispatcher.
        (extenddftf2_vr): New pattern.
        (extenddftf2): New dispatcher.
        (extendsftf2_vr): New forwarder.
        (extendsftf2): New dispatcher.
        (signbittf2_vr): New forwarder.
        (signbittf2): New dispatcher.
        (isinftf2_vr): New forwarder.
        (isinftf2): New dispatcher.
        * config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
        instead of VECF_HW, add missing constraint, add vw support.
        (vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
        (*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
        VECF_HW, add vw support.
        (vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
---
 gcc/config/s390/s390-modes.def |   5 +-
 gcc/config/s390/s390-protos.h  |   1 +
 gcc/config/s390/s390.c         |  57 ++++-
 gcc/config/s390/s390.h         |  35 +++
 gcc/config/s390/s390.md        | 209 +++++++++++-------
 gcc/config/s390/s390.opt       |  11 +
 gcc/config/s390/vector.md      | 382 ++++++++++++++++++++++++++++++---
 gcc/config/s390/vx-builtins.md |  38 ++--
 8 files changed, 604 insertions(+), 134 deletions(-)

diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index b1f8e1fc9e3..316ca5cf58b 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -22,9 +22,12 @@ along with GCC; see the file COPYING3.  If not see
 /* 256-bit integer mode is needed for STACK_SAVEAREA_MODE.  */
 INT_MODE (OI, 32);
 
-/* Define TFmode to work around reload problem PR 20927.  */
+/* 128-bit float stored in a VR on z14+ or a FPR pair on older machines.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* 128-bit float stored in a FPR pair.  */
+FLOAT_MODE (FPRX2, 16, ieee_quad_format);
+
 /* Add any extra modes needed to represent the condition code.  */
 
 /*
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 029f7289fac..ad2f7f77c18 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -51,6 +51,7 @@ extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
 extern int s390_class_max_nregs (enum reg_class, machine_mode);
 extern bool s390_function_arg_vector (machine_mode, const_tree);
 extern bool s390_return_addr_from_memory(void);
+extern bool s390_fma_allowed_p (machine_mode);
 #if S390_USE_TARGET_ATTRIBUTE
 extern tree s390_valid_target_attribute_tree (tree args,
                                              struct gcc_options *opts,
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 847cedde674..2300a517b64 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -456,6 +456,16 @@ s390_return_addr_from_memory ()
   return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK;
 }
 
+/* Return nonzero if it's OK to use fused multiply-add for MODE.  */
+bool
+s390_fma_allowed_p (machine_mode mode)
+{
+  if (TARGET_VXE && mode == TFmode)
+    return flag_vx_long_double_fma;
+
+  return true;
+}
+
 /* Indicate which ABI has been used for passing vector args.
    0 - no vector type arguments have been passed where the ABI is relevant
    1 - the old ABI has been used
@@ -1850,6 +1860,10 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
   machine_mode mode = s390_select_ccmode (code, op0, op1);
   rtx cc;
 
+  /* Force OP1 into register in order to satisfy VXE TFmode patterns.  */
+  if (TARGET_VXE && GET_MODE (op1) == TFmode)
+    op1 = force_reg (TFmode, op1);
+
   if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
     {
       /* Do not output a redundant compare instruction if a
@@ -6959,6 +6973,13 @@ s390_expand_vec_init (rtx target, rtx vals)
 extern rtx
 s390_build_signbit_mask (machine_mode mode)
 {
+  if (mode == TFmode && TARGET_VXE)
+    {
+      wide_int mask_val = wi::set_bit_in_zero (127, 128);
+      rtx mask = immed_wide_int_const (mask_val, TImode);
+      return gen_lowpart (TFmode, mask);
+    }
+
   /* Generate the integral element mask value.  */
   machine_mode inner_mode = GET_MODE_INNER (mode);
   int inner_bitsize = GET_MODE_BITSIZE (inner_mode);
@@ -7902,6 +7923,7 @@ print_operand_address (FILE *file, rtx addr)
         CONST_VECTOR: Generate a bitmask for vgbm instruction.
     'x': print integer X as if it's an unsigned halfword.
     'v': print register number as vector register (v1 instead of f1).
+    'V': print the second word of a TFmode operand as vector register.
 */
 
 void
@@ -8071,13 +8093,13 @@ print_operand (FILE *file, rtx x, int code)
     case REG:
       /* Print FP regs as fx instead of vx when they are accessed
         through non-vector mode.  */
-      if (code == 'v'
+      if ((code == 'v' || code == 'V')
          || VECTOR_NOFP_REG_P (x)
          || (FP_REG_P (x) && VECTOR_MODE_P (GET_MODE (x)))
          || (VECTOR_REG_P (x)
              && (GET_MODE_SIZE (GET_MODE (x)) /
                  s390_class_max_nregs (FP_REGS, GET_MODE (x))) > 8))
-       fprintf (file, "%%v%s", reg_names[REGNO (x)] + 2);
+       fprintf (file, "%%v%s", reg_names[REGNO (x) + (code == 'V')] + 2);
       else
        fprintf (file, "%s", reg_names[REGNO (x)]);
       break;
@@ -8623,7 +8645,7 @@ replace_constant_pool_ref (rtx_insn *insn, rtx ref, rtx offset)
 
 static machine_mode constant_modes[] =
 {
-  TFmode, TImode, TDmode,
+  TFmode, FPRX2mode, TImode, TDmode,
   V16QImode, V8HImode, V4SImode, V2DImode, V1TImode,
   V4SFmode, V2DFmode, V1TFmode,
   DFmode, DImode, DDmode,
@@ -10418,7 +10440,8 @@ s390_class_max_nregs (enum reg_class rclass, machine_mode mode)
         full VRs.  */
       if (TARGET_VX
          && SCALAR_FLOAT_MODE_P (mode)
-         && GET_MODE_SIZE (mode) >= 16)
+         && GET_MODE_SIZE (mode) >= 16
+         && !(TARGET_VXE && mode == TFmode))
        reg_pair_required_p = true;
 
       /* Even if complex types would fit into a single FPR/VR we force
@@ -10441,6 +10464,24 @@ s390_class_max_nregs (enum reg_class rclass, machine_mode mode)
   return (GET_MODE_SIZE (mode) + reg_size - 1) / reg_size;
 }
 
+/* Return nonzero if mode M describes a 128-bit float in a floating point
+   register pair.  */
+
+static bool
+s390_is_fpr128 (machine_mode m)
+{
+  return m == FPRX2mode || (!TARGET_VXE && m == TFmode);
+}
+
+/* Return nonzero if mode M describes a 128-bit float in a vector
+   register.  */
+
+static bool
+s390_is_vr128 (machine_mode m)
+{
+  return m == V1TFmode || (TARGET_VXE && m == TFmode);
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -10451,11 +10492,11 @@ s390_can_change_mode_class (machine_mode from_mode,
   machine_mode small_mode;
   machine_mode big_mode;
 
-  /* V1TF and TF have different representations in vector
-     registers.  */
+  /* 128-bit values have different representations in floating point and
+     vector registers.  */
   if (reg_classes_intersect_p (VEC_REGS, rclass)
-      && ((from_mode == V1TFmode && to_mode == TFmode)
-         || (from_mode == TFmode && to_mode == V1TFmode)))
+      && ((s390_is_fpr128 (from_mode) && s390_is_vr128 (to_mode))
+         || (s390_is_vr128 (from_mode) && s390_is_fpr128 (to_mode))))
     return false;
 
   if (GET_MODE_SIZE (from_mode) == GET_MODE_SIZE (to_mode))
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index ec5128c0af2..8c028317b6b 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -1186,5 +1186,40 @@ struct GTY(()) machine_function
 
 #define TARGET_INDIRECT_BRANCH_TABLE s390_indirect_branch_table
 
+#ifdef GENERATOR_FILE
+/* gencondmd.c is built before insn-flags.h.  */
+#define HAVE_TF(icode) true
+#else
+#define HAVE_TF(icode) (HAVE_##icode##_fpr || HAVE_##icode##_vr)
+#endif
+
+/* Dispatcher for movtf.  */
+#define EXPAND_MOVTF(icode)                                                   \
+  do                                                                          \
+    {                                                                         \
+      if (TARGET_VXE)                                                         \
+       emit_insn (gen_##icode##_vr (operands[0], operands[1]));              \
+      else                                                                    \
+       emit_insn (gen_##icode##_fpr (operands[0], operands[1]));             \
+      DONE;                                                                   \
+    }                                                                         \
+  while (false)
+
+/* Like EXPAND_MOVTF, but also legitimizes operands.  */
+#define EXPAND_TF(icode, nops)                                                \
+  do                                                                          \
+    {                                                                         \
+      const size_t __nops = (nops);                                           \
+      expand_operand ops[__nops];                                             \
+      create_output_operand (&ops[0], operands[0], GET_MODE (operands[0]));   \
+      for (size_t i = 1; i < __nops; i++)                                     \
+       create_input_operand (&ops[i], operands[i], GET_MODE (operands[i]));  \
+      if (TARGET_VXE)                                                         \
+       expand_insn (CODE_FOR_##icode##_vr, __nops, ops);                     \
+      else                                                                    \
+       expand_insn (CODE_FOR_##icode##_fpr, __nops, ops);                    \
+      DONE;                                                                   \
+    }                                                                         \
+  while (false)
 
 #endif /* S390_H */
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 050374980ae..a2c033b2515 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -405,6 +405,7 @@ (define_constants
    (PFPO_OP_TYPE_SF             0x5)
    (PFPO_OP_TYPE_DF             0x6)
    (PFPO_OP_TYPE_TF             0x7)
+   (PFPO_OP_TYPE_FPRX2          0x7)
    (PFPO_OP_TYPE_SD             0x8)
    (PFPO_OP_TYPE_DD             0x9)
    (PFPO_OP_TYPE_TD             0xa)
@@ -627,20 +628,29 @@ (define_attr "relative_long" "no,yes" (const_string "no"))
 
 ;; Iterators
 
-(define_mode_iterator ALL [TI DI SI HI QI TF DF SF TD DD SD V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI V1SF V2SF V4SF V1TI V1DF V2DF V1TF])
+(define_mode_iterator ALL [TI DI SI HI QI TF FPRX2 DF SF TD DD SD V1QI V2QI
+                          V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI
+                          V1DI V2DI V1SF V2SF V4SF V1TI V1DF V2DF V1TF])
 
 ;; These mode iterators allow floating point patterns to be generated from the
 ;; same template.
-(define_mode_iterator FP_ALL [TF DF SF (TD "TARGET_HARD_DFP") (DD "TARGET_HARD_DFP")
+(define_mode_iterator FP_ALL [(TF "!TARGET_VXE") (FPRX2 "TARGET_VXE") DF SF
+                             (TD "TARGET_HARD_DFP") (DD "TARGET_HARD_DFP")
                               (SD "TARGET_HARD_DFP")])
-(define_mode_iterator FP [TF DF SF (TD "TARGET_HARD_DFP") (DD "TARGET_HARD_DFP")])
-(define_mode_iterator BFP [TF DF SF])
+(define_mode_iterator FP [(TF "!TARGET_VXE") (FPRX2 "TARGET_VXE") DF SF
+                         (TD "TARGET_HARD_DFP") (DD "TARGET_HARD_DFP")])
+;; Like FP, but without a condition on TF. Useful for expanders that must be
+;; the same for FP and VR variants of TF.
+(define_mode_iterator FP_ANYTF [TF (FPRX2 "TARGET_VXE") DF SF
+                               (TD "TARGET_HARD_DFP")
+                               (DD "TARGET_HARD_DFP")])
+(define_mode_iterator BFP [(TF "!TARGET_VXE") (FPRX2 "TARGET_VXE") DF SF])
 (define_mode_iterator DFP [TD DD])
 (define_mode_iterator DFP_ALL [TD DD SD])
 (define_mode_iterator DSF [DF SF])
 (define_mode_iterator SD_SF [SF SD])
 (define_mode_iterator DD_DF [DF DD])
-(define_mode_iterator TD_TF [TF TD])
+(define_mode_iterator TD_TF [(TF "!TARGET_VXE") (FPRX2 "TARGET_VXE") TD])
 
 ; 32 bit int<->fp conversion instructions are available since VXE2 (z15).
 (define_mode_iterator VX_CONV_BFP [DF (SF "TARGET_VXE2")])
@@ -714,7 +724,8 @@ (define_code_attr noxa [(and "n") (ior "o") (xor "x") (plus "a")])
 
 ;; In FP templates, a string like "lt<de>br" will expand to "ltxbr" in
 ;; TF/TDmode, "ltdbr" in DF/DDmode, and "ltebr" in SF/SDmode.
-(define_mode_attr xde [(TF "x") (DF "d") (SF "e") (TD "x") (DD "d") (SD "e") (V4SF "e") (V2DF "d")])
+(define_mode_attr xde [(TF "x") (FPRX2 "x") (DF "d") (SF "e") (TD "x")
+                      (DD "d") (SD "e") (V4SF "e") (V2DF "d")])
 
 ;; In FP templates, a <dee> in "m<dee><bt>r" will expand to "mx<bt>r" in
 ;; TF/TDmode, "md<bt>r" in DF/DDmode, "mee<bt>r" in SFmode and "me<bt>r in
@@ -727,19 +738,22 @@ (define_mode_attr xdee [(TF "x") (DF "d") (SF "ee") (TD "x") (DD "d") (SD "e")])
 
 ;; These mode attributes are supposed to be used in the `enabled' insn
 ;; attribute to disable certain alternatives for certain modes.
-(define_mode_attr nBFP [(TF "0") (DF "0") (SF "0") (TD "*") (DD "*") (DD "*")])
-(define_mode_attr nDFP [(TF "*") (DF "*") (SF "*") (TD "0") (DD "0") (DD "0")])
-(define_mode_attr DSF [(TF "0") (DF "*") (SF "*") (TD "0") (DD "0") (SD "0")])
-(define_mode_attr DFDI [(TF "0") (DF "*") (SF "0")
+(define_mode_attr nBFP [(TF "0") (FPRX2 "0") (DF "0") (SF "0") (TD "*")
+                       (DD "*") (DD "*")])
+(define_mode_attr nDFP [(TF "*") (FPRX2 "*") (DF "*") (SF "*") (TD "0")
+                       (DD "0") (DD "0")])
+(define_mode_attr DSF [(TF "0") (FPRX2 "0") (DF "*") (SF "*") (TD "0")
+                      (DD "0") (SD "0")])
+(define_mode_attr DFDI [(TF "0") (FPRX2 "0") (DF "*") (SF "0")
                        (TD "0") (DD "0") (DD "0")
                        (TI "0") (DI "*") (SI "0")])
-(define_mode_attr SFSI [(TF "0") (DF "0") (SF "*")
+(define_mode_attr SFSI [(TF "0") (FPRX2 "0") (DF "0") (SF "*")
                        (TD "0") (DD "0") (DD "0")
                        (TI "0") (DI "0") (SI "*")])
-(define_mode_attr DF [(TF "0") (DF "*") (SF "0")
+(define_mode_attr DF [(TF "0") (FPRX2 "0") (DF "*") (SF "0")
                      (TD "0") (DD "0") (DD "0")
                      (TI "0") (DI "0") (SI "0")])
-(define_mode_attr SF [(TF "0") (DF "0") (SF "*")
+(define_mode_attr SF [(TF "0") (FPRX2 "0") (DF "0") (SF "*")
                      (TD "0") (DD "0") (DD "0")
                      (TI "0") (DI "0") (SI "0")])
 
@@ -749,15 +763,17 @@ (define_mode_attr SF [(TF "0") (DF "0") (SF "*")
 ;; sign bit instructions only handle single source and target fp registers
 ;; these instructions can only be used for TFmode values if the source and
 ;; target operand uses the same fp register.
-(define_mode_attr fT0 [(TF "0") (DF "f") (SF "f")])
+(define_mode_attr fT0 [(TF "0") (FPRX2 "0") (DF "f") (SF "f")])
 
 ;; This attribute adds b for bfp instructions and t for dfp instructions and is used
 ;; within instruction mnemonics.
-(define_mode_attr bt [(TF "b") (DF "b") (SF "b") (TD "t") (DD "t") (SD "t")])
+(define_mode_attr bt [(TF "b") (FPRX2 "b") (DF "b") (SF "b") (TD "t") (DD "t")
+                     (SD "t")])
 
 ;; This attribute is used within instruction mnemonics.  It evaluates to d for dfp
 ;; modes and to an empty string for bfp modes.
-(define_mode_attr _d [(TF "") (DF "") (SF "") (TD "d") (DD "d") (SD "d")])
+(define_mode_attr _d [(TF "") (FPRX2 "") (DF "") (SF "") (TD "d") (DD "d")
+                     (SD "d")])
 
 ;; In GPR and P templates, a constraint like "<d0>" will expand to "d" in DImode
 ;; and "0" in SImode. This allows to combine instructions of which the 31bit
@@ -829,7 +845,7 @@ (define_mode_attr DBL [(DI "TI") (SI "DI")])
 
 ;; This attribute expands to DF for TFmode and to DD for TDmode .  It is
 ;; used for Txmode splitters splitting a Txmode copy into 2 Dxmode copies.
-(define_mode_attr HALF_TMODE [(TF "DF") (TD "DD")])
+(define_mode_attr HALF_TMODE [(TF "DF") (FPRX2 "DF") (TD "DD")])
 
 ;; Maximum unsigned integer that fits in MODE.
 (define_mode_attr max_uint [(HI "65535") (QI "255")])
@@ -850,6 +866,13 @@ (define_mode_attr modesize [(DI "8") (SI "4")])
 ;; Allow return and simple_return to be defined from a single template.
 (define_code_iterator ANY_RETURN [return simple_return])
 
+;; Facilitate dispatching TFmode expanders on z14+.
+(define_mode_attr tf_fpr [(TF "_fpr") (FPRX2 "") (DF "") (SF "") (TD "")
+                         (DD "") (SD "")])
+
+;; Mode names as seen in type mode_attr values.
+(define_mode_attr type [(TF "tf") (FPRX2 "tf") (DF "df") (SF "sf") (TD "td")
+                       (DD "dd") (SD "sd")])
 
 
 ; Condition code modes generated by vector fp comparisons.  These will
@@ -1421,7 +1444,7 @@ (define_insn "*cmp<mode>_ccz_0"
   "TARGET_HARD_FLOAT"
   "lt<xde><bt>r\t%0,%0"
    [(set_attr "op_type" "RRE")
-    (set_attr "type"  "fsimp<mode>")])
+    (set_attr "type" "fsimp<type>")])
 
 (define_insn "*cmp<mode>_ccs_0_fastmath"
   [(set (reg CC_REGNUM)
@@ -1433,7 +1456,7 @@ (define_insn "*cmp<mode>_ccs_0_fastmath"
    && !flag_signaling_nans"
   "lt<xde><bt>r\t%0,%0"
   [(set_attr "op_type" "RRE")
-   (set_attr "type" "fsimp<mode>")])
+   (set_attr "type" "fsimp<type>")])
 
 ; VX: TFmode in FPR pairs: use cxbr instead of wfcxb
 ; cxtr, cdtr, cxbr, cdbr, cebr, cdb, ceb, wfcsb, wfcdb
@@ -1451,6 +1474,18 @@ (define_insn "*cmp<mode>_ccs"
    (set_attr "cpu_facility" "*,*,vx,vxe")
    (set_attr "enabled" "*,<DSF>,<DF>,<SF>")])
 
+; VX: TFmode in VR: use wfcxb
+(define_insn "*cmptf_ccs"
+  [(set (reg CC_REGNUM)
+       (compare (match_operand:TF 0 "register_operand" "v")
+                 (match_operand:TF 1 "register_operand" "v")))]
+  "s390_match_ccmode(insn, CCSmode) && TARGET_VXE"
+  "wfcxb\t%0,%1"
+  [(set_attr "op_type" "VRR")
+   (set_attr "cpu_facility" "vxe")])
+
+; VX: TFmode in FPR pairs: use kxbr instead of wfkxb
+; kxtr, kdtr, kxbr, kdbr, kebr, kdb, keb, wfksb, wfkdb
 (define_insn "*cmp<mode>_ccsfps"
   [(set (reg CC_REGNUM)
        (compare (match_operand:FP 0 "register_operand" "f,f,v,v")
@@ -1465,6 +1500,16 @@ (define_insn "*cmp<mode>_ccsfps"
    (set_attr "cpu_facility" "*,*,vx,vxe")
    (set_attr "enabled" "*,<DSF>,<DF>,<SF>")])
 
+; VX: TFmode in VR: use wfkxb
+(define_insn "*cmptf_ccsfps"
+  [(set (reg CC_REGNUM)
+       (compare (match_operand:TF 0 "register_operand" "v")
+                 (match_operand:TF 1 "register_operand" "v")))]
+  "s390_match_ccmode (insn, CCSFPSmode) && TARGET_VXE"
+  "wfkxb\t%0,%1"
+  [(set_attr "op_type" "VRR")
+   (set_attr "cpu_facility" "vxe")])
+
 ; Compare and Branch instructions
 
 ; cij, cgij, crj, cgrj, cfi, cgfi, cr, cgr
@@ -2489,7 +2534,7 @@ (define_insn "movstrictsi"
 ; mov(tf|td) instruction pattern(s).
 ;
 
-(define_expand "mov<mode>"
+(define_expand "mov<mode><tf_fpr>"
   [(set (match_operand:TD_TF 0 "nonimmediate_operand" "")
         (match_operand:TD_TF 1 "general_operand"      ""))]
   ""
@@ -3418,7 +3463,7 @@ (define_insn "*cpymem_long_31z"
 ; Test data class.
 ;
 
-(define_expand "signbit<mode>2"
+(define_expand "signbit<mode>2<tf_fpr>"
   [(set (reg:CCZ CC_REGNUM)
         (unspec:CCZ [(match_operand:FP_ALL 1 "register_operand" "f")
                      (match_dup 2)]
@@ -3430,7 +3475,7 @@ (define_expand "signbit<mode>2"
   operands[2] = GEN_INT (S390_TDC_SIGNBIT_SET);
 })
 
-(define_expand "isinf<mode>2"
+(define_expand "isinf<mode>2<tf_fpr>"
   [(set (reg:CCZ CC_REGNUM)
         (unspec:CCZ [(match_operand:FP_ALL 1 "register_operand" "f")
                      (match_dup 2)]
@@ -3468,7 +3513,7 @@ (define_insn "*TDC_insn_<mode>"
   "TARGET_HARD_FLOAT"
   "t<_d>c<xde><bt>\t%0,%1"
    [(set_attr "op_type" "RXE")
-    (set_attr "type"  "fsimp<mode>")])
+    (set_attr "type"    "fsimp<type>")])
 
 
 
@@ -4984,7 +5029,7 @@ (define_insn_and_split "*zero_extendqihi2_31"
 ; This is the only entry point for fixuns_trunc.  It multiplexes the
 ; expansion to either the *_emu expanders below for pre z196 machines
 ; or emits the default pattern otherwise.
-(define_expand "fixuns_trunc<FP:mode><GPR:mode>2"
+(define_expand "fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>"
   [(parallel
     [(set (match_operand:GPR 0 "register_operand" "")
          (unsigned_fix:GPR (match_operand:FP 1 "register_operand" "")))
@@ -5247,12 +5292,12 @@ (define_insn "fix_trunc<DFP:mode>di2_dfp"
 ; fix_trunctf(si|di)2 instruction pattern(s).
 ;
 
-(define_expand "fix_trunctf<mode>2"
+(define_expand "fix_trunctf<mode>2_fpr"
   [(parallel [(set (match_operand:GPR 0 "register_operand" "")
                   (fix:GPR (match_operand:TF 1 "register_operand" "")))
              (unspec:GPR [(const_int BFP_RND_TOWARD_0)] UNSPEC_ROUND)
              (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && !TARGET_VXE"
   "")
 
 
@@ -5261,7 +5306,7 @@ (define_expand "fix_trunctf<mode>2"
 ;
 
 ; cxgbr, cdgbr, cegbr, cxgtr, cdgtr
-(define_insn "floatdi<mode>2"
+(define_insn "floatdi<mode>2<tf_fpr>"
   [(set (match_operand:FP           0 "register_operand" "=f,v")
         (float:FP (match_operand:DI 1 "register_operand"  "d,v")))]
   "TARGET_ZARCH && TARGET_HARD_FLOAT"
@@ -5269,12 +5314,12 @@ (define_insn "floatdi<mode>2"
    c<xde>g<bt>r\t%0,%1
    wcdgb\t%v0,%v1,0,0"
   [(set_attr "op_type"      "RRE,VRR")
-   (set_attr "type"         "itof<mode>" )
+   (set_attr "type"         "itof<type>" )
    (set_attr "cpu_facility" "*,vx")
    (set_attr "enabled"      "*,<DFDI>")])
 
 ; cxfbr, cdfbr, cefbr, wcefb
-(define_insn "floatsi<mode>2"
+(define_insn "floatsi<mode>2<tf_fpr>"
   [(set (match_operand:BFP           0 "register_operand" "=f,v")
         (float:BFP (match_operand:SI 1 "register_operand"  "d,v")))]
   "TARGET_HARD_FLOAT"
@@ -5282,7 +5327,7 @@ (define_insn "floatsi<mode>2"
    c<xde>fbr\t%0,%1
    wcefb\t%v0,%v1,0,0"
   [(set_attr "op_type"      "RRE,VRR")
-   (set_attr "type"         "itof<mode>" )
+   (set_attr "type"         "itof<type>" )
    (set_attr "cpu_facility" "*,vxe2")
    (set_attr "enabled"      "*,<SFSI>")])
 
@@ -5293,7 +5338,7 @@ (define_insn "floatsi<mode>2"
   "TARGET_Z196 && TARGET_HARD_FLOAT"
   "c<xde>ftr\t%0,0,%1,0"
   [(set_attr "op_type" "RRE")
-   (set_attr "type"   "itof<mode>" )])
+   (set_attr "type"    "itof<type>")])
 
 ;
 ; floatuns(si|di)(tf|df|sf|td|dd)2 instruction pattern(s).
@@ -5319,9 +5364,9 @@ (define_insn "*floatuns<GPR:mode><FP:mode>2"
    && (!TARGET_VX || <FP:MODE>mode != DFmode || <GPR:MODE>mode != DImode)"
   "c<FP:xde>l<GPR:gf><FP:bt>r\t%0,0,%1,0"
   [(set_attr "op_type" "RRE")
-   (set_attr "type"    "itof<FP:mode>")])
+   (set_attr "type"    "itof<FP:type>")])
 
-(define_expand "floatuns<GPR:mode><FP:mode>2"
+(define_expand "floatuns<GPR:mode><FP:mode>2<tf_fpr>"
   [(set (match_operand:FP                     0 "register_operand" "")
         (unsigned_float:FP (match_operand:GPR 1 "register_operand" "")))]
   "TARGET_Z196 && TARGET_HARD_FLOAT")
@@ -5347,7 +5392,7 @@ (define_insn "truncdfsf2"
 ;
 
 ; ldxbr, lexbr
-(define_insn "trunctf<mode>2"
+(define_insn "trunctf<mode>2_fpr"
   [(set (match_operand:DSF 0 "register_operand" "=f")
         (float_truncate:DSF (match_operand:TF 1 "register_operand" "f")))
    (clobber (match_scratch:TF 2 "=f"))]
@@ -5427,9 +5472,9 @@ (define_insn "*extend<DSF:mode><BFP:mode>2"
    l<BFP:xde><DSF:xde>br\t%0,%1
    l<BFP:xde><DSF:xde>b\t%0,%1"
   [(set_attr "op_type" "RRE,RXE")
-   (set_attr "type"    "fsimp<BFP:mode>, fload<BFP:mode>")])
+   (set_attr "type"    "fsimp<BFP:type>, fload<BFP:type>")])
 
-(define_expand "extend<DSF:mode><BFP:mode>2"
+(define_expand "extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>"
   [(set (match_operand:BFP                   0 "register_operand"     "")
         (float_extend:BFP (match_operand:DSF 1 "nonimmediate_operand" "")))]
   "TARGET_HARD_FLOAT
@@ -5471,27 +5516,27 @@ (define_expand "extendsdtd2"
 ; For all of them the inexact exceptions are suppressed.
 
 ; fiebra, fidbra, fixbra
-(define_insn "<FPINT:fpint_name><BFP:mode>2"
+(define_insn "<FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>"
   [(set (match_operand:BFP 0 "register_operand" "=f")
        (unspec:BFP [(match_operand:BFP 1 "register_operand" "f")]
                    FPINT))]
   "TARGET_Z196"
   "fi<BFP:xde>bra\t%0,<FPINT:fpint_roundingmode>,%1,4"
   [(set_attr "op_type"   "RRF")
-   (set_attr "type"      "fsimp<BFP:mode>")])
+   (set_attr "type"      "fsimp<BFP:type>")])
 
 ; rint is supposed to raise an inexact exception so we can use the
 ; older instructions.
 
 ; fiebr, fidbr, fixbr
-(define_insn "rint<BFP:mode>2"
+(define_insn "rint<BFP:mode>2<BFP:tf_fpr>"
   [(set (match_operand:BFP 0 "register_operand" "=f")
        (unspec:BFP [(match_operand:BFP 1 "register_operand" "f")]
                    UNSPEC_FPINT_RINT))]
   ""
   "fi<BFP:xde>br\t%0,0,%1"
   [(set_attr "op_type"   "RRF")
-   (set_attr "type"      "fsimp<BFP:mode>")])
+   (set_attr "type"      "fsimp<BFP:type>")])
 
 
 ; Decimal Floating Point - load fp integer
@@ -5504,7 +5549,7 @@ (define_insn "<FPINT:fpint_name><DFP:mode>2"
   "TARGET_HARD_DFP"
   "fi<DFP:xde>tr\t%0,<FPINT:fpint_roundingmode>,%1,4"
   [(set_attr "op_type"   "RRF")
-   (set_attr "type"      "fsimp<DFP:mode>")])
+   (set_attr "type"      "fsimp<DFP:type>")])
 
 ; fidtr, fixtr
 (define_insn "rint<DFP:mode>2"
@@ -5514,7 +5559,7 @@ (define_insn "rint<DFP:mode>2"
   "TARGET_HARD_DFP"
   "fi<DFP:xde>tr\t%0,0,%1,0"
   [(set_attr "op_type"   "RRF")
-   (set_attr "type"      "fsimp<DFP:mode>")])
+   (set_attr "type"      "fsimp<DFP:type>")])
 
 ;
 ; Binary <-> Decimal floating point trunc patterns
@@ -5538,7 +5583,7 @@ (define_insn "*trunc<DFP_ALL:mode><BFP:mode>2"
   "TARGET_HARD_DFP"
   "pfpo")
 
-(define_expand "trunc<BFP:mode><DFP_ALL:mode>2"
+(define_expand "trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>"
   [(set (reg:BFP FPR4_REGNUM) (match_operand:BFP 1 "nonimmediate_operand" ""))
    (set (reg:SI GPR0_REGNUM) (match_dup 2))
    (parallel
@@ -5565,7 +5610,7 @@ (define_expand "trunc<BFP:mode><DFP_ALL:mode>2"
   operands[2] = GEN_INT (flags);
 })
 
-(define_expand "trunc<DFP_ALL:mode><BFP:mode>2"
+(define_expand "trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>"
   [(set (reg:DFP_ALL FPR4_REGNUM)
         (match_operand:DFP_ALL 1 "nonimmediate_operand" ""))
    (set (reg:SI GPR0_REGNUM) (match_dup 2))
@@ -5611,7 +5656,7 @@ (define_insn "*extend<DFP_ALL:mode><BFP:mode>2"
   "TARGET_HARD_DFP"
   "pfpo")
 
-(define_expand "extend<BFP:mode><DFP_ALL:mode>2"
+(define_expand "extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>"
   [(set (reg:BFP FPR4_REGNUM) (match_operand:BFP 1 "nonimmediate_operand" ""))
    (set (reg:SI GPR0_REGNUM) (match_dup 2))
    (parallel
@@ -5638,7 +5683,7 @@ (define_expand "extend<BFP:mode><DFP_ALL:mode>2"
   operands[2] = GEN_INT (flags);
 })
 
-(define_expand "extend<DFP_ALL:mode><BFP:mode>2"
+(define_expand "extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>"
   [(set (reg:DFP_ALL FPR4_REGNUM)
         (match_operand:DFP_ALL 1 "nonimmediate_operand" ""))
    (set (reg:SI GPR0_REGNUM) (match_dup 2))
@@ -6117,7 +6162,7 @@ (define_insn "*addv<mode>3_ccoverflow_const"
 
 ; axbr, adbr, aebr, axb, adb, aeb, adtr, axtr
 ; FIXME: wfadb does not clobber cc
-(define_insn "add<mode>3"
+(define_insn "add<mode>3<tf_fpr>"
   [(set (match_operand:FP          0 "register_operand"     "=f,f,f,v,v")
         (plus:FP (match_operand:FP 1 "nonimmediate_operand" "%f,0,0,v,v")
                 (match_operand:FP 2 "general_operand"       "f,f,R,v,v")))
@@ -6130,7 +6175,7 @@ (define_insn "add<mode>3"
    wfadb\t%v0,%v1,%v2
    wfasb\t%v0,%v1,%v2"
   [(set_attr "op_type"      "RRF,RRE,RXE,VRR,VRR")
-   (set_attr "type"         "fsimp<mode>")
+   (set_attr "type"         "fsimp<type>")
    (set_attr "cpu_facility" "*,*,*,vx,vxe")
    (set_attr "enabled"      "<nBFP>,<nDFP>,<DSF>,<DF>,<SF>")])
 
@@ -6148,7 +6193,7 @@ (define_insn "*add<mode>3_cc"
    a<xde>br\t%0,%2
    a<xde>b\t%0,%2"
   [(set_attr "op_type"  "RRF,RRE,RXE")
-   (set_attr "type"     "fsimp<mode>")
+   (set_attr "type"     "fsimp<type>")
    (set_attr "enabled"  "<nBFP>,<nDFP>,<DSF>")])
 
 ; axbr, adbr, aebr, axb, adb, aeb, adtr, axtr
@@ -6164,7 +6209,7 @@ (define_insn "*add<mode>3_cconly"
    a<xde>br\t%0,%2
    a<xde>b\t%0,%2"
   [(set_attr "op_type"  "RRF,RRE,RXE")
-   (set_attr "type"     "fsimp<mode>")
+   (set_attr "type"     "fsimp<type>")
    (set_attr "enabled"  "<nBFP>,<nDFP>,<DSF>")])
 
 ;
@@ -6562,7 +6607,7 @@ (define_insn "*subv<mode>3_ccoverflow"
 
 ; FIXME: (clobber (match_scratch:CC 3 "=c,c,c,X,X")) does not work - why?
 ; sxbr, sdbr, sebr, sdb, seb, sxtr, sdtr
-(define_insn "sub<mode>3"
+(define_insn "sub<mode>3<tf_fpr>"
   [(set (match_operand:FP           0 "register_operand" "=f,f,f,v,v")
         (minus:FP (match_operand:FP 1 "register_operand"  "f,0,0,v,v")
                  (match_operand:FP 2 "general_operand"   "f,f,R,v,v")))
@@ -6575,7 +6620,7 @@ (define_insn "sub<mode>3"
    wfsdb\t%v0,%v1,%v2
    wfssb\t%v0,%v1,%v2"
   [(set_attr "op_type"      "RRF,RRE,RXE,VRR,VRR")
-   (set_attr "type"         "fsimp<mode>")
+   (set_attr "type"         "fsimp<type>")
    (set_attr "cpu_facility" "*,*,*,vx,vxe")
    (set_attr "enabled"      "<nBFP>,<nDFP>,<DSF>,<DF>,<SF>")])
 
@@ -6593,7 +6638,7 @@ (define_insn "*sub<mode>3_cc"
    s<xde>br\t%0,%2
    s<xde>b\t%0,%2"
   [(set_attr "op_type"  "RRF,RRE,RXE")
-   (set_attr "type"     "fsimp<mode>")
+   (set_attr "type"     "fsimp<type>")
    (set_attr "enabled"  "<nBFP>,<nDFP>,<DSF>")])
 
 ; sxbr, sdbr, sebr, sdb, seb, sxtr, sdtr
@@ -6609,7 +6654,7 @@ (define_insn "*sub<mode>3_cconly"
    s<xde>br\t%0,%2
    s<xde>b\t%0,%2"
   [(set_attr "op_type"  "RRF,RRE,RXE")
-   (set_attr "type"     "fsimp<mode>")
+   (set_attr "type"     "fsimp<type>")
    (set_attr "enabled"  "<nBFP>,<nDFP>,<DSF>")])
 
 
@@ -7143,7 +7188,7 @@ (define_insn "umul<dwh><mode>3"
 ;
 
 ; mxbr, mdbr, meebr, mxb, mxb, meeb, mdtr, mxtr
-(define_insn "mul<mode>3"
+(define_insn "mul<mode>3<tf_fpr>"
   [(set (match_operand:FP          0 "register_operand"     "=f,f,f,v,v")
         (mult:FP (match_operand:FP 1 "nonimmediate_operand" "%f,0,0,v,v")
                 (match_operand:FP 2 "general_operand"       "f,f,R,v,v")))]
@@ -7155,7 +7200,7 @@ (define_insn "mul<mode>3"
    wfmdb\t%v0,%v1,%v2
    wfmsb\t%v0,%v1,%v2"
   [(set_attr "op_type"      "RRF,RRE,RXE,VRR,VRR")
-   (set_attr "type"         "fmul<mode>")
+   (set_attr "type"         "fmul<type>")
    (set_attr "cpu_facility" "*,*,*,vx,vxe")
    (set_attr "enabled"      "<nBFP>,<nDFP>,<DSF>,<DF>,<SF>")])
 
@@ -7165,7 +7210,7 @@ (define_insn "fma<mode>4"
        (fma:DSF (match_operand:DSF 1 "nonimmediate_operand" "%f,f,v,v")
                 (match_operand:DSF 2 "nonimmediate_operand"  "f,R,v,v")
                 (match_operand:DSF 3 "register_operand"      "0,0,v,v")))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && s390_fma_allowed_p (<MODE>mode)"
   "@
    ma<xde>br\t%0,%1,%2
    ma<xde>b\t%0,%1,%2
@@ -7182,7 +7227,7 @@ (define_insn "fms<mode>4"
        (fma:DSF (match_operand:DSF          1 "nonimmediate_operand" "%f,f,v,v")
                 (match_operand:DSF          2 "nonimmediate_operand"  "f,R,v,v")
                 (neg:DSF (match_operand:DSF 3 "register_operand"      "0,0,v,v"))))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT && s390_fma_allowed_p (<MODE>mode)"
   "@
    ms<xde>br\t%0,%1,%2
    ms<xde>b\t%0,%1,%2
@@ -7448,7 +7493,7 @@ (define_insn "udivmoddisi3"
 ;
 
 ; dxbr, ddbr, debr, dxb, ddb, deb, ddtr, dxtr
-(define_insn "div<mode>3"
+(define_insn "div<mode>3<tf_fpr>"
   [(set (match_operand:FP         0 "register_operand" "=f,f,f,v,v")
         (div:FP (match_operand:FP 1 "register_operand"  "f,0,0,v,v")
                (match_operand:FP 2 "general_operand"   "f,f,R,v,v")))]
@@ -7460,7 +7505,7 @@ (define_insn "div<mode>3"
    wfddb\t%v0,%v1,%v2
    wfdsb\t%v0,%v1,%v2"
   [(set_attr "op_type"      "RRF,RRE,RXE,VRR,VRR")
-   (set_attr "type"         "fdiv<mode>")
+   (set_attr "type"         "fdiv<type>")
    (set_attr "cpu_facility" "*,*,*,vx,vxe")
    (set_attr "enabled"      "<nBFP>,<nDFP>,<DSF>,<DF>,<SF>")])
 
@@ -8777,10 +8822,10 @@ (define_split
    operands[6] = gen_label_rtx ();")
 
 ;
-; neg(df|sf)2 instruction pattern(s).
+; neg(tf|df|sf)2 instruction pattern(s).
 ;
 
-(define_expand "neg<mode>2"
+(define_expand "neg<mode>2<tf_fpr>"
   [(parallel
     [(set (match_operand:BFP          0 "register_operand")
           (neg:BFP (match_operand:BFP 1 "register_operand")))
@@ -8797,7 +8842,7 @@ (define_insn "*neg<mode>2_cc"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "lc<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lcxbr, lcdbr, lcebr
 (define_insn "*neg<mode>2_cconly"
@@ -8808,7 +8853,7 @@ (define_insn "*neg<mode>2_cconly"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "lc<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lcdfr
 (define_insn "*neg<mode>2_nocc"
@@ -8817,7 +8862,7 @@ (define_insn "*neg<mode>2_nocc"
   "TARGET_DFP"
   "lcdfr\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lcxbr, lcdbr, lcebr
 ; FIXME: wflcdb does not clobber cc
@@ -8833,7 +8878,7 @@ (define_insn "*neg<mode>2"
    wflcsb\t%0,%1"
   [(set_attr "op_type"      "RRE,VRR,VRR")
    (set_attr "cpu_facility" "*,vx,vxe")
-   (set_attr "type"         "fsimp<mode>,*,*")
+   (set_attr "type"         "fsimp<type>,*,*")
    (set_attr "enabled"      "*,<DF>,<SF>")])
 
 
@@ -8901,10 +8946,10 @@ (define_insn "abs<mode>2"
    (set_attr "z10prop" "z10_c")])
 
 ;
-; abs(df|sf)2 instruction pattern(s).
+; abs(tf|df|sf)2 instruction pattern(s).
 ;
 
-(define_expand "abs<mode>2"
+(define_expand "abs<mode>2<tf_fpr>"
   [(parallel
     [(set (match_operand:BFP 0 "register_operand" "=f")
           (abs:BFP (match_operand:BFP 1 "register_operand" "f")))
@@ -8922,7 +8967,7 @@ (define_insn "*abs<mode>2_cc"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "lp<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lpxbr, lpdbr, lpebr
 (define_insn "*abs<mode>2_cconly"
@@ -8933,7 +8978,7 @@ (define_insn "*abs<mode>2_cconly"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "lp<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lpdfr
 (define_insn "*abs<mode>2_nocc"
@@ -8942,7 +8987,7 @@ (define_insn "*abs<mode>2_nocc"
   "TARGET_DFP"
   "lpdfr\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lpxbr, lpdbr, lpebr
 ; FIXME: wflpdb does not clobber cc
@@ -8956,7 +9001,7 @@ (define_insn "*abs<mode>2"
     wflpdb\t%0,%1"
   [(set_attr "op_type"      "RRE,VRR")
    (set_attr "cpu_facility" "*,vx")
-   (set_attr "type"         "fsimp<mode>,*")
+   (set_attr "type"         "fsimp<type>,*")
    (set_attr "enabled"      "*,<DFDI>")])
 
 
@@ -9038,7 +9083,7 @@ (define_insn "*negabs<mode>2_cc"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "ln<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lnxbr, lndbr, lnebr
 (define_insn "*negabs<mode>2_cconly"
@@ -9049,7 +9094,7 @@ (define_insn "*negabs<mode>2_cconly"
   "s390_match_ccmode (insn, CCSmode) && TARGET_HARD_FLOAT"
   "ln<xde>br\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lndfr
 (define_insn "*negabs<mode>2_nocc"
@@ -9058,7 +9103,7 @@ (define_insn "*negabs<mode>2_nocc"
   "TARGET_DFP"
   "lndfr\t%0,%1"
   [(set_attr "op_type"  "RRE")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 ; lnxbr, lndbr, lnebr
 ; FIXME: wflndb does not clobber cc
@@ -9072,7 +9117,7 @@ (define_insn "*negabs<mode>2"
    wflndb\t%0,%1"
   [(set_attr "op_type"      "RRE,VRR")
    (set_attr "cpu_facility" "*,vx")
-   (set_attr "type"         "fsimp<mode>,*")
+   (set_attr "type"         "fsimp<type>,*")
    (set_attr "enabled"      "*,<DFDI>")])
 
 ;;
@@ -9084,7 +9129,7 @@ (define_insn "*negabs<mode>2"
 ;
 
 ; sqxbr, sqdbr, sqebr, sqdb, sqeb
-(define_insn "sqrt<mode>2"
+(define_insn "sqrt<mode>2<tf_fpr>"
   [(set (match_operand:BFP           0 "register_operand" "=f,f,v")
        (sqrt:BFP (match_operand:BFP 1 "general_operand"   "f,R,v")))]
   "TARGET_HARD_FLOAT"
@@ -9093,7 +9138,7 @@ (define_insn "sqrt<mode>2"
    sq<xde>b\t%0,%1
    wfsqdb\t%v0,%v1"
   [(set_attr "op_type"      "RRE,RXE,VRR")
-   (set_attr "type"         "fsqrt<mode>")
+   (set_attr "type"         "fsqrt<type>")
    (set_attr "cpu_facility" "*,*,vx")
    (set_attr "enabled"      "*,<DSF>,<DFDI>")])
 
@@ -9294,8 +9339,8 @@ (define_expand "cbranch<mode>4"
 (define_expand "cbranch<mode>4"
   [(set (pc)
         (if_then_else (match_operator 0 "comparison_operator"
-                      [(match_operand:FP 1 "register_operand" "")
-                        (match_operand:FP 2 "general_operand" "")])
+                      [(match_operand:FP_ANYTF 1 "register_operand" "")
+                       (match_operand:FP_ANYTF 2 "general_operand" "")])
                      (label_ref (match_operand 3 "" ""))
                       (pc)))]
   "TARGET_HARD_FLOAT"
@@ -11790,7 +11835,7 @@ (define_expand "popcountqi2"
 ;;- Copy sign instructions
 ;;
 
-(define_insn "copysign<mode>3"
+(define_insn "copysign<mode>3<tf_fpr>"
   [(set (match_operand:FP 0 "register_operand" "=f")
       (unspec:FP [(match_operand:FP 1 "register_operand" "<fT0>")
                   (match_operand:FP 2 "register_operand" "f")]
@@ -11798,7 +11843,7 @@ (define_insn "copysign<mode>3"
   "TARGET_Z196"
   "cpsdr\t%0,%2,%1"
   [(set_attr "op_type"  "RRF")
-   (set_attr "type"     "fsimp<mode>")])
+   (set_attr "type"     "fsimp<type>")])
 
 
 ;;
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 300309cddda..0afcea3c3fe 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -304,3 +304,14 @@ mnop-mcount
 Target Report Var(flag_nop_mcount)
 Generate mcount/__fentry__ calls as nops. To activate they need to be
 patched in.
+
+mvx-long-double-fma
+Target Report Undocumented Var(flag_vx_long_double_fma)
+Emit fused multiply-add instructions for long doubles in vector registers
+(wfmaxb, wfmsxb, wfnmaxb, wfnmsxb).  Reassociation pass does not handle
+fused multiply-adds, therefore code generated by the middle-end is prone to
+having long fused multiply-add chains.  This is not pipeline-friendly,
+and the default behavior is to emit separate multiplication and addition
+instructions for long doubles in vector registers, because measurements show
+that this improves performance.  This option allows overriding it for testing
+purposes.
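
For illustration only (not part of the patch): a minimal sketch of the kind
of source whose code generation this option is meant to influence.  The
function names mul_add and dot3 are hypothetical.  Assuming a z14+ target
(e.g. -march=z14) where long double lives in vector registers, the default
described above would be expected to yield separate multiply and add
instructions (wfmxb, wfaxb), while -mvx-long-double-fma would allow the fused
forms such as wfmaxb; whether contraction actually happens also depends on
-ffp-contract.

```c
/* Hypothetical example, not taken from the patch or the testsuite.  */

/* a * b + c: a single candidate for a fused multiply-add.  */
long double
mul_add (long double a, long double b, long double c)
{
  return a * b + c;
}

/* Three products summed left to right.  If every multiply/add pair is
   contracted, each fused operation depends on the previous one; this is
   the kind of chain the option text calls "not pipeline-friendly".  */
long double
dot3 (const long double x[3], const long double y[3])
{
  return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
}
```

Comparing the assembly produced with and without the option (gcc -O2 -S) is
probably the easiest way to see the difference on a given compiler build.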
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 3e621daf7b1..31d323930b2 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -27,10 +27,14 @@ (define_mode_iterator VT
    V2SF V4SF V1DF V2DF V1TF V1TI TI])
 
 ; All modes directly supported by the hardware having full vector reg size
-; V_HW2 is duplicate of V_HW for having two iterators expanding
-; independently e.g. vcond
-(define_mode_iterator V_HW  [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V2DF (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")])
-(define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")])
+; V_HW2 is for having two iterators expanding independently e.g. vcond.
+; It's similar to V_HW, but not fully identical: V1TI is not included, because
+; there are no 128-bit compares.
+(define_mode_iterator V_HW  [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V2DF
+                            (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")
+                            (TF "TARGET_VXE")])
+(define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE")
+                            (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
 (define_mode_iterator V_HW_64 [V2DI V2DF])
 (define_mode_iterator VT_HW_HSDT [V8HI V4SI V4SF V2DI V2DF V1TI V1TF TI TF])
@@ -55,19 +59,20 @@ (define_mode_iterator VI_QHS [V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI
 
 (define_mode_iterator VFT [(V1SF "TARGET_VXE") (V2SF "TARGET_VXE") (V4SF "TARGET_VXE")
                           V1DF V2DF
-                          (V1TF "TARGET_VXE")])
+                          (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
 ; FP vector modes directly supported by the HW.  This does not include
 ; vector modes using only part of a vector register and should be used
 ; for instructions which might trigger IEEE exceptions.
-(define_mode_iterator VF_HW [(V4SF "TARGET_VXE") V2DF (V1TF "TARGET_VXE")])
+(define_mode_iterator VF_HW [(V4SF "TARGET_VXE") V2DF (V1TF "TARGET_VXE")
+                            (TF "TARGET_VXE")])
 
 (define_mode_iterator V_8   [V1QI])
 (define_mode_iterator V_16  [V2QI  V1HI])
 (define_mode_iterator V_32  [V4QI  V2HI V1SI V1SF])
 (define_mode_iterator V_64  [V8QI  V4HI V2SI V2SF V1DI V1DF])
-(define_mode_iterator V_128 [V16QI V8HI V4SI V4SF V2DI V2DF V1TI V1TF])
-
+(define_mode_iterator V_128 [V16QI V8HI V4SI V4SF V2DI V2DF V1TI V1TF
+                            (TF "TARGET_VXE")])
 (define_mode_iterator V_128_NOSINGLE [V16QI V8HI V4SI V4SF V2DI V2DF])
 
 ; 32 bit int<->fp vector conversion instructions are available since VXE2 (z15).
@@ -86,6 +91,11 @@ (define_mode_attr ti* [(V1QI "")  (V2QI "") (V4QI "") (V8QI "") (V16QI "")
                       (V1DF "")  (V2DF "")
                       (V1TF "")  (TF "")])
 
+;; Facilitate dispatching TFmode expanders on z14+.
+(define_mode_attr tf_vr [(TF "_vr") (V4SF "") (V2DF "") (V1TF "") (V1SF "")
+                        (V2SF "") (V1DF "") (V16QI "") (V8HI "") (V4SI "")
+                        (V2DI "") (V1TI "")])
+
 ; The element type of the vector.
 (define_mode_attr non_vec[(V1QI "QI") (V2QI "QI") (V4QI "QI") (V8QI "QI") (V16QI "QI")
                          (V1HI "HI") (V2HI "HI") (V4HI "HI") (V8HI "HI")
@@ -134,7 +144,7 @@ (define_mode_attr tointvec [(V1QI "V1QI") (V2QI "V2QI") (V4QI "V4QI") (V8QI "V8Q
                            (V1TI "V1TI")
                            (V1SF "V1SI") (V2SF "V2SI") (V4SF "V4SI")
                            (V1DF "V1DI") (V2DF "V2DI")
-                           (V1TF "V1TI")])
+                           (V1TF "V1TI") (TF "V1TI")])
 (define_mode_attr vw [(SF "w") (V1SF "w") (V2SF "v") (V4SF "v")
                      (DF "w") (V1DF "w") (V2DF "v")
                      (TF "w") (V1TF "w")])
@@ -194,7 +204,7 @@ (define_constants
 ; for TImode (use double-int for the calculations)
 
 ; vgmb, vgmh, vgmf, vgmg, vrepib, vrepih, vrepif, vrepig
-(define_insn "mov<mode>"
+(define_insn "mov<mode><tf_vr>"
   [(set (match_operand:V_128 0 "nonimmediate_operand" "=v,v,R,  v,  v,  v,  v,  v,v,*d,*d,?o")
        (match_operand:V_128 1 "general_operand"      " v,R,v,j00,jm1,jyy,jxx,jKK,d, v,dT,*d"))]
   ""
@@ -214,6 +224,12 @@ (define_insn "mov<mode>"
   [(set_attr "cpu_facility" "vx,vx,vx,vx,vx,vx,vx,vx,vx,vx,*,*")
    (set_attr "op_type"      "VRR,VRX,VRX,VRI,VRI,VRI,VRI,VRI,VRR,*,*,*")])
 
+(define_expand "movtf"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "general_operand"      "")]
+  ""
+  { EXPAND_MOVTF(movtf); })
+
 ; VR -> GPR, no instruction so split it into 64 element sets.
 (define_split
   [(set (match_operand:V_128 0 "register_operand" "")
@@ -565,10 +581,10 @@ (define_insn "*vec_splats_bswap_elem<mode>"
 
 ; A TFmode operand resides in FPR register pairs while V1TF is in a
 ; single vector register.
-(define_insn "*vec_tf_to_v1tf"
+(define_insn "*vec_tf_to_v1tf_fpr"
   [(set (match_operand:V1TF                   0 "nonimmediate_operand" "=v,v,R,v,v")
        (vec_duplicate:V1TF (match_operand:TF 1 "general_operand"       "f,R,f,G,d")))]
-  "TARGET_VX"
+  "TARGET_VX && !TARGET_VXE"
   "@
    vmrhg\t%v0,%1,%N1
    vl\t%v0,%1%A1
@@ -577,6 +593,26 @@ (define_insn "*vec_tf_to_v1tf"
    vlvgp\t%v0,%1,%N1"
   [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
 
+; Both TFmode and V1TFmode operands reside in vector registers.
+(define_insn "*vec_tf_to_v1tf_vr"
+  [(set (match_operand:V1TF                   0 "nonimmediate_operand" "=v,v,R,v,v")
+       (vec_duplicate:V1TF (match_operand:TF 1 "general_operand"       "v,R,v,G,d")))]
+  "TARGET_VXE"
+  "@
+   vlr\t%v0,%1
+   vl\t%v0,%1%A1
+   vst\t%v1,%0%A0
+   vzero\t%v0
+   vlvgp\t%v0,%1,%N1"
+  [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
+
+(define_insn "*fprx2_to_tf"
+  [(set (match_operand:TF               0 "nonimmediate_operand" "=v")
+       (subreg:TF (match_operand:FPRX2 1 "general_operand"       "f") 0))]
+  "TARGET_VXE"
+  "vmrhg\t%v0,%1,%N1"
+  [(set_attr "op_type" "VRR")])
+
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI                   0 "nonimmediate_operand" "=v,v,R,  v,  v,v")
        (vec_duplicate:V1TI (match_operand:TI 1 "general_operand"       "v,R,v,j00,jm1,d")))]
@@ -691,6 +727,21 @@ (define_insn "*vec_perm<mode>"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
+(define_insn "*mov_tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+       (subreg:DF (match_operand:TF    1 "general_operand"       "v") 0))]
+  "TARGET_VXE"
+  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
+  "vpdi\t%v0,%v1,%v0,1"
+  [(set_attr "op_type" "VRR")])
+
+(define_insn "*mov_tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+       (subreg:DF (match_operand:TF    1 "general_operand"       "v") 8))]
+  "TARGET_VXE"
+  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
+  "vpdi\t%V0,%v1,%V0,5"
+  [(set_attr "op_type" "VRR")])
 
 ; vec_perm_const for V2DI using vpdi?
 
@@ -1253,7 +1304,7 @@ (define_expand "vec_widen_smult_hi_<mode>"
 ;;
 
 ; vfasb, vfadb, wfasb, wfadb, wfaxb
-(define_insn "add<mode>3"
+(define_insn "add<mode>3<tf_vr>"
   [(set (match_operand:VF_HW             0 "register_operand" "=v")
        (plus:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                    (match_operand:VF_HW 2 "register_operand"  "v")))]
@@ -1261,8 +1312,15 @@ (define_insn "add<mode>3"
   "<vw>fa<sdx>b\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "addtf3"
+  [(match_operand:TF 0 "register_operand"     "")
+   (match_operand:TF 1 "nonimmediate_operand" "")
+   (match_operand:TF 2 "general_operand"      "")]
+  "HAVE_TF (addtf3)"
+  { EXPAND_TF (addtf3, 3); })
+
 ; vfssb, vfsdb, wfssb, wfsdb, wfsxb
-(define_insn "sub<mode>3"
+(define_insn "sub<mode>3<tf_vr>"
   [(set (match_operand:VF_HW              0 "register_operand" "=v")
        (minus:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                     (match_operand:VF_HW 2 "register_operand"  "v")))]
@@ -1270,8 +1328,15 @@ (define_insn "sub<mode>3"
   "<vw>fs<sdx>b\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "subtf3"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")
+   (match_operand:TF 2 "general_operand"  "")]
+  "HAVE_TF (subtf3)"
+  { EXPAND_TF (subtf3, 3); })
+
 ; vfmsb, vfmdb, wfmsb, wfmdb, wfmxb
-(define_insn "mul<mode>3"
+(define_insn "mul<mode>3<tf_vr>"
   [(set (match_operand:VF_HW             0 "register_operand" "=v")
        (mult:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                    (match_operand:VF_HW 2 "register_operand"  "v")))]
@@ -1279,8 +1344,15 @@ (define_insn "mul<mode>3"
   "<vw>fm<sdx>b\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "multf3"
+  [(match_operand:TF 0 "register_operand"     "")
+   (match_operand:TF 1 "nonimmediate_operand" "")
+   (match_operand:TF 2 "general_operand"      "")]
+  "HAVE_TF (multf3)"
+  { EXPAND_TF (multf3, 3); })
+
 ; vfdsb, vfddb, wfdsb, wfddb, wfdxb
-(define_insn "div<mode>3"
+(define_insn "div<mode>3<tf_vr>"
   [(set (match_operand:VF_HW            0 "register_operand" "=v")
        (div:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                   (match_operand:VF_HW 2 "register_operand"  "v")))]
@@ -1288,21 +1360,34 @@ (define_insn "div<mode>3"
   "<vw>fd<sdx>b\t%v0,%v1,%v2"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "divtf3"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")
+   (match_operand:TF 2 "general_operand"  "")]
+  "HAVE_TF (divtf3)"
+  { EXPAND_TF (divtf3, 3); })
+
 ; vfsqsb, vfsqdb, wfsqsb, wfsqdb, wfsqxb
-(define_insn "sqrt<mode>2"
-  [(set (match_operand:VF_HW           0 "register_operand" "=v")
+(define_insn "sqrt<mode>2<tf_vr>"
+  [(set (match_operand:VF_HW             0 "register_operand" "=v")
        (sqrt:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")))]
   "TARGET_VX"
   "<vw>fsq<sdx>b\t%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "sqrttf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "general_operand"  "")]
+  "HAVE_TF (sqrttf2)"
+  { EXPAND_TF (sqrttf2, 2); })
+
 ; vfmasb, vfmadb, wfmasb, wfmadb, wfmaxb
 (define_insn "fma<mode>4"
   [(set (match_operand:VF_HW            0 "register_operand" "=v")
        (fma:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                   (match_operand:VF_HW 2 "register_operand"  "v")
                   (match_operand:VF_HW 3 "register_operand"  "v")))]
-  "TARGET_VX"
+  "TARGET_VX && s390_fma_allowed_p (<MODE>mode)"
   "<vw>fma<sdx>b\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
@@ -1312,7 +1397,7 @@ (define_insn "fms<mode>4"
        (fma:VF_HW (match_operand:VF_HW          1 "register_operand"  "v")
                   (match_operand:VF_HW          2 "register_operand"  "v")
                 (neg:VF_HW (match_operand:VF_HW 3 "register_operand"  "v"))))]
-  "TARGET_VX"
+  "TARGET_VX && s390_fma_allowed_p (<MODE>mode)"
   "<vw>fms<sdx>b\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
@@ -1323,7 +1408,7 @@ (define_insn "neg_fma<mode>4"
         (fma:VF_HW (match_operand:VF_HW 1 "register_operand"  "v")
                    (match_operand:VF_HW 2 "register_operand"  "v")
                    (match_operand:VF_HW 3 "register_operand"  "v"))))]
-  "TARGET_VXE"
+  "TARGET_VXE && s390_fma_allowed_p (<MODE>mode)"
   "<vw>fnma<sdx>b\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
@@ -1334,26 +1419,38 @@ (define_insn "neg_fms<mode>4"
         (fma:VF_HW (match_operand:VF_HW          1 "register_operand"  "v")
                    (match_operand:VF_HW          2 "register_operand"  "v")
                  (neg:VF_HW (match_operand:VF_HW 3 "register_operand"  "v")))))]
-  "TARGET_VXE"
+  "TARGET_VXE && s390_fma_allowed_p (<MODE>mode)"
   "<vw>fnms<sdx>b\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
 ; vflcsb, vflcdb, wflcsb, wflcdb, wflcxb
-(define_insn "neg<mode>2"
+(define_insn "neg<mode>2<tf_vr>"
   [(set (match_operand:VFT          0 "register_operand" "=v")
        (neg:VFT (match_operand:VFT 1 "register_operand"  "v")))]
   "TARGET_VX"
   "<vw>flc<sdx>b\t%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "negtf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (negtf2)"
+  { EXPAND_TF (negtf2, 2); })
+
 ; vflpsb, vflpdb, wflpsb, wflpdb, wflpxb
-(define_insn "abs<mode>2"
+(define_insn "abs<mode>2<tf_vr>"
   [(set (match_operand:VFT          0 "register_operand" "=v")
        (abs:VFT (match_operand:VFT 1 "register_operand"  "v")))]
   "TARGET_VX"
   "<vw>flp<sdx>b\t%v0,%v1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "abstf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (abstf2)"
+  { EXPAND_TF (abstf2, 2); })
+
 ; vflnsb, vflndb, wflnsb, wflndb, wflnxb
 (define_insn "negabs<mode>2"
   [(set (match_operand:VFT                   0 "register_operand" "=v")
@@ -2152,6 +2249,24 @@ (define_insn "float<VX_VEC_CONV_INT:mode><VX_VEC_CONV_BFP:mode>2"
   "vc<VX_VEC_CONV_BFP:xde><VX_VEC_CONV_INT:bhfgq>b\t%v0,%v1,0,0"
   [(set_attr "op_type" "VRR")])
 
+; There is no instruction for loading a signed integer into an extended BFP
+; operand in a VR, therefore we need to load it into a FPR pair first.
+(define_expand "float<mode>tf2_vr"
+  [(set (match_dup 2)
+       (float:FPRX2 (match_operand:DSI 1 "register_operand" "")))
+   (set (match_operand:TF               0 "register_operand" "")
+       (subreg:TF (match_dup 2) 0))]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (FPRX2mode);
+})
+
+(define_expand "float<mode>tf2"
+  [(match_operand:TF  0 "register_operand" "")
+   (match_operand:DSI 1 "register_operand" "")]
+  "HAVE_TF (float<mode>tf2)"
+  { EXPAND_TF (float<mode>tf2, 2); })
+
 ; unsigned integer to floating point
 
 ; op2: inexact exception not suppressed (IEEE 754 2008)
@@ -2165,6 +2280,24 @@ (define_insn "floatuns<VX_VEC_CONV_INT:mode><VX_VEC_CONV_BFP:mode>2"
   "vc<VX_VEC_CONV_BFP:xde>l<VX_VEC_CONV_INT:bhfgq>b\t%v0,%v1,0,0"
   [(set_attr "op_type" "VRR")])
 
+; There is no instruction for loading an unsigned integer into an extended BFP
+; operand in a VR, therefore load it into a FPR pair first.
+(define_expand "floatuns<mode>tf2_vr"
+  [(set (match_dup 2)
+       (unsigned_float:FPRX2 (match_operand:GPR 1 "register_operand" "")))
+   (set (match_operand:TF                        0 "register_operand" "")
+       (subreg:TF (match_dup 2) 0))]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (FPRX2mode);
+})
+
+(define_expand "floatuns<mode>tf2"
+  [(match_operand:TF  0 "register_operand" "")
+   (match_operand:GPR 1 "register_operand" "")]
+  "HAVE_TF (floatuns<mode>tf2)"
+  { EXPAND_TF (floatuns<mode>tf2, 2); })
+
 ; floating point to signed integer
 
 ; op2: inexact exception not suppressed (IEEE 754 2008)
@@ -2178,6 +2311,27 @@ (define_insn "fix_trunc<VX_VEC_CONV_BFP:mode><VX_VEC_CONV_INT:mode>2"
   "vc<VX_VEC_CONV_INT:bhfgq><VX_VEC_CONV_BFP:xde>b\t%v0,%v1,0,5"
   [(set_attr "op_type" "VRR")])
 
+; There is no instruction for rounding an extended BFP operand in a VR into
+; a signed integer, therefore copy it into a FPR pair first.
+(define_expand "fix_trunctf<mode>2_vr"
+  [(set (subreg:DF (match_dup 2) 0)
+       (subreg:DF (match_operand:TF 1 "register_operand" "") 0))
+   (set (subreg:DF (match_dup 2) 8) (subreg:DF (match_dup 1) 8))
+   (parallel [(set (match_operand:GPR 0 "register_operand" "")
+                  (fix:GPR (match_dup 2)))
+             (unspec:GPR [(const_int BFP_RND_TOWARD_0)] UNSPEC_ROUND)
+             (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (FPRX2mode);
+})
+
+(define_expand "fix_trunctf<mode>2"
+  [(match_operand:GPR 0 "register_operand" "")
+   (match_operand:TF  1 "register_operand" "")]
+  "HAVE_TF (fix_trunctf<mode>2)"
+  { EXPAND_TF (fix_trunctf<mode>2, 2); })
+
 ; floating point to unsigned integer
 
 ; op2: inexact exception not suppressed (IEEE 754 2008)
@@ -2191,6 +2345,186 @@ (define_insn "fixuns_trunc<VX_VEC_CONV_BFP:mode><VX_VEC_CONV_INT:mode>2"
   "vcl<VX_VEC_CONV_INT:bhfgq><VX_VEC_CONV_BFP:xde>b\t%v0,%v1,0,5"
   [(set_attr "op_type" "VRR")])
 
+; There is no instruction for rounding an extended BFP operand in a VR into
+; an unsigned integer, therefore copy it into a FPR pair first.
+(define_expand "fixuns_trunctf<mode>2_vr"
+  [(set (subreg:DF (match_dup 2) 0)
+       (subreg:DF (match_operand:TF 1 "register_operand" "") 0))
+   (set (subreg:DF (match_dup 2) 8) (subreg:DF (match_dup 1) 8))
+   (parallel [(set (match_operand:GPR 0 "register_operand" "")
+                  (unsigned_fix:GPR (match_dup 2)))
+             (unspec:GPR [(const_int BFP_RND_TOWARD_0)] UNSPEC_ROUND)
+             (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (FPRX2mode);
+})
+
+(define_expand "fixuns_trunctf<mode>2"
+  [(match_operand:GPR 0 "register_operand" "")
+   (match_operand:TF  1 "register_operand" "")]
+  "HAVE_TF (fixuns_trunctf<mode>2)"
+  { EXPAND_TF (fixuns_trunctf<mode>2, 2); })
+
+; load fp integer
+
+; vfisb, wfisb, vfidb, wfidb, wfixb; suppress inexact exceptions
+(define_insn "<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>"
+  [(set (match_operand:VF_HW                0 "register_operand" "=v")
+       (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand"  "v")]
+                     FPINT))]
+  "TARGET_VX"
+  "<vw>fi<VF_HW:sdx>b\t%v0,%v1,4,<FPINT:fpint_roundingmode>"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "<FPINT:fpint_name>tf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")
+   ; recognize FPINT as an iterator
+   (unspec:TF [(match_dup 1)] FPINT)]
+  "HAVE_TF (<FPINT:fpint_name>tf2)"
+  { EXPAND_TF (<FPINT:fpint_name>tf2, 2); })
+
+; vfisb, wfisb, vfidb, wfidb, wfixb; raise inexact exceptions
+(define_insn "rint<mode>2<tf_vr>"
+  [(set (match_operand:VF_HW                0 "register_operand" "=v")
+       (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand"  "v")]
+                     UNSPEC_FPINT_RINT))]
+  "TARGET_VX"
+  "<vw>fi<sdx>b\t%v0,%v1,0,0"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "rinttf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (rinttf2)"
+  { EXPAND_TF (rinttf2, 2); })
+
+; load rounded
+
+; wflrx
+(define_insn "*trunctfdf2_vr"
+  [(set (match_operand:DF                    0 "register_operand" "=f")
+       (float_truncate:DF (match_operand:TF 1 "register_operand"  "v")))
+   (unspec:DF [(match_operand                2 "const_int_operand" "")]
+              UNSPEC_ROUND)]
+  "TARGET_VXE"
+  "wflrx\t%v0,%v1,0,%2"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "trunctfdf2_vr"
+  [(parallel [
+     (set (match_operand:DF                    0 "register_operand" "")
+         (float_truncate:DF (match_operand:TF 1 "register_operand" "")))
+     (unspec:DF [(const_int BFP_RND_CURRENT)] UNSPEC_ROUND)])]
+  "TARGET_VXE")
+
+(define_expand "trunctfdf2"
+  [(match_operand:DF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (trunctfdf2)"
+  { EXPAND_TF (trunctfdf2, 2); })
+
+; wflrx + (ledbr|wledb)
+(define_expand "trunctfsf2_vr"
+  [(parallel [
+     (set (match_dup 2)
+         (float_truncate:DF (match_operand:TF 1 "register_operand" "")))
+     (unspec:DF [(const_int BFP_RND_PREP_FOR_SHORT_PREC)] UNSPEC_ROUND)])
+   (set (match_operand:SF                    0 "register_operand" "")
+       (float_truncate:SF (match_dup 2)))]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (DFmode);
+})
+
+(define_expand "trunctfsf2"
+  [(match_operand:SF 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (trunctfsf2)"
+  { EXPAND_TF (trunctfsf2, 2); })
+
+; load lengthened
+
+(define_insn "extenddftf2_vr"
+  [(set (match_operand:TF                  0 "register_operand" "=v")
+       (float_extend:TF (match_operand:DF 1 "register_operand"  "f")))]
+  "TARGET_VXE"
+  "wflld\t%v0,%v1"
+  [(set_attr "op_type" "VRR")])
+
+(define_expand "extenddftf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:DF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extenddftf2)"
+  { EXPAND_TF (extenddftf2, 2); })
+
+(define_expand "extendsftf2_vr"
+  [(set (match_dup 2)
+       (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "")))
+   (set (match_operand:TF                  0 "register_operand"     "")
+       (float_extend:TF (match_dup 2)))]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (DFmode);
+})
+
+(define_expand "extendsftf2"
+  [(match_operand:TF 0 "register_operand" "")
+   (match_operand:SF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extendsftf2)"
+  { EXPAND_TF (extendsftf2, 2); })
+
+; test data class
+
+(define_expand "signbittf2_vr"
+  [(parallel
+    [(set (reg:CCRAW CC_REGNUM)
+         (unspec:CCRAW [(match_operand:TF 1 "register_operand" "")
+                        (match_dup        2)]
+                       UNSPEC_VEC_VFTCICC))
+     (clobber (scratch:V1TI))])
+   (set (match_operand:SI                  0 "register_operand" "")
+       (const_int 0))
+   (set (match_dup                         0)
+       (if_then_else:SI (eq (reg:CCRAW CC_REGNUM) (const_int 8))
+                        (const_int 1)
+                        (match_dup        0)))]
+  "TARGET_VXE"
+{
+  operands[2] = GEN_INT (S390_TDC_SIGNBIT_SET);
+})
+
+(define_expand "signbittf2"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (signbittf2)"
+  { EXPAND_TF (signbittf2, 2); })
+
+(define_expand "isinftf2_vr"
+  [(parallel
+    [(set (reg:CCRAW CC_REGNUM)
+         (unspec:CCRAW [(match_operand:TF 1 "register_operand" "")
+                        (match_dup        2)]
+                       UNSPEC_VEC_VFTCICC))
+     (clobber (scratch:V1TI))])
+   (set (match_operand:SI                  0 "register_operand" "")
+       (const_int 0))
+   (set (match_dup                         0)
+       (if_then_else:SI (eq (reg:CCRAW CC_REGNUM) (const_int 8))
+                        (const_int 1)
+                        (match_dup        0)))]
+  "TARGET_VXE"
+{
+  operands[2] = GEN_INT (S390_TDC_INFINITY);
+})
+
+(define_expand "isinftf2"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:TF 1 "register_operand" "")]
+  "HAVE_TF (isinftf2)"
+  { EXPAND_TF (isinftf2, 2); })
+
 ;
 ; Vector byte swap patterns
 ;
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 6f1add02d0b..010db4d1115 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1940,22 +1940,22 @@ (define_expand "vec_st2f"
 ; These ignore the vector result and only want CC stored to an int
 ; pointer.
 
-; vftcisb, vftcidb
+; vftcisb, vftcidb, wftcixb
 (define_insn "*vftci<mode>_cconly"
   [(set (reg:CCRAW CC_REGNUM)
-       (unspec:CCRAW [(match_operand:VECF_HW 1 "register_operand")
-                      (match_operand:HI      2 "const_int_operand")]
+       (unspec:CCRAW [(match_operand:VF_HW 1 "register_operand"  "v")
+                      (match_operand:HI    2 "const_int_operand" "J")]
                      UNSPEC_VEC_VFTCICC))
-   (clobber (match_scratch:<tointvec> 0))]
+   (clobber (match_scratch:<tointvec> 0 "=v"))]
   "TARGET_VX && CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'J', \"J\")"
-  "vftci<sdx>b\t%v0,%v1,%x2"
+  "<vw>ftci<sdx>b\t%v0,%v1,%x2"
   [(set_attr "op_type" "VRR")])
 
 (define_expand "vftci<mode>_intcconly"
   [(parallel
     [(set (reg:CCRAW CC_REGNUM)
-         (unspec:CCRAW [(match_operand:VECF_HW 0 "register_operand")
-                        (match_operand:HI      1 "const_int_operand")]
+         (unspec:CCRAW [(match_operand:VF_HW 0 "register_operand")
+                        (match_operand:HI    1 "const_int_operand")]
                        UNSPEC_VEC_VFTCICC))
      (clobber (scratch:<tointvec>))])
    (set (match_operand:SI 2 "register_operand" "")
@@ -1965,27 +1965,27 @@ (define_expand "vftci<mode>_intcconly"
 ; vec_fp_test_data_class wants the result vector and the CC stored to
 ; an int pointer.
 
-; vftcisb, vftcidb
-(define_insn "*vftci<mode>"
-  [(set (match_operand:VECF_HW                  0 "register_operand"  "=v")
-       (unspec:VECF_HW [(match_operand:VECF_HW 1 "register_operand"   "v")
-                        (match_operand:HI      2 "const_int_operand"  "J")]
-                       UNSPEC_VEC_VFTCI))
+; vftcisb, vftcidb, wftcixb
+(define_insn "vftci<mode>"
+  [(set (match_operand:VF_HW                0 "register_operand"  "=v")
+       (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand"   "v")
+                      (match_operand:HI    2 "const_int_operand"  "J")]
+                     UNSPEC_VEC_VFTCI))
    (set (reg:CCRAW CC_REGNUM)
        (unspec:CCRAW [(match_dup 1) (match_dup 2)] UNSPEC_VEC_VFTCICC))]
   "TARGET_VX && CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'J', \"J\")"
-  "vftci<sdx>b\t%v0,%v1,%x2"
+  "<vw>ftci<sdx>b\t%v0,%v1,%x2"
   [(set_attr "op_type" "VRR")])
 
 (define_expand "vftci<mode>_intcc"
   [(parallel
-    [(set (match_operand:VECF_HW                  0 "register_operand")
-         (unspec:VECF_HW [(match_operand:VECF_HW 1 "register_operand")
-                          (match_operand:HI      2 "const_int_operand")]
-                         UNSPEC_VEC_VFTCI))
+    [(set (match_operand:VF_HW                0 "register_operand")
+         (unspec:VF_HW [(match_operand:VF_HW 1 "register_operand")
+                        (match_operand:HI    2 "const_int_operand")]
+                       UNSPEC_VEC_VFTCI))
      (set (reg:CCRAW CC_REGNUM)
          (unspec:CCRAW [(match_dup 1) (match_dup 2)] UNSPEC_VEC_VFTCICC))])
-   (set (match_operand:SI 3 "memory_operand" "")
+   (set (match_operand:SI                     3 "nonimmediate_operand")
        (unspec:SI [(reg:CCRAW CC_REGNUM)] UNSPEC_CC_TO_INT))]
   "TARGET_VX && CONST_OK_FOR_CONSTRAINT_P (INTVAL (operands[2]), 'J', \"J\")")
 
-- 
2.25.4
