Hi Michael,

I am working on adding builtins for the SHA instructions on PowerPC. I have two questions regarding the new __dmf type/keyword:
1. The current MMA builtins, e.g. __builtin_mma_xvbf16ger2pp, take a __vector_quad * parameter. Would the signatures of these builtins need to change to use the new __dmf type when both dense math and MMA are enabled?

2. Would it be possible to add a __dmf_pair (2,048-bit) type, which some of the SHA instructions (dmsha3hash) need? If not, I can follow an implementation similar to the one in this patch for __dmf, but I am not sure whether the name should be __dmf_pair or __dmf2048.

I also have one small comment on the patch below.

Thanks and regards,
Avinash Jayakar

On Fri, 2025-11-14 at 02:57 -0500, Michael Meissner wrote:
> This patch is a prelimianry patch to add the full 1,024 bit dense math register
> (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the
> DMR register.
>
> This patch only adds the new 1,024 bit register support. It does not add
> support for any instructions that need 1,024 bit registers instead of 512 bit
> registers.
>
> I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
> registers. The 'wD' constraint added in previous patches is used for these
> registers. I added support to do load and store of DMRs via the VSX registers,
> since there are no load/store dense math instructions. I added the new keyword
> '__dmf' to create 1,024 bit types that can be loaded into DMRs. At present, I
> don't have aliases for __dmf512 and __dmf1024 that we've discussed internally.
>
> I have built bootstrap GCC compilers on little endian and big endian
> PowerPC servers, and there were no regressions. Can I commit this
> patch to GCC 16 once the following patches have been applied?
>
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700539.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700540.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700542.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700543.html
>
> 2025-11-14 Michael Meissner <[email protected]>
>
> gcc/
>
> * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
> (UNSPEC_DM_INSERT512_LOWER): Likewise.
> (UNSPEC_DM_EXTRACT512): Likewise.
> (UNSPEC_DMF_RELOAD_FROM_MEMORY): Likewise.
> (UNSPEC_DMF_RELOAD_TO_MEMORY): Likewise.
> (movtdo): New define_expand and define_insn_and_split to implement 1,024
> bit DMR registers.
> (movtdo_insert512_upper): New insn.
> (movtdo_insert512_lower): Likewise.
> (movtdo_extract512): Likewise.
> (reload_dmf_from_memory): Likewise.
> (reload_dmf_to_memory): Likewise.
> * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMF
> support.
> (rs6000_init_builtins): Add support for __dmf keyword.
> * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
> for TDOmode.
> (rs6000_function_arg): Likewise.
> * config/rs6000/rs6000-modes.def (TDOmode): New mode.
> * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
> support for TDOmode.
> (rs6000_hard_regno_mode_ok_uncached): Likewise.
> (rs6000_hard_regno_mode_ok): Likewise.
> (rs6000_modes_tieable_p): Likewise.
> (rs6000_debug_reg_global): Likewise.
> (rs6000_setup_reg_addr_masks): Likewise.
> (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload
> hooks for DMF mode.
> (reg_offset_addressing_ok_p): Add support for TDOmode.
> (rs6000_emit_move): Likewise.
> (rs6000_secondary_reload_simple_move): Likewise.
> (rs6000_preferred_reload_class): Likewise.
> (rs6000_secondary_reload_class): Likewise.
> (rs6000_mangle_type): Add mangling for __dmf type.
> (rs6000_dmf_register_move_cost): Add support for TDOmode.
> (rs6000_split_multireg_move): Likewise.
> (rs6000_invalid_conversion): Likewise.
> * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
> (enum rs6000_builtin_type_index): Add DMF type nodes.
> (dmf_type_node): Likewise.
> (ptr_dmf_type_node): Likewise.
>
> gcc/testsuite/
>
> * gcc.target/powerpc/dm-1024bit.c: New test.
> * lib/target-supports.exp (check_effective_target_ppc_dmf_ok): New
> target test.
> ---
> gcc/config/rs6000/mma.md | 154 ++++++++++++++++++
> gcc/config/rs6000/rs6000-builtin.cc | 17 ++
> gcc/config/rs6000/rs6000-call.cc | 10 +-
> gcc/config/rs6000/rs6000-modes.def | 4 +
> gcc/config/rs6000/rs6000.cc | 101 ++++++++----
> gcc/config/rs6000/rs6000.h | 6 +-
> gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 +++++++
> gcc/testsuite/lib/target-supports.exp | 35 ++++
> 8 files changed, 356 insertions(+), 34 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
>
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 3f5852ca2bb..d7df2a1a71a 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -91,6 +91,11 @@ (define_c_enum "unspec"
> UNSPEC_MMA_XXMFACC
> UNSPEC_MMA_XXMTACC
> UNSPEC_MMA_DMSETDMRZ
> + UNSPEC_DM_INSERT512_UPPER
> + UNSPEC_DM_INSERT512_LOWER
> + UNSPEC_DM_EXTRACT512
> + UNSPEC_DMF_RELOAD_FROM_MEMORY
> + UNSPEC_DMF_RELOAD_TO_MEMORY
> ])
>
> (define_c_enum "unspecv"
> @@ -699,3 +704,152 @@ (define_insn "mma_<avvi4i4i4>"
> "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
> [(set_attr "type" "mma")
> (set_attr "prefixed" "yes")])
> +
> +;; TDOmode (__dmf keyword for 1,024 bit registers).
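For reference, here is a rough host-side sketch of the sizes involved, assuming only what the patch states: OPAQUE_MODE (TDO, 128) makes a __dmf 128 bytes, rs6000_init_builtins gives it 512-bit rather than 1,024-bit alignment, and each VSX register holds 16 bytes. The struct dmf_model, the enum names and dmf_pieces are hypothetical and exist only for the arithmetic; this is not how GCC represents the type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for __dmf, for arithmetic only: 1,024
   bits (128 bytes) of opaque data, mirroring OPAQUE_MODE (TDO, 128), with
   the 512-bit (64-byte) alignment that rs6000_init_builtins gives
   dmf_type_node.  GCC does not represent the type this way.  */
typedef struct
{
  alignas (64) unsigned char bytes[128];
} dmf_model;

/* Sizes, in bytes, of the pieces a __dmf is split into (assumed).  */
enum
{
  VSX_REG_BYTES = 16,     /* one 128-bit VSX register */
  VECTOR_PAIR_BYTES = 32, /* OOmode, one lxvp/stxvp */
  VECTOR_QUAD_BYTES = 64  /* XOmode, one dmxxinstdmr512/dmxxextfdmr512 */
};

_Static_assert (sizeof (dmf_model) == 128, "__dmf is 1,024 bits");
_Static_assert (alignof (dmf_model) == 64, "__dmf is 512-bit aligned");

/* Number of PIECE_BYTES-sized pieces needed to cover one __dmf.  */
static size_t
dmf_pieces (size_t piece_bytes)
{
  assert (piece_bytes != 0 && sizeof (dmf_model) % piece_bytes == 0);
  return sizeof (dmf_model) / piece_bytes;
}
```

Under these assumptions a __dmf spans 8 VSX registers, i.e. two XOmode halves of 4 registers each, which lines up with the regno / regno + 4 split in *movtdo, and a memory copy takes four 32-byte vector-pair accesses.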
> +(define_expand "movtdo"
> + [(set (match_operand:TDO 0 "nonimmediate_operand")
> + (match_operand:TDO 1 "input_operand"))]
> + "TARGET_DENSE_MATH"
> +{
> + rs6000_emit_move (operands[0], operands[1], TDOmode);
> + DONE;
> +})
> +
> +(define_insn_and_split "*movtdo"
> + [(set (match_operand:TDO 0 "nonimmediate_operand" "=wa,m,wa,wD,wD,wa")
> + (match_operand:TDO 1 "input_operand" "m,wa,wa,wa,wD,wD"))]
> + "TARGET_DENSE_MATH
> + && (gpc_reg_operand (operands[0], TDOmode)
> + || gpc_reg_operand (operands[1], TDOmode))"
> + "@
> + #
> + #
> + #
> + #
> + dmmr %0,%1
> + #"
> + "&& reload_completed
> + && (!dmf_operand (operands[0], TDOmode) || !dmf_operand (operands[1], TDOmode))"
> + [(const_int 0)]
> +{
> + rtx op0 = operands[0];
> + rtx op1 = operands[1];
> +
> + if (REG_P (op0) && REG_P (op1))
> + {
> + int regno0 = REGNO (op0);
> + int regno1 = REGNO (op1);
> +
> + if (DMF_REGNO_P (regno0) && VSX_REGNO_P (regno1))
> + {
> + rtx op1_upper = gen_rtx_REG (XOmode, regno1);
> + rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4);
> + emit_insn (gen_movtdo_insert512_upper (op0, op1_upper));
> + emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower));
> + DONE;
> + }
> +
> + else if (VSX_REGNO_P (regno0) && DMF_REGNO_P (regno1))
> + {
> + rtx op0_upper = gen_rtx_REG (XOmode, regno0);
> + rtx op0_lower = gen_rtx_REG (XOmode, regno0 + 4);
> + emit_insn (gen_movtdo_extract512 (op0_upper, op1, const0_rtx));
> + emit_insn (gen_movtdo_extract512 (op0_lower, op1, const1_rtx));
> + DONE;
> + }
> +
> + else
> + gcc_assert (VSX_REGNO_P (regno0) && VSX_REGNO_P (regno1));
> + }
> +
> + rs6000_split_multireg_move (operands[0], operands[1]);
> + DONE;
> +}
> + [(set_attr "type" "vecload,vecstore,vecmove,vecmove,vecmove,vecmove")
> + (set_attr "length" "*,*,32,8,*,8")
> + (set_attr "max_prefixed_insns" "4,4,*,*,*,*")])
> +
> +;; Move from VSX registers to DMF registers via two insert 512 bit
> +;; instructions.
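A minimal C model of the data movement in *movtdo and the two reload patterns that follow, as I read them: a VSX-to-DMR move inserts the upper half from regno and the lower half from regno + 4, and a reload from memory takes the upper half from offset 0 on big endian and offset 64 on little endian, going through one 64-byte temporary. It treats each 512-bit half as an opaque 64-byte block and deliberately ignores element ordering inside registers and any little-endian register swapping done by lxvp/stxvp. dmr_t, vsx_file_t and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model, not GCC code: a DMR is two opaque 512-bit halves,
   selected by 0 (upper) and 1 (lower) as in movtdo_insert512_upper/lower
   and movtdo_extract512.  Byte order inside a half is not modelled.  */
enum { VSX_BYTES = 16, HALF_BYTES = 64, VSX_PER_HALF = HALF_BYTES / VSX_BYTES };

typedef struct { unsigned char half[2][HALF_BYTES]; } dmr_t;
typedef struct { unsigned char reg[64][VSX_BYTES]; } vsx_file_t;

/* Model of dmxxinstdmr512: copy the XOmode value held in VSX registers
   REGNO .. REGNO + 3 into half SEL of DMR.  */
static void
insert512 (dmr_t *dmr, int sel, const vsx_file_t *vsx, int regno)
{
  assert (regno >= 0 && regno + VSX_PER_HALF <= 64);
  for (int i = 0; i < VSX_PER_HALF; i++)
    memcpy (dmr->half[sel] + i * VSX_BYTES, vsx->reg[regno + i], VSX_BYTES);
}

/* Model of dmxxextfdmr512: the reverse copy.  */
static void
extract512 (vsx_file_t *vsx, int regno, const dmr_t *dmr, int sel)
{
  assert (regno >= 0 && regno + VSX_PER_HALF <= 64);
  for (int i = 0; i < VSX_PER_HALF; i++)
    memcpy (vsx->reg[regno + i], dmr->half[sel] + i * VSX_BYTES, VSX_BYTES);
}

/* VSX -> DMR as in the *movtdo split: upper from REGNO, lower from
   REGNO + 4.  */
static void
move_vsx_to_dmr (dmr_t *dmr, const vsx_file_t *vsx, int regno)
{
  insert512 (dmr, 0, vsx, regno);
  insert512 (dmr, 1, vsx, regno + VSX_PER_HALF);
}

/* DMR -> VSX: half 0 to REGNO, half 1 to REGNO + 4.  */
static void
move_dmr_to_vsx (vsx_file_t *vsx, int regno, const dmr_t *dmr)
{
  extract512 (vsx, regno, dmr, 0);
  extract512 (vsx, regno + VSX_PER_HALF, dmr, 1);
}

/* reload_dmf_from_memory: upper half at offset 0 (BE) or 64 (LE), each
   half staged through one 64-byte temporary (the XOmode clobber).  */
static void
reload_from_memory (dmr_t *dmr, const unsigned char mem[128], bool big_endian)
{
  unsigned char tmp[HALF_BYTES];
  memcpy (tmp, mem + (big_endian ? 0 : 64), HALF_BYTES);
  memcpy (dmr->half[0], tmp, HALF_BYTES);
  memcpy (tmp, mem + (big_endian ? 64 : 0), HALF_BYTES);
  memcpy (dmr->half[1], tmp, HALF_BYTES);
}

/* reload_dmf_to_memory: extract each half into the temporary, then store
   it at the endian-dependent offset.  */
static void
reload_to_memory (unsigned char mem[128], const dmr_t *dmr, bool big_endian)
{
  unsigned char tmp[HALF_BYTES];
  memcpy (tmp, dmr->half[0], HALF_BYTES);
  memcpy (mem + (big_endian ? 0 : 64), tmp, HALF_BYTES);
  memcpy (tmp, dmr->half[1], HALF_BYTES);
  memcpy (mem + (big_endian ? 64 : 0), tmp, HALF_BYTES);
}
```

For example, move_vsx_to_dmr (&d, &v, 8) fills half 0 from VSX registers 8-11 and half 1 from 12-15, and reload_to_memory undoes reload_from_memory for either endianness.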
> +(define_insn "movtdo_insert512_upper"
> + [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> + (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")]
> + UNSPEC_DM_INSERT512_UPPER))]
> + "TARGET_DENSE_MATH"
> + "dmxxinstdmr512 %0,%1,%Y1,0"
> + [(set_attr "type" "mma")])
> +
> +(define_insn "movtdo_insert512_lower"
> + [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> + (unspec:TDO [(match_operand:TDO 1 "dmf_operand" "0")
> + (match_operand:XO 2 "vsx_register_operand" "wa")]
> + UNSPEC_DM_INSERT512_LOWER))]
> + "TARGET_DENSE_MATH"
> + "dmxxinstdmr512 %0,%2,%Y2,1"
> + [(set_attr "type" "mma")])
> +
> +;; Move from DMF registers to VSX registers via two extract 512 bit
> +;; instructions.
> +(define_insn "movtdo_extract512"
> + [(set (match_operand:XO 0 "vsx_register_operand" "=wa")
> + (unspec:XO [(match_operand:TDO 1 "dmf_operand" "wD")
> + (match_operand 2 "const_0_to_1_operand" "n")]
> + UNSPEC_DM_EXTRACT512))]
> + "TARGET_DENSE_MATH"
> + "dmxxextfdmr512 %0,%Y0,%1,%2"
> + [(set_attr "type" "mma")])
> +
> +;; Reload DMF registers from memory
> +(define_insn_and_split "reload_dmf_from_memory"
> + [(set (match_operand:TDO 0 "dmf_operand" "=wD")
> + (unspec:TDO [(match_operand:TDO 1 "memory_operand" "m")]
> + UNSPEC_DMF_RELOAD_FROM_MEMORY))
> + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
> + "TARGET_DENSE_MATH"
> + "#"
> + "&& reload_completed"
> + [(const_int 0)]
> +{
> + rtx dest = operands[0];
> + rtx src = operands[1];
> + rtx tmp = operands[2];
> + rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
> + rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
> +
> + emit_move_insn (tmp, mem_upper);
> + emit_insn (gen_movtdo_insert512_upper (dest, tmp));
> +
> + emit_move_insn (tmp, mem_lower);
> + emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp));
> + DONE;
> +}
> + [(set_attr "length" "16")
> + (set_attr "max_prefixed_insns" "2")
> + (set_attr "type" "vecload")])
> +
> +;; Reload dense math registers to memory
> +(define_insn_and_split "reload_dmf_to_memory"
> + [(set (match_operand:TDO 0 "memory_operand" "=m")
> + (unspec:TDO [(match_operand:TDO 1 "dmf_operand" "wD")]
> + UNSPEC_DMF_RELOAD_TO_MEMORY))
> + (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))]
> + "TARGET_DENSE_MATH"
> + "#"
> + "&& reload_completed"
> + [(const_int 0)]
> +{
> + rtx dest = operands[0];
> + rtx src = operands[1];
> + rtx tmp = operands[2];
> + rtx mem_upper = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 0 : 64);
> + rtx mem_lower = adjust_address (dest, XOmode, BYTES_BIG_ENDIAN ? 64 : 0);
> +
> + emit_insn (gen_movtdo_extract512 (tmp, src, const0_rtx));
> + emit_move_insn (mem_upper, tmp);
> +
> + emit_insn (gen_movtdo_extract512 (tmp, src, const1_rtx));
> + emit_move_insn (mem_lower, tmp);
> + DONE;
> +}
> + [(set_attr "length" "16")
> + (set_attr "max_prefixed_insns" "2")])
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 6b7e5686f0c..a02e4cd03ef 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -495,6 +495,8 @@ const char *rs6000_type_string (tree type_node)
> return "__vector_pair";
> else if (type_node == vector_quad_type_node)
> return "__vector_quad";
> + else if (type_node == dmf_type_node)
> + return "__dmf";
>
> return "unknown";
> }
> @@ -781,6 +783,21 @@ rs6000_init_builtins (void)
> t = build_qualified_type (vector_quad_type_node, TYPE_QUAL_CONST);
> ptr_vector_quad_type_node = build_pointer_type (t);
>
> + /* For TDOmode (1,024 bit dense math accumulators), don't use an alignment of
> + 1,024, use 512. TDOmode loads and stores are always broken up into 2
> + vector pair loads or stores. In addition, we don't have support for
> + aligning the stack to 1,024 bits. */
> + dmf_type_node = make_node (OPAQUE_TYPE);
> + SET_TYPE_MODE (dmf_type_node, TDOmode);
> + TYPE_SIZE (dmf_type_node) = bitsize_int (GET_MODE_BITSIZE (TDOmode));
> + TYPE_PRECISION (dmf_type_node) = GET_MODE_BITSIZE (TDOmode);
> + TYPE_SIZE_UNIT (dmf_type_node) = size_int (GET_MODE_SIZE (TDOmode));
> + SET_TYPE_ALIGN (dmf_type_node, 512);
> + TYPE_USER_ALIGN (dmf_type_node) = 0;
> + lang_hooks.types.register_builtin_type (dmf_type_node, "__dmf");
> + t = build_qualified_type (dmf_type_node, TYPE_QUAL_CONST);
> + ptr_dmf_type_node = build_pointer_type (t);
> +
> tdecl = add_builtin_type ("__bool char", bool_char_type_node);
> TYPE_NAME (bool_char_type_node) = tdecl;
>
> diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
> index 8fe5652442e..7541050ffe7 100644
> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -437,14 +437,15 @@ rs6000_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
> if (cfun
> && !cfun->machine->mma_return_type_error
> && TREE_TYPE (cfun->decl) == fntype
> - && (TYPE_MODE (type) == OOmode || TYPE_MODE (type) == XOmode))
> + && OPAQUE_MODE_P (TYPE_MODE (type)))
> {
> /* Record we have now handled function CFUN, so the next time we
> are called, we do not re-report the same error. */
> cfun->machine->mma_return_type_error = true;
> if (TYPE_CANONICAL (type) != NULL_TREE)
> type = TYPE_CANONICAL (type);
> - error ("invalid use of MMA type %qs as a function return value",
> + error ("invalid use of %s type %qs as a function return value",
> + (TYPE_MODE (type) == TDOmode) ? "dense math" : "MMA",
> IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
> }
>
> @@ -1632,11 +1633,12 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
> int n_elts;
>
> /* We do not allow MMA types being used as function arguments. */
> - if (mode == OOmode || mode == XOmode)
> + if (OPAQUE_MODE_P (mode))
> {
> if (TYPE_CANONICAL (type) != NULL_TREE)
> type = TYPE_CANONICAL (type);
> - error ("invalid use of MMA operand of type %qs as a function parameter",
> + error ("invalid use of %s operand of type %qs as a function parameter",
> + (mode == TDOmode) ? "dense math" : "MMA",
> IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
> return NULL_RTX;
> }
> diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
> index f89e4ef403c..9a8b505ab6a 100644
> --- a/gcc/config/rs6000/rs6000-modes.def
> +++ b/gcc/config/rs6000/rs6000-modes.def
> @@ -79,3 +79,7 @@ PARTIAL_INT_MODE (TI, 128, PTI);
> /* Modes used by __vector_pair and __vector_quad. */
> OPAQUE_MODE (OO, 32);
> OPAQUE_MODE (XO, 64);
> +
> +/* Mode used by __dmf. */
> +OPAQUE_MODE (TDO, 128);
> +
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc

In this file, should we also update rs6000_opaque_type_invalid_use_p so that it reports that __dmf requires the -mdense-math flag?

> index 570e8a14f2d..635f05d0d02 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1842,7 +1842,8 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
> 128-bit floating point that can go in vector registers, which has VSX
> memory addressing. */
> if (FP_REGNO_P (regno))
> - reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
> + reg_size = (VECTOR_MEM_VSX_P (mode)
> + || VECTOR_ALIGNMENT_P (mode)
> ? UNITS_PER_VSX_WORD
> : UNITS_PER_FP_WORD);
>
> @@ -1882,13 +1883,13 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
> Because we just use the VSX registers for load/store operations, we just
> need to make sure load vector pair and store vector pair instructions can
> be used. */
> - if (mode == XOmode)
> + if (mode == XOmode || mode == TDOmode)
> {
> if (!TARGET_MMA)
> return 0;
>
> else if (!TARGET_DENSE_MATH)
> - return (FP_REGNO_P (regno) && (regno & 3) == 0);
> + return (mode == XOmode && FP_REGNO_P (regno) && (regno & 3) == 0);
>
> else if (DMF_REGNO_P (regno))
> return 1;
> @@ -1899,7 +1900,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
> && (regno & 1) == 0);
> }
>
> - /* No other types other than XOmode can go in DMFs. */
> + /* No other types other than XOmode can go in dense math registers. */
> if (DMF_REGNO_P (regno))
> return 0;
>
> @@ -2007,9 +2008,11 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> GPR registers, and TImode can go in any GPR as well as VSX registers (PR
> 57744).
>
> - Similarly, don't allow OOmode (vector pair, restricted to even VSX
> - registers) or XOmode (vector quad, restricted to FPR registers divisible
> - by 4) to tie with other modes.
> + Similarly, don't allow OOmode (vector pair), XOmode (vector quad), or
> + TDOmode (dense math register) to pair with anything else. Vector pairs are
> + restricted to even/odd VSX registers. Without dense math, vector quads are
> + limited to FPR registers divisible by 4. With dense math, vector quads are
> + limited to even VSX registers or DMF registers.
>
> Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
> 128-bit floating point on VSX systems ties with other vectors. */
> @@ -2018,7 +2021,8 @@ static bool
> rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> {
> if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
> - || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
> + || mode1 == TDOmode || mode2 == PTImode || mode2 == OOmode
> + || mode2 == XOmode || mode2 == TDOmode)
> return mode1 == mode2;
>
> if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
> @@ -2309,6 +2313,7 @@ rs6000_debug_reg_global (void)
> V4DFmode,
> OOmode,
> XOmode,
> + TDOmode,
> CCmode,
> CCUNSmode,
> CCEQmode,
> @@ -2674,7 +2679,7 @@ rs6000_setup_reg_addr_masks (void)
> /* Special case DMF registers. */
> if (rc == RELOAD_REG_DMF)
> {
> - if (TARGET_DENSE_MATH && m2 == XOmode)
> + if (TARGET_DENSE_MATH && (m2 == XOmode || m2 == TDOmode))
> {
> addr_mask = RELOAD_REG_VALID;
> reg_addr[m].addr_mask[rc] = addr_mask;
> @@ -2781,10 +2786,10 @@ rs6000_setup_reg_addr_masks (void)
>
> /* Vector pairs can do both indexed and offset loads if the
> instructions are enabled, otherwise they can only do offset loads
> - since it will be broken into two vector moves. Vector quads can
> - only do offset loads. */
> + since it will be broken into two vector moves. Vector quads and
> + dense math types can only do offset loads. */
> else if ((addr_mask != 0) && TARGET_MMA
> - && (m2 == OOmode || m2 == XOmode))
> + && (m2 == OOmode || m2 == XOmode || m2 == TDOmode))
> {
> addr_mask |= RELOAD_REG_OFFSET;
> if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
> @@ -3012,6 +3017,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
> rs6000_vector_align[XOmode] = 512;
> }
>
> + /* Add support for 1,024 bit DMF registers. */
> + if (TARGET_DENSE_MATH)
> + {
> + rs6000_vector_unit[TDOmode] = VECTOR_NONE;
> + rs6000_vector_mem[TDOmode] = VECTOR_VSX;
> + rs6000_vector_align[TDOmode] = 512;
> + }
> +
> /* Register class constraints for the constraints that depend on compile
> switches. When the VSX code was added, different constraints were added
> based on the type (DFmode, V2DFmode, V4SFmode). For the vector types, all
> @@ -3224,6 +3237,12 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
> }
> }
>
> + if (TARGET_DENSE_MATH)
> + {
> + reg_addr[TDOmode].reload_load = CODE_FOR_reload_dmf_from_memory;
> + reg_addr[TDOmode].reload_store = CODE_FOR_reload_dmf_to_memory;
> + }
> +
> /* Precalculate HARD_REGNO_NREGS. */
> for (r = 0; HARD_REGISTER_NUM_P (r); ++r)
> for (m = 0; m < NUM_MACHINE_MODES; ++m)
> @@ -8738,12 +8757,15 @@ reg_offset_addressing_ok_p (machine_mode mode)
> return mode_supports_dq_form (mode);
> break;
>
> - /* The vector pair/quad types support offset addressing if the
> - underlying vectors support offset addressing. */
> + /* The vector pair/quad types and the dense math types support offset
> + addressing if the underlying vectors support offset addressing. */
> case E_OOmode:
> case E_XOmode:
> return TARGET_MMA;
>
> + case E_TDOmode:
> + return TARGET_DENSE_MATH;
> +
> case E_SDmode:
> /* If we can do direct load/stores of SDmode, restrict it to reg+reg
> addressing for the LFIWZX and STFIWX instructions. */
> @@ -11297,6 +11319,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
> (mode == OOmode) ? "__vector_pair" : "__vector_quad");
> break;
>
> + case E_TDOmode:
> + if (CONST_INT_P (operands[1]))
> + error ("%qs is an opaque type, and you cannot set it to constants",
> + "__dmf");
> + break;
> +
> case E_SImode:
> case E_DImode:
> /* Use default pattern for address of ELF small data */
> @@ -12760,7 +12788,7 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
>
> /* We can transfer between VSX registers and DMF registers without needing
> extra registers. */
> - if (TARGET_DENSE_MATH && mode == XOmode
> + if (TARGET_DENSE_MATH && (mode == XOmode || mode == TDOmode)
> && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
> || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
> return true;
> @@ -13561,6 +13589,9 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
> if (mode == XOmode)
> return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>
> + if (mode == TDOmode)
> + return VSX_REGS;
> +
> if (GET_MODE_CLASS (mode) == MODE_INT)
> return GENERAL_REGS;
> }
> @@ -20740,6 +20771,8 @@ rs6000_mangle_type (const_tree type)
> return "u13__vector_pair";
> if (type == vector_quad_type_node)
> return "u13__vector_quad";
> + if (type == dmf_type_node)
> + return "u5__dmf";
>
> /* For all other types, use the default mangling. */
> return NULL;
> @@ -22870,6 +22903,10 @@ rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
> if (mode == XOmode)
> return reg_move_base;
>
> + /* __dmf (i.e. TDOmode) is transferred in 2 instructions. */
> + else if (mode == TDOmode)
> + return reg_move_base * 2;
> +
> else
> return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> }
> @@ -27556,9 +27593,10 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> mode = GET_MODE (dst);
> nregs = hard_regno_nregs (reg, mode);
>
> - /* If we have a vector quad register for MMA, and this is a load or store,
> - see if we can use vector paired load/stores. */
> - if (mode == XOmode && TARGET_MMA
> + /* If we have a vector quad register for MMA or DMF register for dense math,
> + and this is a load or store, see if we can use vector paired
> + load/stores. */
> + if ((mode == XOmode || mode == TDOmode) && TARGET_MMA
> && (MEM_P (dst) || MEM_P (src)))
> {
> reg_mode = OOmode;
> @@ -27566,7 +27604,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> }
> /* If we have a vector pair/quad mode, split it into two/four separate
> vectors. */
> - else if (mode == OOmode || mode == XOmode)
> + else if (mode == OOmode || mode == XOmode || mode == TDOmode)
> reg_mode = V1TImode;
> else if (FP_REGNO_P (reg))
> reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
> @@ -27612,13 +27650,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> return;
> }
>
> - /* The __vector_pair and __vector_quad modes are multi-register
> - modes, so if we have to load or store the registers, we have to be
> - careful to properly swap them if we're in little endian mode
> - below. This means the last register gets the first memory
> - location. We also need to be careful of using the right register
> - numbers if we are splitting XO to OO. */
> - if (mode == OOmode || mode == XOmode)
> + /* The __vector_pair, __vector_quad, and __dmf modes are multi-register
> + modes, so if we have to load or store the registers, we have to be careful
> + to properly swap them if we're in little endian mode below. This means
> + the last register gets the first memory location. We also need to be
> + careful of using the right register numbers if we are splitting XO to
> + OO. */
> + if (mode == OOmode || mode == XOmode || mode == TDOmode)
> {
> nregs = hard_regno_nregs (reg, mode);
> int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
> @@ -27755,7 +27793,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> overlap. */
> int i;
> /* XO/OO are opaque so cannot use subregs. */
> - if (mode == OOmode || mode == XOmode )
> + if (mode == OOmode || mode == XOmode || mode == TDOmode)
> {
> for (i = nregs - 1; i >= 0; i--)
> {
> @@ -27929,7 +27967,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> continue;
>
> /* XO/OO are opaque so cannot use subregs. */
> - if (mode == OOmode || mode == XOmode )
> + if (mode == OOmode || mode == XOmode || mode == TDOmode)
> {
> rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
> rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
> @@ -28957,7 +28995,8 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
>
> if (frommode != tomode)
> {
> - /* Do not allow conversions to/from XOmode and OOmode types. */
> + /* Do not allow conversions to/from XOmode, OOmode, and TDOmode
> + types. */
> if (frommode == XOmode)
> return N_("invalid conversion from type %<__vector_quad%>");
> if (tomode == XOmode)
> @@ -28966,6 +29005,10 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
> return N_("invalid conversion from type %<__vector_pair%>");
> if (tomode == OOmode)
> return N_("invalid conversion to type %<__vector_pair%>");
> + if (frommode == TDOmode)
> + return N_("invalid conversion from type %<__dmf%>");
> + if (tomode == TDOmode)
> + return N_("invalid conversion to type %<__dmf%>");
> }
>
> /* Conversion allowed. */
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 169d81e208e..cae8f269cf1 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -986,7 +986,7 @@ enum data_align { align_abi, align_opt, align_both };
> /* Modes that are not vectors, but require vector alignment. Treat these like
> vectors in terms of loads and stores. */
> #define VECTOR_ALIGNMENT_P(MODE) \
> - (FLOAT128_VECTOR_P (MODE) || (MODE) == OOmode || (MODE) == XOmode)
> + (FLOAT128_VECTOR_P (MODE) || OPAQUE_MODE_P (MODE))
>
> #define ALTIVEC_VECTOR_MODE(MODE) \
> ((MODE) == V16QImode \
> @@ -2277,6 +2277,7 @@ enum rs6000_builtin_type_index
> RS6000_BTI_const_str, /* pointer to const char * */
> RS6000_BTI_vector_pair, /* unsigned 256-bit types (vector pair). */
> RS6000_BTI_vector_quad, /* unsigned 512-bit types (vector quad). */
> + RS6000_BTI_dmf, /* unsigned 1,024-bit types (dmf). */
> RS6000_BTI_const_ptr_void, /* const pointer to void */
> RS6000_BTI_ptr_V16QI,
> RS6000_BTI_ptr_V1TI,
> @@ -2315,6 +2316,7 @@ enum rs6000_builtin_type_index
> RS6000_BTI_ptr_dfloat128,
> RS6000_BTI_ptr_vector_pair,
> RS6000_BTI_ptr_vector_quad,
> + RS6000_BTI_ptr_dmf,
> RS6000_BTI_ptr_long_long,
> RS6000_BTI_ptr_long_long_unsigned,
> RS6000_BTI_MAX
> @@ -2372,6 +2374,7 @@ enum rs6000_builtin_type_index
> #define const_str_type_node (rs6000_builtin_types[RS6000_BTI_const_str])
> #define vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_vector_pair])
> #define vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_vector_quad])
> +#define dmf_type_node (rs6000_builtin_types[RS6000_BTI_dmf])
> #define pcvoid_type_node (rs6000_builtin_types[RS6000_BTI_const_ptr_void])
> #define ptr_V16QI_type_node (rs6000_builtin_types[RS6000_BTI_ptr_V16QI])
> #define ptr_V1TI_type_node (rs6000_builtin_types[RS6000_BTI_ptr_V1TI])
> @@ -2410,6 +2413,7 @@ enum rs6000_builtin_type_index
> #define ptr_dfloat128_type_node (rs6000_builtin_types[RS6000_BTI_ptr_dfloat128])
> #define ptr_vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_ptr_vector_pair])
> #define ptr_vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_ptr_vector_quad])
> +#define ptr_dmf_type_node (rs6000_builtin_types[RS6000_BTI_ptr_dmf])
> #define ptr_long_long_integer_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long])
> #define ptr_long_long_unsigned_type_node (rs6000_builtin_types[RS6000_BTI_ptr_long_long_unsigned])
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
> new file mode 100644
> index 00000000000..1d52184c998
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/dm-1024bit.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_dense_math_ok } */
> +/* { dg-options "-mdejagnu-cpu=future -O2" } */
> +
> +/* Test basic load/store for __dmf type. */
> +
> +#ifndef CONSTRAINT
> +#if defined(USE_D)
> +#define CONSTRAINT "d"
> +
> +#elif defined(USE_V)
> +#define CONSTRAINT "v"
> +
> +#elif defined(USE_WA)
> +#define CONSTRAINT "wa"
> +
> +#else
> +#define CONSTRAINT "wD"
> +#endif
> +#endif
> +const char constraint[] = CONSTRAINT;
> +
> +void foo_mem_asm (__dmf *p, __dmf *q)
> +{
> + /* 2 LXVP instructions. */
> + __dmf vq = *p;
> +
> + /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMF. */
> + __asm__ ("# foo (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
> + /* 2 DMXXEXTFDMR512 instructions to transfer DMF to VSX. */
> +
> + /* 2 STXVP instructions. */
> + *q = vq;
> +}
> +
> +void foo_mem_asm2 (__dmf *p, __dmf *q)
> +{
> + /* 2 LXVP instructions. */
> + __dmf vq = *p;
> + __dmf vq2;
> + __dmf vq3;
> +
> + /* 2 DMXXINSTDMR512 instructions to transfer VSX to DMF. */
> + __asm__ ("# foo1 (" CONSTRAINT ") %A0" : "+" CONSTRAINT (vq));
> + /* 2 DMXXEXTFDMR512 instructions to transfer DMF to VSX. */
> +
> + vq2 = vq;
> + __asm__ ("# foo2 (wa) %0" : "+wa" (vq2));
> +
> + /* 2 STXVP instructions. */
> + *q = vq2;
> +}
> +
> +void foo_mem (__dmf *p, __dmf *q)
> +{
> + /* 2 LXVP, 2 STXVP instructions, no DMF transfer. */
> + *q = *p;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mdmxxextfdmr512\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mdmxxinstdmr512\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mstxvp\M} 12 } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 67f1a3c8230..4f9a79702cb 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -7839,6 +7839,41 @@ proc check_effective_target_power10_ok { } {
> }
> }
>
> +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
> +# some potential new instructions.
> +proc check_effective_target_powerpc_future_ok { } {
> + return [check_no_compiler_messages powerpc_future_ok object {
> + #ifndef _ARCH_PWR_FUTURE
> + #error "-mcpu=future is not supported"
> + #else
> + int dummy;
> + #endif
> + } "-mcpu=future"]
> +}
> +
> +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables
> +# the dense math operations.
> +proc check_effective_target_powerpc_dense_math_ok { } {
> + if { ([istarget powerpc*-*-*]) } {
> + return [check_no_compiler_messages powerpc_dense_math_ok object {
> + __vector_quad vq;
> + int main (void) {
> + #ifndef __DENSE_MATH__
> + #error "target does not have dense math support."
> + #else
> + /* Make sure we have dense math support. */
> + __vector_quad dmr;
> + __asm__ ("dmsetaccz %A0" : "=wD" (dmr));
> + vq = dmr;
> + #endif
> + return 0;
> + }
> + } "-mcpu=future"]
> + } else {
> + return 0;
> + }
> +}
> +
> # Return 1 if this is a PowerPC target supporting -mfloat128 via either
> # software emulation on power7/power8 systems or hardware support on power9.
>
> --
> 2.51.1
>
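The scan-assembler-times counts in dm-1024bit.c can be cross-checked with a little arithmetic. This sketch assumes that every whole-__dmf load or store is done with 32-byte lxvp/stxvp instructions and that every transfer between VSX and DMF registers uses 512-bit dmxxinstdmr512 or dmxxextfdmr512 instructions; struct expected_insns, count and dm_1024bit_totals are hypothetical names.

```c
#include <assert.h>

/* Assumed sizes in bytes: one __dmf, one lxvp/stxvp access, and one
   dmxxinstdmr512/dmxxextfdmr512 transfer.  */
enum { DMF_BYTES = 128, LXVP_BYTES = 32, DM512_BYTES = 64 };

struct expected_insns
{
  int lxvp, stxvp, inst512, extf512;
};

/* Expected instructions for LOADS/STORES whole-__dmf memory accesses and
   TO_DMF/FROM_DMF whole-__dmf transfers into/out of a DMR.  */
static struct expected_insns
count (int loads, int stores, int to_dmf, int from_dmf)
{
  assert (loads >= 0 && stores >= 0 && to_dmf >= 0 && from_dmf >= 0);
  struct expected_insns e;
  e.lxvp = loads * (DMF_BYTES / LXVP_BYTES);
  e.stxvp = stores * (DMF_BYTES / LXVP_BYTES);
  e.inst512 = to_dmf * (DMF_BYTES / DM512_BYTES);
  e.extf512 = from_dmf * (DMF_BYTES / DM512_BYTES);
  return e;
}

/* foo_mem_asm and foo_mem_asm2 each do one load, one VSX->DMF transfer,
   one DMF->VSX transfer and one store; foo_mem is a plain copy.  */
static struct expected_insns
dm_1024bit_totals (void)
{
  struct expected_insns f[3] = { count (1, 1, 1, 1),   /* foo_mem_asm */
                                 count (1, 1, 1, 1),   /* foo_mem_asm2 */
                                 count (1, 1, 0, 0) }; /* foo_mem */
  struct expected_insns t = { 0, 0, 0, 0 };
  for (int i = 0; i < 3; i++)
    {
      t.lxvp += f[i].lxvp;
      t.stxvp += f[i].stxvp;
      t.inst512 += f[i].inst512;
      t.extf512 += f[i].extf512;
    }
  return t;
}
```

With those assumptions the totals are 12 lxvp, 12 stxvp, 4 dmxxinstdmr512 and 4 dmxxextfdmr512, matching the dg-final lines; that corresponds to four lxvp or stxvp per __dmf copy.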

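Regarding the naming question in item 2: rs6000_mangle_type in this patch returns the Itanium C++ ABI vendor extended type form, "u" followed by the identifier length and the identifier (u5__dmf, u13__vector_quad). A 2,048-bit type would presumably need its own entry there, and whichever spelling is chosen determines the mangled string. The helper below is a hypothetical sketch, not GCC code, that builds that string for a candidate name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the Itanium C++ ABI vendor extended type
   mangling "u" <length> <identifier> for NAME into BUF (SIZE bytes).
   Returns the length of the result, or -1 if BUF is too small.  */
static int
vendor_type_mangling (char *buf, size_t size, const char *name)
{
  assert (buf != NULL && name != NULL);
  int n = snprintf (buf, size, "u%zu%s", strlen (name), name);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With it, "__dmf" gives u5__dmf as in the patch, while the candidate names "__dmf_pair" and "__dmf2048" would give u10__dmf_pair and u9__dmf2048.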