Hi!

On 2/7/22 5:05 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>> I observed recently that a couple of Power10 instructions and built-in
>> functions were somehow not implemented.  This patch adds one of them
>> (vmsumcud).  Although this isn't normally stage-4 material, this is
>> really simple and carries no discernible risk, so I hope it can be
>> considered.
> But what is the advantage?  That will be very tiny as well, afaics?
>
> Ah, this implements a builtin as well.  But that builtin is not in the
> PVIPR, so no one yet uses it most likely?

It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
currently awaiting public review.  It should have been implemented with
the rest of the ISA 3.1 built-ins.  (There are two more that were missed
as well, which I haven't yet addressed.)

>> gcc/
>>     * config/rs6000/rs6000-builtins.def (VMSUMCUD): New.
>>     * config/rs6000/rs6000-overload.def (VEC_MSUMC): New.
>>     * config/rs6000/vsx.md (UNSPEC_VMSUMCUD): New constant.
>>     (vmsumcud): New define_insn.
>>
>> gcc/testsuite/
>>     * gcc.target/powerpc/vec-msumc.c: New test.
>> +;; vmsumcud
>> +(define_insn "vmsumcud"
>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>> +      (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>> +                    (match_operand:V2DI 2 "register_operand" "v")
>> +                    (match_operand:V1TI 3 "register_operand" "v")]
>> +                   UNSPEC_VMSUMCUD))]
>> +  "TARGET_POWER10"
>> +  "vmsumcud %0,%1,%2,%3"
>> +  [(set_attr "type" "vecsimple")]
>> +)
> This can be properly described in RTL instead of using an unspec.  This
> is much preferable.  I would say compare to maddhd[u], but those insns
> aren't implemented either (maddld is though).

Is it?  Note that vmsumcud produces the carry out of the final result,
not the result itself.  I couldn't immediately see how to express this
in RTL.

The full operation multiplies the corresponding lanes of each doubleword
of arguments 1 and 2, adds them together with the 128-bit value in
argument 3, and produces the carry out of the result as a 128-bit value
in the result.  I think I'd need to have a 256-bit mode to express this
properly in RTL, right?

Thanks,
Bill

>
>
> Segher
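
For reference, the operation described above can be modelled in portable GNU C. This is only a sketch of the semantics as stated in this message, not code from the patch or from the ISA document. The names `u128` and `model_vmsumcud` are hypothetical, and `unsigned __int128` is a GCC/Clang extension. Because the exact sum of two 128-bit products and a 128-bit addend can exceed 128 bits, the model counts the carries out of each 128-bit addition rather than forming the wide sum directly.

```c
#include <stdint.h>

/* Hypothetical reference model; the names are illustrative only.  */
typedef unsigned __int128 u128;   /* GCC/Clang extension.  */

/* Return the carry out of bit 127 of a[0]*b[0] + a[1]*b[1] + c,
   as a 128-bit value.  The result is always 0, 1 or 2.  */
static u128
model_vmsumcud (const uint64_t a[2], const uint64_t b[2], u128 c)
{
  /* Each 64x64 product fits in 128 bits exactly.  */
  u128 p0 = (u128) a[0] * b[0];
  u128 p1 = (u128) a[1] * b[1];
  unsigned carry = 0;

  /* Unsigned addition wraps, so a sum smaller than one of its
     operands means a carry out of the top bit.  */
  u128 s = p0 + p1;
  carry += (s < p0);

  u128 t = s + c;
  carry += (t < s);

  return (u128) carry;
}
```

With all three inputs set to all-ones, both additions wrap and the model returns 2, which shows that the carry out is not just a single bit.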