On 2/8/22 9:45 AM, Segher Boessenkool wrote:
> On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote:
>> On 2/7/22 5:05 PM, Segher Boessenkool wrote:
>>> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>>>> I observed recently that a couple of Power10 instructions and built-in
>>>> functions were somehow not implemented. This patch adds one of them
>>>> (vmsumcud). Although this isn't normally stage-4 material, this is
>>>> really simple and carries no discernible risk, so I hope it can be
>>>> considered.
>>> But what is the advantage? That will be very tiny as well, afaics?
>>>
>>> Ah, this implements a builtin as well. But that builtin is not in the
>>> PVIPR, so no one yet uses it most likely?
>> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
>> currently awaiting public review. It should have been implemented with
>> the rest of the ISA 3.1 built-ins. (There are two more that were missed
>> as well, which I haven't yet addressed.)
> Ugh. Too much process, not enough speed.
>
>>>> +;; vmsumcud
>>>> +(define_insn "vmsumcud"
>>>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>>>> +      (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>>>> +                    (match_operand:V2DI 2 "register_operand" "v")
>>>> +                    (match_operand:V1TI 3 "register_operand" "v")]
>>>> +                   UNSPEC_VMSUMCUD))]
>>>> +  "TARGET_POWER10"
>>>> +  "vmsumcud %0,%1,%2,%3"
>>>> +  [(set_attr "type" "vecsimple")]
>>>> +)
>>> This can be properly described in RTL instead of using an unspec. This
>>> is much preferable. I would say compare to maddhd[u], but those insns
>>> aren't implemented either (maddld is though).
>> Is it? Note that vmsumcud produces the carry out of the final
>> result, not the result itself. I couldn't immediately see how
>> to express this in RTL.
> It produces the top 128 bits of the (infinitely precise) result. But
> yeah that requires an OImode here (for the temp itself), and we do not
> have that in the backend yet.
>
>> The full operation multiplies the corresponding lanes of each
>> doubleword of arguments 1 and 2, adds them together with the
>> 128-bit value in argument 3, and produces the carry out of the
>> result as a 128-bit value in the result. I think I'd need to
>> have a 256-bit mode to express this properly in RTL, right?
> Not if you actually calculate the carry, instead of computing the
> 256-bit result and truncating it. But this is very unwieldy (it
> would be fine if adding just two datums, but here there are three).
>
> Should the type be vecsimple? Don't we have a type for multiplications?
> Hrm it looks like we use veccomplex usually.
>
> Okay for trunk with that taken care of. Thanks!

Thanks! Revised as requested and pushed as r12-7110 (943d631abdd7be623c).

Bill

>
>
> Segher
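
For reference, the two formulations discussed in this thread (take the top 128 bits of the exact sum, versus count the carries directly) can be compared with a small Python model. This is only a sketch based on the description given in the thread, not on the ISA text: the function names, the tuple representation of the V2DI operands, and the lane ordering are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def vmsumcud_model(a, b, c):
    """Exact-arithmetic model (hypothetical helper, not a GCC or ISA API).

    a, b: pairs of unsigned 64-bit doublewords (the V2DI operands).
    c:    unsigned 128-bit integer (the V1TI operand).
    Returns the bits above the low 128 of a[0]*b[0] + a[1]*b[1] + c,
    i.e. the top half of a conceptual 256-bit intermediate.
    """
    assert all(0 <= x <= MASK64 for x in (*a, *b))
    assert 0 <= c <= MASK128
    total = a[0] * b[0] + a[1] * b[1] + c  # Python ints are unbounded
    return total >> 128


def vmsumcud_by_carries(a, b, c):
    """Same result using only 128-bit-wide additions.

    Each wrapped addition contributes at most one carry, so with three
    addends the carry out is the sum of two separate carry bits.
    """
    p0 = a[0] * b[0]  # each product fits in 128 bits
    p1 = a[1] * b[1]
    s1 = (p0 + p1) & MASK128
    carry1 = 1 if s1 < p0 else 0  # unsigned wraparound check
    s2 = (s1 + c) & MASK128
    carry2 = 1 if s2 < s1 else 0
    return carry1 + carry2
```

With only two addends a single wraparound check would suffice; the third addend forces the second check and the final addition of the two carry bits, which is what makes the carry-based form unwieldy to express.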