On 2/8/22 9:45 AM, Segher Boessenkool wrote:
> On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote:
>> On 2/7/22 5:05 PM, Segher Boessenkool wrote:
>>> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>>>> I observed recently that a couple of Power10 instructions and built-in
>>>> functions were somehow not implemented.  This patch adds one of them
>>>> (vmsumcud).  Although this isn't normally stage-4 material, this is
>>>> really simple and carries no discernible risk, so I hope it can be
>>>> considered.
>>> But what is the advantage?  That will be very tiny as well, afaics?
>>>
>>> Ah, this implements a builtin as well.  But that builtin is not in the
>>> PVIPR, so no one yet uses it most likely?
>> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
>> currently awaiting public review.  It should have been implemented with
>> the rest of the ISA 3.1 built-ins.  (There are two more that were missed
>> as well, which I haven't yet addressed.)
> Ugh.  Too much process, not enough speed.
>
>>>> +;; vmsumcud
>>>> +(define_insn "vmsumcud"
>>>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>>>> +      (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>>>> +                    (match_operand:V2DI 2 "register_operand" "v")
>>>> +              (match_operand:V1TI 3 "register_operand" "v")]
>>>> +             UNSPEC_VMSUMCUD))]
>>>> +  "TARGET_POWER10"
>>>> +  "vmsumcud %0,%1,%2,%3"
>>>> +  [(set_attr "type" "vecsimple")]
>>>> +)
>>> This can be properly described in RTL instead of using an unspec.  This
>>> is much preferable.  I would say compare to maddhd[u], but those insns
>>> aren't implemented either (maddld is though).
>> Is it?  Note that vmsumcud produces the carry out of the final
>> result, not the result itself.  I couldn't immediately see how
>> to express this in RTL.
> It produces the top 128 bits of the (infinitely precise) result.  But
> yeah that requires an OImode here (for the temp itself), and we do not
> have that in the backend yet.
>
>> The full operation multiplies the corresponding lanes of each
>> doubleword of arguments 1 and 2, adds them together with the
>> 128-bit value in argument 3, and produces the carry out of the
>> result as a 128-bit value in the result.  I think I'd need to
>> have a 256-bit mode to express this properly in RTL, right?
> Not if you actually calculate the carry, instead of computing the
> 256-bit result and truncating it.  But this is very unwieldy (it
> would be fine if adding just two datums, but here there are three).
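
To make the two formulations above concrete, here is a rough Python model
(not GCC code, and not taken from the patch).  It assumes the semantics as
described in this thread: multiply the corresponding doublewords of
arguments 1 and 2, add both products to the 128-bit argument 3, and keep
what overflows past bit 127.  The function names vmsumcud_wide and
vmsumcud_carry are made up for illustration.

```python
MASK128 = (1 << 128) - 1  # all-ones 128-bit value


def vmsumcud_wide(a, b, c):
    """Wide-temporary view: form the exact sum, keep the bits above bit 127.

    a, b: pairs of unsigned 64-bit integers (the two doubleword lanes).
    c:    unsigned 128-bit integer.
    Python integers are unbounded, so `total` plays the role of the
    256-bit temporary that the backend has no mode for.
    """
    total = a[0] * b[0] + a[1] * b[1] + c
    return total >> 128


def vmsumcud_carry(a, b, c):
    """Carry view: stay within 128 bits and count the wrap-arounds.

    With three addends there are two additions, each of which can
    carry out once, so the result is 0, 1 or 2.
    """
    p0 = a[0] * b[0]  # a 64x64 product always fits in 128 bits
    p1 = a[1] * b[1]

    s1 = (p0 + p1) & MASK128      # first 128-bit add, wrapped
    carry1 = 1 if s1 < p0 else 0  # unsigned wrap iff sum < an operand

    s2 = (s1 + c) & MASK128       # second 128-bit add, wrapped
    carry2 = 1 if s2 < s1 else 0

    return carry1 + carry2
```

Both functions should agree on every input.  The second one shows why the
carry form gets unwieldy: each extra addend needs its own compare, and an
RTL pattern would have to spell all of that out.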
>
> Should the type be vecsimple?  Don't we have a type for multiplications?
> Hrm it looks like we use veccomplex usually.
>
> Okay for trunk with that taken care of.  Thanks!

Thanks!  Revised as requested and pushed as r12-7110 (943d631abdd7be623c).

Bill

>
>
> Segher
