Hi!

On 2/7/22 5:05 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>> I observed recently that a couple of Power10 instructions and built-in
>> functions were somehow not implemented.  This patch adds one of them
>> (vmsumcud).  Although this isn't normally stage-4 material, this is really
>> simple and carries no discernible risk, so I hope it can be considered.
> But what is the advantage?  That will be very tiny as well, afaics?
>
> Ah, this implements a builtin as well.  But that builtin is not in the
> PVIPR, so no one yet uses it most likely?

It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
currently awaiting public review.  It should have been implemented with
the rest of the ISA 3.1 built-ins.  (There are two more that were missed
as well, which I haven't yet addressed.)

>> gcc/
>>      * config/rs6000/rs6000-builtins.def (VMSUMCUD): New.
>>      * config/rs6000/rs6000-overload.def (VEC_MSUMC): New.
>>      * config/rs6000/vsx.md (UNSPEC_VMSUMCUD): New constant.
>>      (vmsumcud): New define_insn.
>>
>> gcc/testsuite/
>>      * gcc.target/powerpc/vec-msumc.c: New test.
>> +;; vmsumcud
>> +(define_insn "vmsumcud"
>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>> +      (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>> +                    (match_operand:V2DI 2 "register_operand" "v")
>> +                    (match_operand:V1TI 3 "register_operand" "v")]
>> +               UNSPEC_VMSUMCUD))]
>> +  "TARGET_POWER10"
>> +  "vmsumcud %0,%1,%2,%3"
>> +  [(set_attr "type" "vecsimple")]
>> +)
> This can be properly described in RTL instead of using an unspec.  This
> is much preferable.  I would say compare to maddhd[u], but those insns
> aren't implemented either (maddld is though).

Is it?  Note that vmsumcud produces the carry out of the final
result, not the result itself.  I couldn't immediately see how
to express this in RTL.

The full operation multiplies the corresponding doubleword lanes of
arguments 1 and 2, adds the two 128-bit products to the 128-bit
value in argument 3, and returns the carry out of that sum as a
128-bit value.  I think I'd need to have a 256-bit mode to express
this properly in RTL, right?
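
To make the semantics concrete, here is a rough C model of the operation
as described above.  It is only a sketch, not ISA text or GCC code: the
name vmsumcud_model and the u128 typedef are made up for illustration,
and it relies on the unsigned __int128 extension in GCC and Clang.
Since no 256-bit type is available, the carries are counted by checking
each 128-bit addition for wraparound.

```c
#include <stdint.h>

/* Assumption: unsigned __int128 is a GCC/Clang extension, not ISO C. */
typedef unsigned __int128 u128;

/* Hypothetical reference model of the operation described above:
   multiply matching doubleword lanes, add both 128-bit products to
   the 128-bit addend c, and return only the carry out of bit 128.  */
static u128 vmsumcud_model(const uint64_t a[2], const uint64_t b[2], u128 c)
{
    u128 p0 = (u128)a[0] * b[0];   /* full 128-bit product, lane 0 */
    u128 p1 = (u128)a[1] * b[1];   /* full 128-bit product, lane 1 */
    unsigned carry = 0;

    u128 sum = p0 + p1;            /* wraps modulo 2^128 */
    carry += (sum < p0);           /* wrapped, so one carry out */

    u128 total = sum + c;          /* wraps modulo 2^128 */
    carry += (total < sum);        /* wrapped again, another carry out */

    return (u128)carry;            /* 0, 1, or 2 */
}
```

With all three inputs at their maximum values the model returns 2, so
the result is not a single carry bit, which is part of why a wider
intermediate mode looks necessary.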

Thanks,
Bill

>
>
> Segher
