Andrew Stubbs <andrew_stu...@mentor.com> writes:
> On 06/12/2019 18:21, Richard Sandiford wrote:
>> Andrew Stubbs <andrew_stu...@mentor.com> writes:
>>> Hi all,
>>>
>>> This patch re-enables the V64QImode and V64HImode for GCN.
>>>
>>> GCC does not make these easy to work with because there is (was?) an
>>> assumption that vector registers do not have excess bits per lane,
>>> and therefore GCC does not need to worry about truncating or
>>> extending smaller types when vectorizing. This is not true on GCN,
>>> where each vector lane is always at least 32 bits wide, so we only
>>> really implement loading and storing these vector modes (for now).
>> 
>> FWIW, partial SVE modes work the same way, and this is supposed to be
>> supported now.  E.g. SVE's VNx4QI is a vector of QIs stored in SI
>> containers; in other words, it's a VNx4SI in which only the low 8 bits
>> of each SI are used.
>> 
>> sext_optab, zext_optab and trunc_optab now support vector modes,
>> so e.g. extendv64qiv64si2 provides sign extension from V64QI to V64SI.
>> At the moment, in-register truncations like truncv64siv64qi2 have to
>> be provided as patterns, even though they're no-ops for the target
>> machine, since they're not no-ops in rtl terms.
>> 
>> And the main snag is rtl, because this isn't the way GCC expects vector
>> registers to be laid out.  It looks like you already handle that in
>> TARGET_CAN_CHANGE_MODE_CLASS and TARGET_SECONDARY_RELOAD though.
>> 
>> For SVE, partial vector loads are actually extending loads and partial
>> vector stores are truncating stores.  Maybe it's the same for amdgcn.
>> If so, there's a benefit to providing both a native movv64qi pattern
>> and V64QI->V64SI extending loads, i.e. a combine pattern that fuses
>> movv64qi with a sign_extend or zero_extend.
>> 
>> (Probably none of that is news, sorry, just saying in case.)
>
> Thanks, Richard.
>
> That it's now supposed to work is news to me; good news! :-)
>
> GCN has both unsigned and signed subword loads, so we should be able to 
> have both independent and combined loads.

Yeah, SVE supports both signed and unsigned too.  We used unsigned
for "pure" QI moves.

> How does the middle end know that QImode and HImode should be extended 
> before use? Is there a hook for that?

For SVE we just provide .md patterns for all modes and hide any adjustment
there.  This means that we can decide on a case-by-case basis whether to
use the narrow "element" mode or the wide "container" mode.

E.g. rshifts by VNx2QI would still use QImode shifts and just ignore the
extra elements.  But other operations use the container mode instead.  E.g.:

(define_insn "vec_series<mode>"
  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w")
        (vec_series:SVE_I
          (match_operand:<VEL> 1 "aarch64_sve_index_operand" "Usi, r, r")
          (match_operand:<VEL> 2 "aarch64_sve_index_operand" "r, Usi, r")))]
  "TARGET_SVE"
  "@
   index\t%0.<Vctype>, #%1, %<vwcore>2
   index\t%0.<Vctype>, %<vwcore>1, #%2
   index\t%0.<Vctype>, %<vwcore>1, %<vwcore>2"
)

(define_mode_attr Vctype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "s") (VNx2QI "d")
                          ...)

So VNx2QI is actually a 64-bit ("d") operation.

For things like addition and logic ops it doesn't matter whether we pick
the element mode or the container mode, since the low bits of each lane
of the result depend only on the low bits of the inputs.

I guess if the wide mode is the only option, the .md patterns for things
like rshifts would need to extend the inputs first.  There's currently
no specific option to force the vectoriser to do this itself.  (In most
cases, you might get that effect if you don't provide QI rshift patterns,
since rshifts are usually still int operations on entry to the vectoriser.
That doesn't sound very robust though.)
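
E.g. a shift expander could do the widening itself, something like this
(a rough sketch only: I'm assuming the port provides the zero_extend,
vlshr and trunc patterns under their standard names):

;; Widen to the 32-bit container, shift there, then truncate back,
;; so that stale high container bits can't shift into the result.
(define_expand "vlshrv64qi3"
  [(set (match_operand:V64QI 0 "register_operand")
        (lshiftrt:V64QI (match_operand:V64QI 1 "register_operand")
                        (match_operand:V64QI 2 "register_operand")))]
  ""
{
  rtx val = gen_reg_rtx (V64SImode);
  rtx amt = gen_reg_rtx (V64SImode);
  rtx res = gen_reg_rtx (V64SImode);
  emit_insn (gen_zero_extendv64qiv64si2 (val, operands[1]));
  emit_insn (gen_zero_extendv64qiv64si2 (amt, operands[2]));
  emit_insn (gen_vlshrv64si3 (res, val, amt));
  emit_insn (gen_truncv64siv64qi2 (operands[0], res));
  DONE;
})

(An ashr version would sign-extend operand 1 instead.)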

Thanks,
Richard
