Tamar Christina <[email protected]> writes:
>> -----Original Message-----
>> From: Richard Sandiford <[email protected]>
>> Sent: Wednesday, March 5, 2025 11:27 AM
>> To: Tamar Christina <[email protected]>
>> Cc: [email protected]; nd <[email protected]>; Richard Earnshaw
>> <[email protected]>; [email protected]
>> Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last
>> extractions [PR118464]
>>
>> Tamar Christina <[email protected]> writes:
>> >> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
>> >> > index e975286a01904bec0b283b7ba4afde6f0fd60bf1..6c0be3c1a51449274720175b5e6e7d7535928de6 100644
>> >> > --- a/gcc/config/aarch64/aarch64-sve.md
>> >> > +++ b/gcc/config/aarch64/aarch64-sve.md
>> >> > @@ -3107,7 +3107,7 @@ (define_insn "@extract_<last_op>_<mode>"
>> >> > [(set (match_operand:<VEL> 0 "register_operand")
>> >> > (unspec:<VEL>
>> >> > [(match_operand:<VPRED> 1 "register_operand")
>> >> > - (match_operand:SVE_FULL 2 "register_operand")]
>> >> > + (match_operand:SVE_ALL 2 "register_operand")]
>> >> > LAST))]
>> >> > "TARGET_SVE"
>> >> > {@ [ cons: =0 , 1 , 2 ]
>> >>
>> >> It looks like this will use (say):
>> >>
>> >> lasta b<n>, pg, z<m>.b
>> >>
>> >> for VNx4QI, is that right? I don't think that's safe, since the .b form
>> >> treats all bits of the pg input as significant, whereas only one in every
>> >> four bits of pg is defined for VNx4BI (the predicate associated with
>> >> VNx4QI).
>> >>
>> >> I think converting these patterns to partial vectors means operating
>> >> on containers rather than elements. E.g. the VNx4QI case should use
>> >> .s rather than .b. That should just be a case of changing vwcore to
>> >> vccore and Vetype to Vctype, but I haven't looked too deeply.
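
(To make the container point concrete, with made-up register numbers:
for VNx4QI the element form would be

    lasta   b0, p0, z1.b

which treats every bit of p0 as significant, whereas the container form

    lasta   s0, p0, z1.s

only reads the one-in-four predicate bits that VNx4BI actually defines.)
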
>> >
>> > Ah I see, so for partial types, the values are not expected to be packed
>> > in the lower part of the vector, but instead are "padded"?
>>
>> Right.
>>
>> > That explains some of the other patterns
>> > I was confused about.
>> >
>> > Any ideas how to test these? It's hard to control what modes the
>> > vectorizer picks..
>>
>> Yeah, agreed. I think it'd be difficult to trigger it reliably from the
>> vectoriser given its current limited use of the ifns.
>>
>> A gimple frontend test might work though, with a predicate/mask
>> generated from (say) 16-bit elements, then bitcast to a predicate/mask
>> for 32-bit elements and used as an input to an explicit ifn on 32-bit
>> elements.  If the 16-bit predicate contains a {0, 1} pattern in some
>> even-aligned pair that comes after the last {1, 0} pair, then the code
>> would likely have picked the wrong element.
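
(A made-up example of that failure mode, assuming a 128-bit VL: if the
16-bit-element predicate has .h lanes {1, 0, 0, 1, 0, 0, 0, 0}, then
viewed as a VNx4BI predicate only the even .h lanes {1, 0, 0, 0} are
significant, so the last active .s lane is lane 0.  A LAST operation
wrongly working at .h granularity would instead treat the stray bit in
.h lane 3 as marking the last active element and extract from .s
lane 1.)
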
>
> *insert chaos emoji* I realized the early-break vect testcases in the
> testsuite needed this, since that's how I noticed the problem in the first
> place.  So it's enough to run those with SVE enabled (there are tests
> depending on partial vectors in there anyway).
>
> So...
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/118464
> 	PR tree-optimization/116855
> 	* config/aarch64/aarch64-sve.md (@extract_<last_op>_<mode>,
> 	@fold_extract_<last_op>_<mode>,
> 	@aarch64_fold_extract_vector_<last_op>_<mode>): Change SVE_FULL
> 	to SVE_ALL.
>
> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index a93bc463a909ea28460cc7877275fce16e05f7e6..205eeec2e35544de848e0dbb48e3f5ae59391a88 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3107,12 +3107,12 @@ (define_insn "@extract_<last_op>_<mode>"
> [(set (match_operand:<VEL> 0 "register_operand")
> (unspec:<VEL>
> [(match_operand:<VPRED> 1 "register_operand")
> - (match_operand:SVE_FULL 2 "register_operand")]
> + (match_operand:SVE_ALL 2 "register_operand")]
> LAST))]
> "TARGET_SVE"
> {@ [ cons: =0 , 1 , 2 ]
> - [ ?r , Upl , w ] last<ab>\t%<vwcore>0, %1, %2.<Vetype>
> - [ w , Upl , w ] last<ab>\t%<Vetype>0, %1, %2.<Vetype>
> + [ ?r , Upl , w ] last<ab>\t%<vwcore>0, %1, %2.<Vctype>
This looks good for the Vetype->Vctype part, but it still needs the
vwcore->vccore change too. If we don't do that, we could end up pairing
a w destination with a .d source, which would trigger an assembler error.
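
For example, for VNx2QI (8-bit elements in 64-bit containers, so
<Vctype> is d but <vwcore> is still w) we'd end up emitting something
like:

    lasta   w0, p0, z1.d

whereas the GPR form of LASTA with a .d source requires an X register:

    lasta   x0, p0, z1.d

(register numbers made up for illustration).
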
Thanks,
Richard
> + [ w , Upl , w ] last<ab>\t%<Vctype>0, %1, %2.<Vctype>
> }
> )
>
> @@ -8899,26 +8899,26 @@ (define_insn "@fold_extract_<last_op>_<mode>"
> (unspec:<VEL>
> [(match_operand:<VEL> 1 "register_operand")
> (match_operand:<VPRED> 2 "register_operand")
> - (match_operand:SVE_FULL 3 "register_operand")]
> + (match_operand:SVE_ALL 3 "register_operand")]
> CLAST))]
> "TARGET_SVE"
> {@ [ cons: =0 , 1 , 2 , 3 ]
> - [ ?r , 0 , Upl , w ] clast<ab>\t%<vwcore>0, %2, %<vwcore>0, %3.<Vetype>
> - [ w , 0 , Upl , w ] clast<ab>\t%<Vetype>0, %2, %<Vetype>0, %3.<Vetype>
> + [ ?r , 0 , Upl , w ] clast<ab>\t%<vwcore>0, %2, %<vwcore>0, %3.<Vctype>
> + [ w , 0 , Upl , w ] clast<ab>\t%<Vctype>0, %2, %<Vctype>0, %3.<Vctype>
> }
> )
>
> (define_insn "@aarch64_fold_extract_vector_<last_op>_<mode>"
> - [(set (match_operand:SVE_FULL 0 "register_operand")
> - (unspec:SVE_FULL
> - [(match_operand:SVE_FULL 1 "register_operand")
> + [(set (match_operand:SVE_ALL 0 "register_operand")
> + (unspec:SVE_ALL
> + [(match_operand:SVE_ALL 1 "register_operand")
> (match_operand:<VPRED> 2 "register_operand")
> - (match_operand:SVE_FULL 3 "register_operand")]
> + (match_operand:SVE_ALL 3 "register_operand")]
> CLAST))]
> "TARGET_SVE"
> {@ [ cons: =0 , 1 , 2 , 3 ]
> - [ w , 0 , Upl , w ] clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
> - [ ?&w , w , Upl , w ] movprfx\t%0, %1\;clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
> + [ w , 0 , Upl , w ] clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
> + [ ?&w , w , Upl , w ] movprfx\t%0, %1\;clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
> }
> )