> -----Original Message-----
> From: Richard Sandiford <[email protected]>
> Sent: Wednesday, March 5, 2025 11:27 AM
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; Richard Earnshaw <[email protected]>; [email protected]
> Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]
>
> Tamar Christina <[email protected]> writes:
> >> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> >> > index e975286a01904bec0b283b7ba4afde6f0fd60bf1..6c0be3c1a51449274720175b5e6e7d7535928de6 100644
> >> > --- a/gcc/config/aarch64/aarch64-sve.md
> >> > +++ b/gcc/config/aarch64/aarch64-sve.md
> >> > @@ -3107,7 +3107,7 @@ (define_insn "@extract_<last_op>_<mode>"
> >> >    [(set (match_operand:<VEL> 0 "register_operand")
> >> > 	(unspec:<VEL>
> >> > 	  [(match_operand:<VPRED> 1 "register_operand")
> >> > -	   (match_operand:SVE_FULL 2 "register_operand")]
> >> > +	   (match_operand:SVE_ALL 2 "register_operand")]
> >> > 	  LAST))]
> >> >    "TARGET_SVE"
> >> >    {@ [ cons: =0 , 1 , 2 ]
> >>
> >> It looks like this will use (say):
> >>
> >>   lasta b<n>, pg, z<m>.b
> >>
> >> for VNx4QI, is that right?  I don't think that's safe, since the .b form
> >> treats all bits of the pg input as significant, whereas only one in every
> >> four bits of pg is defined for VNx4BI (the predicate associated with
> >> VNx4QI).
> >>
> >> I think converting these patterns to partial vectors means operating
> >> on containers rather than elements.  E.g. the VNx4QI case should use
> >> .s rather than .b.  That should just be a case of changing vwcore to
> >> vccore and Vetype to Vctype, but I haven't looked too deeply.
> >
> > Ah I see, so for partial types, the values are not expected to be packed
> > in the lower part of the vector, but instead are "padded"?
>
> Right.
>
> > That explains some of the other patterns I was confused about.
> >
> > Any ideas how to test these?  It's hard to control what modes the
> > vectorizer picks..
>
> Yeah, agreed.  I think it'd be difficult to trigger it reliably from the
> vectoriser given its current limited use of the ifns.
>
> A gimple frontend test might work though, with a predicate/mask
> generated from (say) 16-bit elements, then bitcast to a predicate/mask
> for 32-bit elements and used as an input to an explicit ifn on 32-bit
> elements.  If the 16-bit predicate contains 0, 1 for some even-aligned
> pair, after the last 1, 0 aligned pair, then the code would likely have
> picked the wrong element.
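To make the container point above concrete, here is a small hand-written
illustration (mine, not part of the patch or the testsuite); it uses the
SVE ACLE intrinsics, and the flags and values are just examples:

/* Illustration only: container-based extraction for partial modes.
   Compile with e.g. -O2 -march=armv8-a+sve.  */
#include <arm_sve.h>
#include <stdio.h>

int
main (void)
{
  /* For a partial mode such as VNx4QI, each data element sits in the
     low part of a 32-bit container, and its VNx4BI predicate defines
     only one bit per container; the other three bits are undefined.
     An element-granular "lasta b0, p0, z0.b" would scan every byte
     lane and so could pick a padding byte via an undefined bit; the
     container-granular .s form only looks at the defined bits.  */
  svbool_t pg = svwhilelt_b32 (0, 2);   /* 32-bit lanes 0 and 1 active */
  svint32_t z = svindex_s32 (10, 10);   /* 10, 20, 30, 40, ...         */

  /* LASTB over 32-bit containers: the last active lane is lane 1.  */
  printf ("%d\n", svlastb_s32 (pg, z)); /* prints 20 */
  return 0;
}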
*insert chaos emoji* I realized the early-break vect testcases in the
testsuite already need this; that's how I noticed the problem in the first
place.  It's therefore enough to run those with SVE enabled (there are
tests in there that depend on partial vectors anyway).
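For reference, the shape of loop involved is roughly the following (a
sketch of mine, not one of the actual testcases): a conditional reduction
that keeps the last matching index, which the vectorizer implements via
FOLD_EXTRACT_LAST and which can use a partial mode when partial vectors
are enabled:

/* Sketch only; the flags are an assumption, e.g.
   -O3 -march=armv8-a+sve --param vect-partial-vector-usage=1.  */
int
last_match (short *x, int n)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    if (x[i] == 42)
      last = i;   /* conditional reduction -> FOLD_EXTRACT_LAST */
  return last;
}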
So...
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
	PR tree-optimization/118464
	PR tree-optimization/116855
	* config/aarch64/aarch64-sve.md (@extract_<last_op>_<mode>,
	@fold_extract_<last_op>_<mode>,
	@aarch64_fold_extract_vector_<last_op>_<mode>): Change SVE_FULL to
	SVE_ALL and use the container-based Vctype and vccore attributes.
-- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a93bc463a909ea28460cc7877275fce16e05f7e6..205eeec2e35544de848e0dbb48e3f5ae59391a88 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3107,12 +3107,12 @@ (define_insn "@extract_<last_op>_<mode>"
   [(set (match_operand:<VEL> 0 "register_operand")
 	(unspec:<VEL>
 	  [(match_operand:<VPRED> 1 "register_operand")
-	   (match_operand:SVE_FULL 2 "register_operand")]
+	   (match_operand:SVE_ALL 2 "register_operand")]
 	  LAST))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 ]
-     [ ?r       , Upl , w ] last<ab>\t%<vwcore>0, %1, %2.<Vetype>
-     [ w        , Upl , w ] last<ab>\t%<Vetype>0, %1, %2.<Vetype>
+     [ ?r       , Upl , w ] last<ab>\t%<vccore>0, %1, %2.<Vctype>
+     [ w        , Upl , w ] last<ab>\t%<Vctype>0, %1, %2.<Vctype>
   }
 )
@@ -8899,26 +8899,26 @@ (define_insn "@fold_extract_<last_op>_<mode>"
 	(unspec:<VEL>
 	  [(match_operand:<VEL> 1 "register_operand")
 	   (match_operand:<VPRED> 2 "register_operand")
-	   (match_operand:SVE_FULL 3 "register_operand")]
+	   (match_operand:SVE_ALL 3 "register_operand")]
 	  CLAST))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2   , 3 ]
-     [ ?r       , 0 , Upl , w ] clast<ab>\t%<vwcore>0, %2, %<vwcore>0, %3.<Vetype>
-     [ w        , 0 , Upl , w ] clast<ab>\t%<Vetype>0, %2, %<Vetype>0, %3.<Vetype>
+     [ ?r       , 0 , Upl , w ] clast<ab>\t%<vccore>0, %2, %<vccore>0, %3.<Vctype>
+     [ w        , 0 , Upl , w ] clast<ab>\t%<Vctype>0, %2, %<Vctype>0, %3.<Vctype>
   }
 )
 
 (define_insn "@aarch64_fold_extract_vector_<last_op>_<mode>"
-  [(set (match_operand:SVE_FULL 0 "register_operand")
-	(unspec:SVE_FULL
-	  [(match_operand:SVE_FULL 1 "register_operand")
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL
+	  [(match_operand:SVE_ALL 1 "register_operand")
 	   (match_operand:<VPRED> 2 "register_operand")
-	   (match_operand:SVE_FULL 3 "register_operand")]
+	   (match_operand:SVE_ALL 3 "register_operand")]
 	  CLAST))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2   , 3 ]
-     [ w        , 0 , Upl , w ] clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
-     [ ?&w      , w , Upl , w ] movprfx\t%0, %1\;clast<ab>\t%0.<Vetype>, %2, %0.<Vetype>, %3.<Vetype>
+     [ w        , 0 , Upl , w ] clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
+     [ ?&w      , w , Upl , w ] movprfx\t%0, %1\;clast<ab>\t%0.<Vctype>, %2, %0.<Vctype>, %3.<Vctype>
   }
 )