On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Tamar Christina <tamar.christ...@arm.com> writes: > >> -----Original Message----- > >> From: Hongtao Liu <crazy...@gmail.com> > >> Sent: Tuesday, November 15, 2022 9:37 AM > >> To: Tamar Christina <tamar.christ...@arm.com> > >> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; > >> rguent...@suse.de > >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from > >> arbitrary element position inside a vector > >> > >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > >> <tamar.christ...@arm.com> wrote: > >> > > >> > > -----Original Message----- > >> > > From: Hongtao Liu <crazy...@gmail.com> > >> > > Sent: Tuesday, November 15, 2022 8:36 AM > >> > > To: Tamar Christina <tamar.christ...@arm.com> > >> > > Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina > >> > > via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; > >> > > rguent...@suse.de > >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> > > subvectors from arbitrary element position inside a vector > >> > > > >> > > Hi: > >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > >> > > November/606040.html. > >> > > > } > >> > > > > >> > > > /* See if we can get a better vector mode before extracting. > >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > >> > > > > >> > > > >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > >> > > 9 > >> > > 0 > >> > > > a453cc6a28d9 100644 > >> > > > --- a/gcc/optabs.cc > >> > > > +++ b/gcc/optabs.cc > >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > >> mode, > >> > > rtx v0, rtx v1, > >> > > > v0_qi = gen_lowpart (qimode, v0); > >> > > > v1_qi = gen_lowpart (qimode, v1); > >> > > > if (targetm.vectorize.vec_perm_const != NULL > >> > > > + && targetm.can_change_mode_class (mode, qimode, > >> > > > + ALL_REGS) > >> > > It looks like you want to guard gen_lowpart, shouldn't it be better > >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > >> target_qi)). > >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not > >> > > to guard gen_lowpart. > >> > > >> > Hmm I don't think this is quite true, there are existing usages in > >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I > >> > mentioned before for instance the canoncalization of vec_select to subreg > >> in rtlanal for instances uses this. > >> In theory, we need to iterate through all reg classes that can be assigned > >> for > >> both qimode and mode, if any regclass returns true for > >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. > >> Here we just passed ALL_REGS. > > > > Yes, and most targets where this transformation is valid return true here. > > > > I've checked: > > * alpha > > * arm > > * aarch64 > > * rs6000 > > * s390 > > * sparc > > * pa > > * mips > > > > And even the default example that other targets use from the documentation > > would return true as the size of the modes are the same. > > > > X86 and RISCV are the only two targets that I found (but didn't check all) > > that > > blankly return a result based on just the register classes. > > > > That is to say, there are more targets that adhere to the interpretation > > that > > rclass here means "should be possible in some class in rclass" rather than > > "should be possible in ALL classes of rclass". > > Yeah, I agree. A query "can something stored in ALL_REGS change from > mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS > can hold both M1 and M2. It's then the target's job to answer > conservatively so that the result covers all such R. > > In principle it's OK for a target to err on the side of caution and forbid > things that are actually OK. But that's going to risk losing performance > in some cases, and sometimes that loss of performance will be unacceptable. > IMO that's what's happening here. The target is applying x87 rules to > things that (AIUI) are never stored in x87 registers, and so losing Yes, it can be optimized since some mode will never assigned to x87 registers. > performance as a result. > > Note that the RA also uses ALL_REGS for some things, so this usage > isn't specific to non-RA code. RA passes the minimal reg class(REGNO_REG_CLASS) which contains REGN to decide if can_change_mode_class, not ALL_REGS. 511/* Given a hard REGN a FROM mode and a TO mode, return true if 512 REGN can change from mode FROM to mode TO. */ 513#define REG_CAN_CHANGE_MODE_P(REGN, FROM, TO) \ 514 (targetm.can_change_mode_class (FROM, TO, REGNO_REG_CLASS (REGN))) 515
So I still think using can_change_mode_class outside of RA with ALL_REGS passed to decide whether it's ok to generate subreg is not a good idea. > > IMO it's not the job of target-independent code to iterate through > individual classes and aggregate the result. One of the reasons for > having union classes is to avoid the need to do that. And ALL_REGS > is the ultimate union class. :-) > > The patch looks correct to me. > > Thanks, > Richard > > >> > > >> > So there are already existing precedence for this. And the > >> > documentation for the hook says: > >> > > >> > "This hook returns true if it is possible to bitcast values held in > >> > registers of > >> class rclass from mode from to mode to and if doing so preserves the low- > >> order bits that are common to both modes. The result is only meaningful if > >> rclass has registers that can hold both from and to. The default > >> implementation returns true" > >> > > >> > So it looks like it's use outside of RA is perfectly valid.. and the > >> > documentation also mentions in the example the use from the mid-end as > >> an example. > >> > > >> > But if the mid-end maintainers are happy I'll use something else. > >> > > >> > Tamar > >> > > >> > > I did similar things in > >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html > >> > > (and ALL_REGS doesn't cover all cases for registers which are both > >> > > available for qimode and mode, ALL_REGS fail doesn't mean it can't > >> > > be subreg, it just means parts of ALL_REGS can't be subreg. but with > >> > > a subset of ALL_REGS, there could be a reg class which return true > >> > > for > >> > > targetm.can_change_mode_class) > >> > > > && targetm.vectorize.vec_perm_const (qimode, qimode, > >> > > > target_qi, > >> > > v0_qi, > >> > > > v1_qi, > >> > > > qimode_indices)) > >> > > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 > >> > > > @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, > >> > > > } > >> > > > > >> > > > if (qimode != VOIDmode > >> > > > - && selector_fits_mode_p (qimode, qimode_indices)) > >> > > > + && selector_fits_mode_p (qimode, qimode_indices) > >> > > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) > >> > > > { > >> > > > icode = direct_optab_handler (vec_perm_optab, qimode); > >> > > > if (icode != CODE_FOR_nothing) diff --git > >> > > > a/gcc/testsuite/gcc.target/aarch64/ext_1.c > >> > > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > >> > > > new file mode 100644 > >> > > > index > >> > > > > >> > > > >> 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e5 > >> > > 71 > >> > > > b3bc2ddf887a > >> > > > >> > > > >> > > > >> > > > >> > > -- > >> > > BR, > >> > > Hongtao > >> > >> > >> > >> -- > >> BR, > >> Hongtao -- BR, Hongtao