Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Thu, 6 Apr 2023 at 16:05, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > index cd9cace3c9b..3de79060619 100644
>> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> >> > @@ -817,6 +817,62 @@ public:
>> >> >
>> >> >  class svdupq_impl : public quiet<function_base>
>> >> >  {
>> >> > +private:
>> >> > +  gimple *
>> >> > +  fold_nonconst_dupq (gimple_folder &f, unsigned factor) const
>> >> > +  {
>> >> > +    /* Lower lhs = svdupq (arg0, arg1, ..., argN} into:
>> >> > +       tmp = {arg0, arg1, ..., arg<N-1>}
>> >> > +       lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
>> >> > +
>> >> > +    /* TODO: Revisit to handle factor by padding zeros.  */
>> >> > +    if (factor > 1)
>> >> > +      return NULL;
>> >>
>> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
>> >> factor != 1?  Do we generate good code for b8, where factor should be 1?
>> > Hi,
>> > It generates the following code for svdup_n_b8:
>> > https://pastebin.com/ypYt590c
>>
>> Hmm, yeah, not pretty :-)  But it's not pretty without either.
>>
>> > I suppose lowering to ctor+vec_perm_expr is not really useful
>> > for this case because it won't simplify ctor, unlike the above case of
>> > svdupq_s32 (x[0], x[1], x[2], x[3]);
>> > However, I wonder if it's still a good idea to lower svdupq for predicates,
>> > for representing svdupq (or other intrinsics) using GIMPLE constructs as
>> > far as possible?
>>
>> It's possible, but I think we'd need an example in which it's a clear
>> benefit.
> Sorry, I posted the wrong test case above.
> For the following test:
> svbool_t f(uint8x16_t x)
> {
>   return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
>                       x[8], x[9], x[10], x[11], x[12],
>                       x[13], x[14], x[15]);
> }
>
> Code-gen:
> https://pastebin.com/maexgeJn
>
> I suppose it's equivalent to the following?
>
> svbool_t f2(uint8x16_t x)
> {
>   svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2], (bool) x[3],
>                                (bool) x[4], (bool) x[5], (bool) x[6], (bool) x[7],
>                                (bool) x[8], (bool) x[9], (bool) x[10], (bool) x[11],
>                                (bool) x[12], (bool) x[13], (bool) x[14], (bool) x[15]);
>   return svcmpne_n_u8 (svptrue_b8 (), tmp, 0);
> }
Yeah, this is essentially the transformation that the svdupq rtl
expander uses.  It would probably be a good idea to do that in
gimple too.

Thanks,
Richard

>
> which generates:
> f2:
> .LFB3901:
>         .cfi_startproc
>         movi    v1.16b, 0x1
>         ptrue   p0.b, all
>         cmeq    v0.16b, v0.16b, #0
>         bic     v0.16b, v1.16b, v0.16b
>         dup     z0.q, z0.q[0]
>         cmpne   p0.b, p0/z, z0.b, #0
>         ret
>
> Thanks,
> Prathamesh