On Wed, Aug 9, 2023 at 12:23 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > We seem to be looking at promotions of the call argument, lhs_type
> > is the same as the type of the call LHS.  But the comment mentions .POPCOUNT
> > and the following code also handles others, so maybe handling should be
> > moved.  Also when we look to vectorize popcount (x) instead of 
> > popcount((T)x)
> > we can simply promote the result accordingly.
>
> IMHO lhs_type is the type of the conversion
>
>   lhs_oprnd = gimple_assign_lhs (last_stmt);
>   lhs_type = TREE_TYPE (lhs_oprnd);
>
> and rhs/unprom_diff has the type of the call's input argument
>
>   rhs_oprnd = gimple_call_arg (call_stmt, 0);
>   vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff);
>
> So we can potentially have
>   T0 arg
>   T1 in = (T1)arg
>   T2 ret = __builtin_popcount (in)
>   T3 lhs = (T3)ret
>
> and we're checking if precision (T0) == precision (T3).

Looks like so.  Note T1 == T2.  What we're really after is
changing T1/T2 and the actual popcount used closer to
T0/T3, like in case T0 was 'char' and T3 was 'long' we
could still use popcountqi and then widen to T3 (or the
other way around).  So yes, I think requiring that T0 and T3
are equal isn't necessary.

> This will never be true for a proper __builtin_popcountll except if
> the return value is cast to uint64_t (which I just happened to do
> in my test...).  Therefore it still doesn't really make sense to me.
>
> Interestingly though, it helps for an aarch64 __builtin_popcountll
> testcase where we abort here and then manage to vectorize via
> vectorizable_call.  When we skip this check, recognition succeeds
> and replaces the call with the pattern.  Then scalar costs are lower
> than in the vectorizable_call case because __builtin_popcountll is
> not STMT_VINFO_RELEVANT_P anymore (not live or so?).
> Then, vectorization costs are too high compared to the wrong scalar
> costs and we don't vectorize... Odd, might require fixing separately.
> We might need to calculate the scalar costs in advance?
>
> > It looks like vect_recog_popcount_clz_ctz_ffs_pattern is specifcally for
> > the conversions, so your fallback should possibly apply even when not
> > matching them.
>
> Mhm, yes it appears to only match when casting the return value to
> something else than an int.  So we'd need a fallback in vectorizable_call?
> And it would potentially look a bit out of place there only handling
> popcount and not ctz, clz, ...  Not sure if it is worth it then?

I'd keep the handling as pattern just also match on popcount directly
when not converted.

>
> Regards
>  Robin
>

Reply via email to