On Wed, Aug 9, 2023 at 12:23 PM Robin Dapp <rdapp....@gmail.com> wrote: > > > We seem to be looking at promotions of the call argument, lhs_type > > is the same as the type of the call LHS. But the comment mentions .POPCOUNT > > and the following code also handles others, so maybe handling should be > > moved. Also when we look to vectorize popcount (x) instead of > > popcount((T)x) > > we can simply promote the result accordingly. > > IMHO lhs_type is the type of the conversion > > lhs_oprnd = gimple_assign_lhs (last_stmt); > lhs_type = TREE_TYPE (lhs_oprnd); > > and rhs/unprom_diff has the type of the call's input argument > > rhs_oprnd = gimple_call_arg (call_stmt, 0); > vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff); > > So we can potentially have > T0 arg > T1 in = (T1)arg > T2 ret = __builtin_popcount (in) > T3 lhs = (T3)ret > > and we're checking if precision (T0) == precision (T3).
Looks like so. Note T1 == T2. What we're really after is changing T1/T2 and the actual popcount used closer to T0/T3, like in case T0 was 'char' and T3 was 'long' we could still use popcountqi and then widen to T3 (or the other way around). So yes, I think requiring that T0 and T3 are equal isn't necessary. > This will never be true for a proper __builtin_popcountll except if > the return value is cast to uint64_t (which I just happened to do > in my test...). Therefore it still doesn't really make sense to me. > > Interestingly though, it helps for an aarch64 __builtin_popcountll > testcase where we abort here and then manage to vectorize via > vectorizable_call. When we skip this check, recognition succeeds > and replaces the call with the pattern. Then scalar costs are lower > than in the vectorizable_call case because __builtin_popcountll is > not STMT_VINFO_RELEVANT_P anymore (not live or so?). > Then, vectorization costs are too high compared to the wrong scalar > costs and we don't vectorize... Odd, might require fixing separately. > We might need to calculate the scalar costs in advance? > > > It looks like vect_recog_popcount_clz_ctz_ffs_pattern is specifcally for > > the conversions, so your fallback should possibly apply even when not > > matching them. > > Mhm, yes it appears to only match when casting the return value to > something else than an int. So we'd need a fallback in vectorizable_call? > And it would potentially look a bit out of place there only handling > popcount and not ctz, clz, ... Not sure if it is worth it then? I'd keep the handling as pattern just also match on popcount directly when not converted. > > Regards > Robin >