> We seem to be looking at promotions of the call argument, lhs_type
> is the same as the type of the call LHS.  But the comment mentions .POPCOUNT
> and the following code also handles others, so maybe handling should be
> moved.  Also when we look to vectorize popcount (x) instead of popcount((T)x)
> we can simply promote the result accordingly.

IMHO lhs_type is the type of the conversion

  lhs_oprnd = gimple_assign_lhs (last_stmt);
  lhs_type = TREE_TYPE (lhs_oprnd);

and rhs/unprom_diff has the type of the call's input argument

  rhs_oprnd = gimple_call_arg (call_stmt, 0);
  vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff);

So we can potentially have

  T0 arg
  T1 in = (T1)arg
  T2 ret = __builtin_popcount (in)
  T3 lhs = (T3)ret

and we're checking if precision (T0) == precision (T3).

This will never be true for a proper __builtin_popcountll except if
the return value is cast to uint64_t (which I just happened to do
in my test...).  Therefore it still doesn't really make sense to me.

Interestingly though, it helps for an aarch64 __builtin_popcountll
testcase where we abort here and then manage to vectorize via
vectorizable_call.  When we skip this check, recognition succeeds
and replaces the call with the pattern.  Then scalar costs are lower
than in the vectorizable_call case because __builtin_popcountll is
not STMT_VINFO_RELEVANT_P anymore (not live or so?).
Then, vectorization costs are too high compared to the wrong scalar
costs and we don't vectorize...  Odd, might require fixing separately.
We might need to calculate the scalar costs in advance?

> It looks like vect_recog_popcount_clz_ctz_ffs_pattern is specifically for
> the conversions, so your fallback should possibly apply even when not
> matching them.

Mhm, yes it appears to only match when casting the return value to
something else than an int.  So we'd need a fallback in vectorizable_call?
And it would potentially look a bit out of place there only handling
popcount and not ctz, clz, ...  Not sure if it is worth it then?

Regards
 Robin
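
P.S. For reference, a minimal C sketch of the two source-level shapes discussed above.  The function names are made up for illustration and are not taken from the testsuite.  Whether the front end inserts an extra conversion on the argument depends on the target's typedef for uint64_t, so treat the comments as my reading of the discussion rather than as a statement of what the pattern recognizer does.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape 1: the int result of the builtin is converted to a 64-bit
   type.  With a 64-bit argument this is the case where
   precision (T0) == precision (T3) can hold.  */
void
popcount_widened (uint64_t *dst, const uint64_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint64_t) __builtin_popcountll (src[i]);
}

/* Shape 2: the result stays an int, so there is no conversion of the
   return value and the argument (64 bits) and the result (int)
   differ in precision.  */
void
popcount_plain (int *dst, const uint64_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}
```

Both loops compute the same values; they only differ in the type of the stored result, which is what the precision check above keys on.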