> Well, not sure how VECT_COMPARE_COSTS can help here, we either
> get the pattern or vectorize the original function.  There's no special
> handling for popcount in vectorizable_call so all special cases are
> handled via patterns.
>
> I was thinking of popcounthi via popcountsi and zero-extend / truncate
> but also popcountdi via popcountsi and reducing even/odd SI results via
> a plus to a single DI result.  It might be that targets without DI/TI
> popcount support but SI popcount support might exist and that this
> might be cheaper than the generic open-coded scheme.  But of course
> such target could then implement the DImode version with that trick
> itself.

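
For illustration, the fallback idea quoted above can be sketched in scalar C. This is only a rough model of the arithmetic, not vectorizer code: the helper names (popcount32, popcount64_via_32, popcount16_via_32) are made up for this example, and it assumes a compiler providing __builtin_popcount with a 32-bit unsigned int (as GCC and Clang do on common targets). In a vector setting the two 32-bit halves would correspond to the even/odd SI lanes that get summed into one DI result.

```c
#include <stdint.h>

/* Hypothetical stand-in for a target's SImode popcount.  */
static inline uint32_t
popcount32 (uint32_t x)
{
  return (uint32_t) __builtin_popcount (x);
}

/* DImode popcount built from two SImode popcounts: count the low and
   high 32-bit halves separately and add the widened results.  */
static inline uint64_t
popcount64_via_32 (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return (uint64_t) popcount32 (lo) + (uint64_t) popcount32 (hi);
}

/* HImode popcount via SImode: zero-extend the input, count, then
   truncate the result (at most 16, so it always fits).  */
static inline uint16_t
popcount16_via_32 (uint16_t x)
{
  return (uint16_t) popcount32 ((uint32_t) x);
}
```

Whether this beats the generic open-coded sequence would depend on the target's cost for the SImode popcount plus the extra widening and add.
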
Ah, then I misunderstood.  Yes, that would be a better fallback option.
A thing for my "spare time" pile :)

Btw another thing I noticed:

  /* Input and output of .POPCOUNT should be same-precision integer.  */
  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type))
    return NULL;

This prevents us from vectorizing e.g.
(uint64_t)__builtin_popcount(uint32_t).  It seems like an unnecessary
restriction, as every type should be able to hold a popcount result (as
long as TYPE_PRECISION > 6) if the result is properly converted?  Maybe
it complicates the fallback handling but in general we should be fine?

> I agree with two cases it isn't too bad, note you probably get away
> with using the full 64bit constant for both 64bit and 32bit, we simply
> truncate it.  Note rather than 'ull' we have the HOST_WIDE_INT_UC
> macro which appends the appropriate suffix.
>
> The patch is OK with or without changing this detail.

Thanks, changed to the full constant.  Going to push after bootstrap and
testsuite runs.

Regards
 Robin