On Tue, Aug 8, 2023 at 1:37 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > Well, not sure how VECT_COMPARE_COSTS can help here, we either
> > get the pattern or vectorize the original function. There's no special
> > handling for popcount in vectorizable_call, so all special cases are
> > handled via patterns.
> > I was thinking of popcounthi via popcountsi and zero-extend / truncate, but
> > also of popcountdi via popcountsi, reducing the even/odd SI results via a
> > plus to a single DI result. There might be targets without DI/TI popcount
> > support but with SI popcount support, for which this would be cheaper than
> > the generic open-coded scheme. But of course such a target could then
> > implement the DImode version with that trick itself.
>
> Ah, then I misunderstood. Yes, that would be a better fallback option.
> A thing for my "spare time" pile :)
>
> Btw, another thing I noticed:
>
>   /* Input and output of .POPCOUNT should be same-precision integer.  */
>   if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type))
>     return NULL;
>
> This prevents us from vectorizing e.g.
> (uint64_t)__builtin_popcount(uint32_t). It looks like an
> unnecessary restriction, as all types should be able to hold a popcount
> result (as long as TYPE_PRECISION > 6) if the result is properly
> converted. Maybe it complicates the fallback handling, but in general
> we should be fine?
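For illustration only, here is a scalar C sketch of the narrow-via-SI fallbacks discussed above, plus the widened-result case from the quoted check. The helper names are hypothetical and this is not GCC's vectorizer code; it only shows why the tricks are sound: a zero-extended 16-bit value has the same popcount as the original, a 64-bit popcount equals the sum of the popcounts of its two 32-bit halves, and a 32-bit popcount (at most 32) fits in any unsigned type of 6 or more bits, so widening it is a plain conversion applied after the popcount.

```c
#include <stdint.h>

/* Hypothetical helper: "popcounthi via popcountsi".  Zero-extending
   adds no set bits, so the 32-bit count equals the 16-bit count and
   truncating it back to 16 bits loses nothing.  */
static inline uint16_t popcount16_via32(uint16_t x)
{
  return (uint16_t) __builtin_popcount((uint32_t) x);
}

/* Hypothetical helper: "popcountdi via popcountsi".  Count each 32-bit
   half separately, then reduce the two partial counts with a plus into
   a single 64-bit result.  */
static inline uint64_t popcount64_via32(uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return (uint64_t) __builtin_popcount(lo)
         + (uint64_t) __builtin_popcount(hi);
}

/* The case rejected by the quoted precision check: a 32-bit popcount
   whose result is widened to 64 bits.  The count itself has the
   input's precision; the widening is a separate conversion.  */
static inline uint64_t popcount32_widened(uint32_t x)
{
  int count = __builtin_popcount(x); /* the popcount itself */
  return (uint64_t) count;           /* separate widening conversion */
}
```

A vector version would apply the same idea lane-wise (e.g. summing adjacent even/odd SI lanes), which is the part a target would have to provide cheaply for this to beat the open-coded scheme.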
Hmm, the conversion should be a separate statement, so I wonder
why it would go wrong?

Richard.

> >
> > I agree that with two cases it isn't too bad. Note you probably get away
> > with using the full 64-bit constant for both 64-bit and 32-bit; we simply
> > truncate it.  Note that rather than 'ull' we have the HOST_WIDE_INT_UC
> > macro, which appends the appropriate suffix.
> >
> > The patch is OK with or without changing this detail.
>
> Thanks, changed to the full constant.  Going to push after bootstrap
> and testsuite runs.
>
> Regards
>  Robin
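As an illustration of "use the full 64-bit constant for both widths, we simply truncate it", here is a scalar C sketch of the classic SWAR (open-coded) popcount. It assumes the constants in question are the usual 0x5555.../0x3333.../0x0f0f.../0x0101... masks; the patch itself is not quoted here, so treat that as an assumption, and the function and macro names are hypothetical. Outside GCC, the standard `UINT64_C` macro from `<stdint.h>` stands in for GCC's internal `HOST_WIDE_INT_UC`: both append the right suffix to the literal.

```c
#include <stdint.h>

/* Full-width 64-bit masks, written once (hypothetical names).  */
#define POP_M1  UINT64_C(0x5555555555555555)
#define POP_M2  UINT64_C(0x3333333333333333)
#define POP_M4  UINT64_C(0x0f0f0f0f0f0f0f0f)
#define POP_H01 UINT64_C(0x0101010101010101)

static inline uint64_t popcount64_swar(uint64_t x)
{
  x = x - ((x >> 1) & POP_M1);            /* per-2-bit counts */
  x = (x & POP_M2) + ((x >> 2) & POP_M2); /* per-4-bit counts */
  x = (x + (x >> 4)) & POP_M4;            /* per-byte counts */
  return (x * POP_H01) >> 56;             /* sum all bytes into the top byte */
}

static inline uint32_t popcount32_swar(uint32_t x)
{
  /* Same 64-bit constants, simply truncated to 32 bits.  */
  const uint32_t m1 = (uint32_t) POP_M1;
  const uint32_t m2 = (uint32_t) POP_M2;
  const uint32_t m4 = (uint32_t) POP_M4;
  const uint32_t h01 = (uint32_t) POP_H01;

  x = x - ((x >> 1) & m1);
  x = (x & m2) + ((x >> 2) & m2);
  x = (x + (x >> 4)) & m4;
  return (x * h01) >> 24;                 /* top byte of a 32-bit word */
}
```

Because the masks are periodic in the byte, truncation yields exactly the 32-bit masks; only the final shift depends on the width.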