> Well, not sure how VECT_COMPARE_COSTS can help here, we either
> get the pattern or vectorize the original function.  There's no special
> handling for popcount in vectorizable_call so all special cases are
> handled via patterns.
> I was thinking of popcounthi via popcountsi and zero-extend / truncate but
> also popcountdi via popcountsi and reducing even/odd SI results via a plus
> to a single DI result.  There might be targets without DI/TI popcount
> support but with SI popcount support, and this might be cheaper than
> the generic open-coded scheme.  But of course such a target could then
> implement the DImode version with that trick itself.

Ah, then I misunderstood.  Yes, that would be a better fallback option.
A thing for my "spare time" pile :)
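
For reference, a scalar sketch of the fallback described in the quote, as I
understand it.  The function names are made up for illustration and this is
not vectorizer code; in the vector case the two SI halves would be the
even/odd SI lanes of a DI vector.

```c
#include <stdint.h>

/* Hypothetical helper: HImode popcount via SImode popcount.
   Zero-extend the input, count, truncate the result.  */
static inline uint16_t
popcount_hi_via_si (uint16_t x)
{
  return (uint16_t) __builtin_popcount ((uint32_t) x);
}

/* Hypothetical helper: DImode popcount via two SImode popcounts.
   Count both 32-bit halves and combine them with a plus.  */
static inline uint64_t
popcount_di_via_si (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return (uint64_t) __builtin_popcount (lo)
	 + (uint64_t) __builtin_popcount (hi);
}
```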

Btw another thing I noticed:

  /* Input and output of .POPCOUNT should be same-precision integer.  */
  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type))
    return NULL;

This prevents us from vectorizing e.g.
(uint64_t)__builtin_popcount(uint32_t).  It looks like an
unnecessary restriction, as all types should be able to hold a popcount
result (as long as TYPE_PRECISION > 6) if the result is properly
converted?  Maybe it complicates the fallback handling but in general
we should be fine?
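
A minimal loop of the kind meant here (function and parameter names are
only for illustration).  The popcount input is 32 bits wide while the
stored result is 64 bits wide, so the two precisions differ:

```c
#include <stdint.h>

/* Illustrative only: 32-bit popcount input, 64-bit result.  */
void
popcount_widen (uint64_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint64_t) __builtin_popcount (src[i]);
}
```

The largest value stored is 32, which fits easily in the wider type.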

> I agree with two cases it isn't too bad, note you probably get away
> with using the full 64bit constant for both 64bit and 32bit, we simply
> truncate it.  Note rather than 'ull' we have the HOST_WIDE_INT_UC
> macro which appends the appropriate suffix.
> 
> The patch is OK with or without changing this detail.

Thanks, changed to the full constant.  Going to push after bootstrap
and testsuite runs.
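
To illustrate why truncating the full constant works, here is a standalone
sketch of the usual open-coded popcount.  It is an assumption that these are
the masks in question; the function names are made up, and UINT64_C from
<stdint.h> stands in for HOST_WIDE_INT_UC, which is only available inside
GCC.

```c
#include <stdint.h>

/* Full 64-bit masks; the 32-bit variant just truncates them.  */
#define M1  UINT64_C (0x5555555555555555)
#define M2  UINT64_C (0x3333333333333333)
#define M4  UINT64_C (0x0f0f0f0f0f0f0f0f)
#define H01 UINT64_C (0x0101010101010101)

static inline uint64_t
popcount64_open (uint64_t x)
{
  x = x - ((x >> 1) & M1);		/* 2-bit partial sums.  */
  x = (x & M2) + ((x >> 2) & M2);	/* 4-bit partial sums.  */
  x = (x + (x >> 4)) & M4;		/* 8-bit partial sums.  */
  return (x * H01) >> 56;		/* Sum of all bytes in the top byte.  */
}

static inline uint32_t
popcount32_open (uint32_t x)
{
  /* Same constants, truncated to 32 bits.  */
  const uint32_t m1 = (uint32_t) M1, m2 = (uint32_t) M2;
  const uint32_t m4 = (uint32_t) M4, h01 = (uint32_t) H01;
  x = x - ((x >> 1) & m1);
  x = (x & m2) + ((x >> 2) & m2);
  x = (x + (x >> 4)) & m4;
  return (x * h01) >> 24;
}
```

Since each mask is a repeating byte pattern, its low 32 bits are exactly the
mask the 32-bit case needs; only the final shift amount differs.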

Regards
 Robin
