On Tue, Aug 8, 2023 at 1:37 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > Well, not sure how VECT_COMPARE_COSTS can help here, we either
> > get the pattern or vectorize the original function.  There's no special
> > handling for popcount in vectorizable_call so all special cases are
> > handled via patterns.
> > I was thinking of popcounthi via popcountsi and zero-extend / truncate but
> > also popcountdi via popcountsi and reducing even/odd SI results via a plus
> > to a single DI result.  It might be that targets without DI/TI popcount
> > support but SI popcount support might exist and that this might be cheaper
> > than the generic open-coded scheme.  But of course such target could then
> > implement the DImode version with that trick itself.
>
> Ah, then I misunderstood.  Yes, that would be a better fallback option.
> A thing for my "spare time" pile :)
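
The fallback idea quoted above (a narrower or wider popcount built on an SImode popcount) can be modelled in scalar C. This is only an illustrative sketch, not code from the patch: the helper names popcount_si, popcount_hi_via_si and popcount_di_via_si are hypothetical, and it assumes a 32-bit unsigned int so that __builtin_popcount stands in for an SImode popcount.

```c
#include <stdint.h>

/* Hypothetical stand-in for a target's SImode popcount.
   Assumes unsigned int is 32 bits wide.  */
static inline uint32_t
popcount_si (uint32_t x)
{
  return (uint32_t) __builtin_popcount (x);
}

/* HImode popcount: zero-extend to SImode, count, truncate the result.  */
static inline uint16_t
popcount_hi_via_si (uint16_t x)
{
  return (uint16_t) popcount_si ((uint32_t) x);
}

/* DImode popcount: count the two SImode halves separately and add them.
   In a vector of SI elements these halves are the even/odd lanes, and the
   addition is the "plus" that reduces them to a single DI result.  */
static inline uint64_t
popcount_di_via_si (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return (uint64_t) popcount_si (lo) + popcount_si (hi);
}
```

Whether this beats the generic open-coded scheme would depend on the target's costs, as noted in the quoted text.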
>
> Btw another thing I noticed:
>
>   /* Input and output of .POPCOUNT should be same-precision integer.  */
>   if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type))
>     return NULL;
>
> This prevents us from vectorizing e.g.
> (uint64_t)__builtin_popcount(uint32_t).  It appears like an
> unnecessary restriction as all types should be able to hold a popcount
> result (as long as TYPE_PRECISION > 6) if the result is properly
> converted?  Maybe it complicates the fallback handling but in general
> we should be fine?

Hmm, the conversion should be a separate statement so I wonder
why it would go wrong?

Richard.
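
For reference, a minimal sketch of the kind of source loop under discussion, with the widening written once inline and once as a separate statement. The function names are hypothetical and the example assumes a 32-bit unsigned int. It only shows that the two forms compute the same values; it does not claim anything about how the pattern recognizer sees either one.

```c
#include <stddef.h>
#include <stdint.h>

/* Popcount of a 32-bit input, stored into a 64-bit output.  */
void
popcount_widen (uint64_t *restrict out, const uint32_t *restrict in,
                size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint64_t) __builtin_popcount (in[i]);
}

/* The same computation with the conversion as its own statement.  */
void
popcount_widen_split (uint64_t *restrict out, const uint32_t *restrict in,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int cnt = __builtin_popcount (in[i]);  /* 32-bit in, int out */
      out[i] = (uint64_t) cnt;               /* separate widening step */
    }
}
```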

>
> > I agree with two cases it isn't too bad, note you probably get away
> > with using the full 64bit constant for both 64bit and 32bit, we simply
> > truncate it.  Note rather than 'ull' we have the HOST_WIDE_INT_UC
> > macro which appends the appropriate suffix.
> >
> > The patch is OK with or without changing this detail.
>
> Thanks, changed to the full constant.  Going to push after bootstrap
> and testsuite runs.
>
> Regards
>  Robin
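
The remark above about using the full 64-bit constant for both widths can be illustrated with the well-known bit-twiddling popcount. This is a sketch under assumptions: the patch's actual constants are not shown in this message, so the masks below are the textbook ones, and UINT64_C is used as a standalone stand-in for GCC's HOST_WIDE_INT_UC. The function names are hypothetical.

```c
#include <stdint.h>

/* Full 64-bit masks; the 32-bit variant simply truncates them.  */
#define M1  UINT64_C (0x5555555555555555)
#define M2  UINT64_C (0x3333333333333333)
#define M4  UINT64_C (0x0F0F0F0F0F0F0F0F)
#define H01 UINT64_C (0x0101010101010101)

static inline uint64_t
popcount64_open_coded (uint64_t x)
{
  x = x - ((x >> 1) & M1);          /* 2-bit partial sums */
  x = (x & M2) + ((x >> 2) & M2);   /* 4-bit partial sums */
  x = (x + (x >> 4)) & M4;          /* 8-bit partial sums */
  return (x * H01) >> 56;           /* add all bytes into the top byte */
}

static inline uint32_t
popcount32_open_coded (uint32_t x)
{
  /* Truncating the 64-bit constants yields the 32-bit masks.  */
  const uint32_t m1 = (uint32_t) M1, m2 = (uint32_t) M2;
  const uint32_t m4 = (uint32_t) M4, h01 = (uint32_t) H01;

  x = x - ((x >> 1) & m1);
  x = (x & m2) + ((x >> 2) & m2);
  x = (x + (x >> 4)) & m4;
  return (x * h01) >> 24;
}
```

Only the final shift amount differs between the two widths; every mask is the same constant cut to size.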
