https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 8 Oct 2018, jamborm at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528
> 
> --- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #3)
> > Can you point me to the source for which we generate the popcount call(s)? 
> > It might be not final value replacement but instead code-generating a niter
> > analysis result.
> 
> It turns out it is as simple as:
> 
> ----------------------------------------
> typedef unsigned long long BITBOARD;
> 
> int PopCount (BITBOARD b) {
>     int c = 0;
> 
>     while (b) {
>         b &= b - 1;
>         c++;
>     }
> 
>     return c;
> }
> ----------------------------------------
> with -O3.  Trunk generates a call to __popcountdi2 when GCC 7 (or
> trunk with -march=skylake) does not.

OK, I see.  They have that PopCount and also ThickPopCount which
appearantly is supposed to handle the high-density case better.
They have also open-coded ffs() & friends.

I guess without any idea of 'b' we need to assume that 'b' has
almost no bits set and thus the loop will be fast(er than a call).

This was also noted elsewhere and a GCC discussion circled around
the fact that we introduce popcount() into other contexts
(the above is final value replacement).  So that raises the
question whether we should simply open-code popcount
on the target?

I wonder if you can benchmark using ThickPopCount everywhere
(for a non-loopy replacement).

Note currently final value replacement uses expression_expensive_p
which eventually just checks is_inexpensive_builtin which
suffers from the very same issue (in theory).  Patching
is_inexpensive_builtin to check whether we have an optab might
do the trick (with other side-effects, like on inlining, of course).

Reply via email to