Do your test again. I just pushed a fairly fast variable-length bitcount.
Sorry for not pushing it earlier.

Posting from a mobile, pardon my terseness. ~ C.

I was perusing the commit log for mesa and stumbled upon the recently
added util_bitcount. It uses a rather naïve algorithm and I thought I'd
look into it as someone mentioned this problem to me before.
This is what I found, should anyone be interested:

In any case, I wrote a little profiler, which I have attached (bc.c). The
interesting bit is the result it prints:
[zha...@ztoshiba ~]$ gcc bc.c -O3 -o bc && ./bc
1 billion of __builtin_popcount(), fast_bitcount(), and naive() (in that
__builtin_popcount(): 12.541 seconds
fast_bitcount(): 7.312 seconds
naive(): 58.240 seconds

And this is the real reason I'm putting something this
unimportant/trivial on the ML: should the builtin be removed in favor of
this wonder-algorithm? And can I even justify pushing this patch,
considering that it requires knowing CHAR_BIT (from limits.h, the number
of bits in a char)?
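
For readers without the attachment, here is a minimal sketch of the kind of
parallel bit count being discussed, next to a bit-at-a-time loop. It is
the well-known SWAR reduction and is not necessarily the code in bc.c or
in mesa; the names sketch_bitcount and sketch_naive are made up for
illustration. It assumes CHAR_BIT is 8 (the masks sum per 8-bit byte) and
uses CHAR_BIT only to find the top byte of an unsigned.

```c
#include <limits.h>

/* Hypothetical name; a SWAR bit count, assuming CHAR_BIT == 8. */
static unsigned sketch_bitcount(unsigned v)
{
    /* ~0u / 3 is 0x55...55: sum adjacent bits into 2-bit fields. */
    v = v - ((v >> 1) & (~0u / 3));
    /* ~0u / 15 * 3 is 0x33...33: sum 2-bit fields into 4-bit fields. */
    v = (v & (~0u / 15 * 3)) + ((v >> 2) & (~0u / 15 * 3));
    /* ~0u / 255 * 15 is 0x0F...0F: sum 4-bit fields into bytes. */
    v = (v + (v >> 4)) & (~0u / 255 * 15);
    /* Multiplying by 0x01...01 adds all bytes into the top byte. */
    return (v * (~0u / 255)) >> ((sizeof(unsigned) - 1) * CHAR_BIT);
}

/* Hypothetical name; one loop iteration per bit position examined. */
static unsigned sketch_naive(unsigned v)
{
    unsigned c = 0;
    while (v) {
        c += v & 1u;
        v >>= 1;
    }
    return c;
}
```

The fast version runs a fixed handful of operations regardless of the
input, while the loop can take one iteration per bit, which would be
consistent with the gap in the timings above. The only use of CHAR_BIT is
the final shift, so the dependency on limits.h is small.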

