At 04:37 PM 1/20/2006, Martijn van Oosterhout wrote:
On Fri, Jan 20, 2006 at 04:19:15PM -0500, Tom Lane wrote:
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  Ks/call  Ks/call  name
>  98.96   1495.93  1495.93 33035195     0.00     0.00  hemdistsign

<snip>

> So we gotta fix hemdistsign ...

lol! Yeah, I guess so. Pretty nasty loop. LOOPBIT will iterate 8*63=504
times and it's going to do silly bit handling on each and every
iteration.

Given that all it's doing is counting bits, a simple fix would be to
loop over bytes, use XOR and count ones. For extreme speedup create a
lookup table with 256 entries to give you the answer straight away...
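
As a rough illustration of the byte-wise approach described above (XOR each
pair of bytes, then look the result up in a 256-entry table), here is a
minimal C sketch. The names bits_in_byte, init_bits_in_byte and
hamming_bytes are made up for this example and are not taken from the
actual source; it also assumes both signatures have the same length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */

/* bits_in_byte[i] holds the number of 1-bits in the byte value i. */
static uint8_t bits_in_byte[256];

/*
 * Fill the table once at startup. Entry 0 is already 0 (static storage),
 * and every other entry is its low bit plus the count for i >> 1.
 */
static void init_bits_in_byte(void)
{
    for (int i = 1; i < 256; i++)
        bits_in_byte[i] = (uint8_t) ((i & 1) + bits_in_byte[i >> 1]);
}

/*
 * Hamming distance between two signatures of len bytes each:
 * XOR leaves a 1 wherever the inputs differ, the table counts those 1s.
 */
static int hamming_bytes(const unsigned char *a, const unsigned char *b,
                         size_t len)
{
    int dist = 0;

    for (size_t i = 0; i < len; i++)
        dist += bits_in_byte[a[i] ^ b[i]];
    return dist;
}
```

This does one table lookup per byte instead of eight per-bit tests, and the
256-byte table should stay in L1 cache.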

For an even more extreme speedup, don't most modern CPUs have an
instruction that counts the bits set (or unset) in a word, AKA
"population count", for operands of various sizes (4b, 8b, 16b, 32b,
64b, and 128b for 64b CPUs with SWAR instructions)?
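
To make that concrete, here is a hedged sketch that assumes a GCC- or
Clang-compatible compiler, where __builtin_popcountll and
__builtin_popcount are available; the compiler emits a hardware population
count instruction if the target has one and falls back to a software
routine otherwise. The function name hamming_popcount is invented for this
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: Hamming distance between two byte strings of equal
 * length, processed 64 bits at a time. Assumes GCC/Clang builtins.
 */
static int hamming_popcount(const unsigned char *a, const unsigned char *b,
                            size_t len)
{
    int    dist = 0;
    size_t i = 0;

    /* Whole 8-byte words; memcpy avoids unaligned-access problems. */
    for (; i + 8 <= len; i += 8)
    {
        uint64_t x, y;

        memcpy(&x, a + i, 8);
        memcpy(&y, b + i, 8);
        dist += __builtin_popcountll(x ^ y);
    }

    /* Remaining tail bytes, one at a time. */
    for (; i < len; i++)
        dist += __builtin_popcount((unsigned) (a[i] ^ b[i]));

    return dist;
}
```

Whether this beats a 256-entry lookup table depends on the CPU and the
compiler flags, so it would need to be profiled before drawing conclusions.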

Ron

