I wrote:
> I'm still interested in the idea of doing a manual unroll instead of
> relying on a compiler-specific feature.  However, some quick testing
> didn't find an unrolling that helps much.

Hmm, actually this seems to work ok:

        idx++;
        size >>= 1;
        if (size != 0)
        {
            idx++;
            size >>= 1;
            if (size != 0)
            {
                idx++;
                size >>= 1;
                if (size != 0)
                {
                    idx++;
                    size >>= 1;
                    while (size != 0)
                    {
                        idx++;
                        size >>= 1;
                    }
                }
            }
        }

(this is with the initial "if (size > (1 << ALLOC_MINBITS))" so that
we know the starting value is nonzero)

This seems to be about a wash or a small gain on x86_64, but on my
PPC Mac laptop it's very nearly identical in speed to the __builtin_clz
code.  I also see a speedup on HPPA, for which my gcc is too old to
know about __builtin_clz.

Anyone want to see if they can beat that?  Some testing on other
architectures would help too.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
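
For anyone who wants to try this on another architecture, here is a
minimal standalone sketch for checking that the unrolled code above gives
the same answers as the plain shift-and-count loop before timing it.
Several things here are assumptions and not taken from the message: the
function names free_index_loop, free_index_unrolled and check_agreement
are made up for illustration, ALLOC_MINBITS is assumed to be 3, and the
"size = (size - 1) >> ALLOC_MINBITS" step before the loop is assumed to be
what the surrounding function does.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_MINBITS 3			/* assumed value, not stated in the message */

/* Reference version: plain shift-and-count loop (assumed baseline). */
static int
free_index_loop(size_t size)
{
	int			idx = 0;

	if (size > ((size_t) 1 << ALLOC_MINBITS))
	{
		size = (size - 1) >> ALLOC_MINBITS;
		while (size != 0)
		{
			idx++;
			size >>= 1;
		}
	}
	return idx;
}

/* Manually unrolled version, following the code in the message. */
static int
free_index_unrolled(size_t size)
{
	int			idx = 0;

	if (size > ((size_t) 1 << ALLOC_MINBITS))
	{
		/* the test above guarantees this is nonzero */
		size = (size - 1) >> ALLOC_MINBITS;

		idx++;
		size >>= 1;
		if (size != 0)
		{
			idx++;
			size >>= 1;
			if (size != 0)
			{
				idx++;
				size >>= 1;
				if (size != 0)
				{
					idx++;
					size >>= 1;
					while (size != 0)
					{
						idx++;
						size >>= 1;
					}
				}
			}
		}
	}
	return idx;
}

/* Assert that both versions agree for every size in [0, limit). */
static void
check_agreement(size_t limit)
{
	size_t		s;

	for (s = 0; s < limit; s++)
		assert(free_index_loop(s) == free_index_unrolled(s));
}
```

Calling check_agreement((size_t) 1 << 20) from a small main() covers
every request size up to a megabyte.  Both functions can then be dropped
into whatever timing loop you prefer; this sketch deliberately leaves out
the __builtin_clz variant so that it compiles with older compilers.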