I wrote:
> I'm still interested in the idea of doing a manual unroll instead of
> relying on a compiler-specific feature.  However, some quick testing
> didn't find an unrolling that helps much.

Hmm, actually this seems to work ok:

        idx++;
        size >>= 1;
        if (size != 0)
        {
            idx++;
            size >>= 1;
            if (size != 0)
            {
                idx++;
                size >>= 1;
                if (size != 0)
                {
                    idx++;
                    size >>= 1;
                    while (size != 0)
                    {
                        idx++;
                        size >>= 1;
                    }
                }
            }
        }

(This is with the initial "if (size > (1 << ALLOC_MINBITS))" test kept,
so that we know the starting value is nonzero.)
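
For anyone who wants to try this outside the backend, here is a minimal standalone sketch of the unrolled loop next to the plain loop it replaces. The function names, the value 3 for ALLOC_MINBITS, and the "(size - 1) >> ALLOC_MINBITS" pre-shift are assumptions made for illustration, not copied from the patch; check them against the real source before drawing timing conclusions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define ALLOC_MINBITS 3

/* Reference version: plain shift-and-count loop (hypothetical name). */
static int
free_index_loop(size_t size)
{
    int         idx = 0;

    if (size > ((size_t) 1 << ALLOC_MINBITS))
    {
        /* Assumed pre-shift; guarantees size is nonzero below. */
        size = (size - 1) >> ALLOC_MINBITS;
        while (size != 0)
        {
            idx++;
            size >>= 1;
        }
    }
    return idx;
}

/* Manually unrolled version: four explicit steps, then a loop for the rest. */
static int
free_index_unrolled(size_t size)
{
    int         idx = 0;

    if (size > ((size_t) 1 << ALLOC_MINBITS))
    {
        size = (size - 1) >> ALLOC_MINBITS;

        /* First step needs no test: size is known to be nonzero here. */
        idx++;
        size >>= 1;
        if (size != 0)
        {
            idx++;
            size >>= 1;
            if (size != 0)
            {
                idx++;
                size >>= 1;
                if (size != 0)
                {
                    idx++;
                    size >>= 1;
                    /* Anything larger falls back to the ordinary loop. */
                    while (size != 0)
                    {
                        idx++;
                        size >>= 1;
                    }
                }
            }
        }
    }
    return idx;
}

/* Sanity check: both versions must agree for every size in the range. */
static void
check_unrolled_matches_loop(size_t max_size)
{
    size_t      s;

    for (s = 0; s <= max_size; s++)
        assert(free_index_unrolled(s) == free_index_loop(s));
}
```

Calling check_unrolled_matches_loop(100000) from a small main() confirms the two agree before any timing is done. The unrolling only pays off if most requests finish within the first few steps, which is a guess about the workload rather than something measured here.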

This seems to be about a wash or a small gain on x86_64, but on my
PPC Mac laptop it's very nearly identical in speed to the __builtin_clz
code.  I also see a speedup on HPPA, for which my gcc is too old to
know about __builtin_clz.
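
For comparison on compilers that do have it, a __builtin_clz variant might look roughly like the sketch below. This is not the code I timed; the function name, the ALLOC_MINBITS value of 3, and the pre-shift are the same illustrative assumptions as above, and it further assumes the shifted size fits in an unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define ALLOC_MINBITS 3

/* Hypothetical clz-based version; requires gcc or a compatible compiler. */
static int
free_index_clz(size_t size)
{
    int         idx = 0;

    if (size > ((size_t) 1 << ALLOC_MINBITS))
    {
        /* Assumes the shifted value fits in an unsigned int. */
        unsigned int v = (unsigned int) ((size - 1) >> ALLOC_MINBITS);

        /*
         * v is nonzero here, which matters because __builtin_clz(0) is
         * undefined.  The bit width minus the leading-zero count is the
         * position of the highest set bit, counting from 1, which is what
         * the shift-and-count loop computes.
         */
        idx = (int) (sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(v);
    }
    return idx;
}

/* Spot checks against values worked out by hand. */
static void
check_clz_version(void)
{
    assert(free_index_clz(8) == 0);
    assert(free_index_clz(9) == 1);
    assert(free_index_clz(17) == 2);
    assert(free_index_clz(8192) == 10);
}
```

The point of the manual unroll is to get close to this without depending on the builtin being available.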

Anyone want to see if they can beat that?  Some testing on other
architectures would help too.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)