Hi,
> Also, are you still seeing the same improvement with the __builtin_clz as your inline asm implementation?

In my benchmark program, the fls implementation and the inline asm
implementation perform a little differently. However, both give almost
the same improvement in pgbench.

Here are the results of my benchmark.

Xeon (Core architecture)
bytes      :     4     8    16    32    64   128   256   512  1024   mix
original   : 0.780 0.790 0.820 0.870 0.930 0.980 1.040 1.080 1.140 0.910
inline asm : 0.320 0.180 0.190 0.180 0.190 0.180 0.190 0.180 0.190 0.170
fls        : 0.270 0.260 0.290 0.290 0.290 0.290 0.290 0.300 0.290 0.380

Xeon (P4 architecture)
bytes      :     4     8    16    32    64   128   256   512  1024   mix
original   : 0.520 0.520 0.670 0.780 0.950 1.000 1.060 1.190 1.250 0.940
inline asm : 0.610 0.530 0.530 0.520 0.520 0.540 0.540 0.580 0.540 0.600
fls        : 0.390 0.370 0.780 0.780 0.780 0.790 0.780 0.780 0.780 0.520

pgbench result (measured by oprofile)

CPU: Xeon (P4 architecture)
test program: pgbench -c 1 -t 50000 (fsync=off)

original
samples  %        symbol name
66854    6.6725   AllocSetAlloc
11817    1.1794   AllocSetFree

inline asm
samples  %        symbol name
47610    4.9333   AllocSetAlloc
6248     0.6474   AllocSetFree

fls
samples  %        symbol name
48779    4.9954   AllocSetAlloc
7648     0.7832   AllocSetFree

Best regards,

---
Atsushi Ogawa