On 16.01.2007 at 10:46, Gustaf Neumann wrote:
s = (size-1) >> 3;
while (s>1) { s >>= 1; bucket++; }
On Linux and Solaris (both x86 machines)
the "long" version:
s = (size-1) >> 4;
while (s > 0xFF) {
s = s >> 5;
bucket += 5;
}
while (s > 0x0F) {
s = s >> 4;
bucket += 4;
}
...
is faster than the "short" version above.
On Mac OSX it is the same (no difference).
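To make the comparison concrete, here is a minimal sketch of the two loop styles, not the code from the allocator itself. The function names and the 8/4/1-bit step widths are assumptions chosen so that both functions return the same bucket; the excerpt above starts from a different initial shift and uses its own step sizes, and its elided part is not reconstructed here.

```c
#include <stddef.h>

/* "Short" form as quoted above: strips one bit per iteration.
 * Assumes size >= 1 (size - 1 would wrap around for size == 0). */
int bucket_short(size_t size)
{
    size_t s = (size - 1) >> 3;
    int bucket = 0;

    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    return bucket;
}

/* Hypothetical chunked form: strips 8 bits, then 4 bits, then single
 * bits. Each multi-bit step only runs while the shifted value stays
 * >= 1, so the final bucket matches bucket_short(). */
int bucket_chunked(size_t size)
{
    size_t s = (size - 1) >> 3;
    int bucket = 0;

    while (s > 0xFF) {
        s >>= 8;
        bucket += 8;
    }
    while (s > 0x0F) {
        s >>= 4;
        bucket += 4;
    }
    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    return bucket;
}

/* Returns 1 if both forms agree for every size in [1, max_size]. */
int buckets_agree(size_t max_size)
{
    size_t size;

    for (size = 1; size <= max_size; size++) {
        if (bucket_short(size) != bucket_chunked(size)) {
            return 0;
        }
    }
    return 1;
}
```

Calling buckets_agree() over the range of sizes the allocator serves is a cheap sanity check before timing either variant. Whether the chunked form actually wins depends on the size distribution and the machine, as the numbers below suggest.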
Look at the Sun Solaris 10 results (x86 box):
(the "short" version)
Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 13753084 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)
(the "long" version)
-bash-3.00$ ./memtest
Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 14341236 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)
That is ((14341236-13753084)/14341236)*100 = about 4%
On Linux we saw about a 3% improvement, on Sun about 4%, and
on Mac OS X none. Note: all were x86 (Intel, AMD) machines,
just with different OSes and clock speeds.
When we go back to the "slow" (original) version:
Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 13474091 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)
We get ((14341236-13474091)/14341236)*100 = 6% improvement.
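The percentages above can be reproduced with a small helper. This is only a sketch of the arithmetic used in this mail; the function name is an assumption. Note that the difference is divided by the faster result, as in the formulas above; dividing by the slower baseline instead would give slightly larger figures.

```c
/* Relative gain in percent, computed as in the mail:
 * (faster - slower) / faster * 100.
 * Both arguments are ops/sec figures reported by memtest. */
double improvement_pct(double faster, double slower)
{
    return (faster - slower) / faster * 100.0;
}
```

For example, improvement_pct(14341236.0, 13753084.0) covers the "long" vs. "short" case and improvement_pct(14341236.0, 13474091.0) the "long" vs. original case.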
Cheers
Zoran