On 13.01.2007 at 10:45, Gustaf Neumann wrote:

PPS: strangely, the only thing making me suspicious is the
huge amount of improvement, especially on Mac OS X.

Look...
Running the test program unmodified (on a Mac Pro box):

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 35096360 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I modify the memtest.c program at line 146 to read:

     if (dorealloc && (allocptr > tdata[tid].allocs) && (r & 1)) {
         allocptr[-1] = reallocs[whichmalloc](allocptr[-1], *toallocptr);
     } else {
         allocptr[0] = mallocs[whichmalloc](*toallocptr);
/*-->*/  memset(allocptr[0], 0, *toallocptr > 64 ? 64 : *toallocptr);
         allocptr++;
     }

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 28377808 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I memset the whole memory area, not just the first 64 bytes:

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 14862477 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)
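
The two memset variants above differ only in how many bytes get
touched after each allocation. A minimal sketch of that difference,
assuming a hypothetical helper touch_block() that is not part of
memtest.c: a limit of 64 corresponds to the first variant, a limit
of 0 to touching the whole block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of memtest.c.
 * Zero the first `limit` bytes of a block of `size` bytes, or the
 * whole block when `limit` is 0. Returns the number of bytes written. */
static size_t touch_block(void *ptr, size_t size, size_t limit)
{
    size_t n = (limit != 0 && size > limit) ? limit : size;

    /* A NULL pointer is only acceptable when there is nothing to write. */
    assert(ptr != NULL || n == 0);

    if (n > 0) {
        memset(ptr, 0, n);
    }
    return n;
}
```

Writing to the memory forces the pages to be really backed, which is
presumably why the numbers drop as more bytes are touched.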


BUT, guess what! The system allocator gives me (using the same test
data, i.e. memsetting the whole allocated chunk):

Test standard allocator with 4 threads, 16000 records ...
This allocator achieves 869716 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)


So we are still 14862477/869716 = about 17 times faster. With increasing
thread count we get faster and faster, whereas the system allocator stays
at the same (low) level or even gets slower.
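
The comparison is plain division of the ops/sec figures reported
above. A small sketch for redoing the arithmetic; the function name
speedup() is only illustrative and the constants are copied from the
runs in this mail:

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative helper: how many times faster `ops` is than `baseline`. */
static double speedup(double ops, double baseline)
{
    assert(baseline > 0.0);
    return ops / baseline;
}

/* Print the ratios for the figures quoted in this mail. */
static void print_speedups(void)
{
    const double tcl_plain    = 35096360.0; /* Tcl allocator, unmodified test */
    const double tcl_memset64 = 28377808.0; /* memset of first 64 bytes       */
    const double tcl_memset   = 14862477.0; /* memset of the whole block      */
    const double sys_memset   =   869716.0; /* system allocator, whole block  */

    printf("Tcl vs. system (full memset): %.1fx\n",
           speedup(tcl_memset, sys_memset));
    printf("64-byte memset vs. plain:     %.2fx\n",
           speedup(tcl_memset64, tcl_plain));
    printf("full memset vs. plain:        %.2fx\n",
           speedup(tcl_memset, tcl_plain));
}
```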

Now, I would really like to know why! Perhaps the fact that we are
using mmap() instead of god-knows-what Apple is using...

Anyways... either we have some very big error there (in which
case I'd like to know where, as everything is working as it should!)
or we have found a much better way to handle memory on Mac OS X :-)

Cheers
Zoran




