I've been searching for an allocator that is fast enough and not as memory hungry as the allocator built into Tcl. Unfortunately, as is usually the case, it turned out that I had to write my own. Vlad has written an allocator that uses mmap to obtain memory from the system and munmap to return that memory on thread exit, if possible. I have spent more than 3 weeks fiddling with that and discussing it with Vlad, and this is what we both came to: http://www.archiware.com/downloads/vtmalloc-0.0.1.tar.gz

I believe we have solved most of my needs. Below is an excerpt from the README file for the curious. Would anybody care to test it in his or her own environment? If all goes well, I might TIP this to be included in the Tcl core as a replacement for (or addition to) the zippy allocator.
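For readers unfamiliar with the approach described above (mmap to obtain memory, munmap when the thread exits), here is a minimal sketch in C. It is not taken from vtmalloc, and every name in it (arena_alloc, arena_release, demo_worker, ARENA_SIZE, arenas_released) is hypothetical. It assumes a POSIX system with pthreads and anonymous mappings, and it only bump-allocates from one fixed-size region per thread, which is far simpler than a real allocator.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_SIZE  ((size_t)1 << 20)  /* 1 MiB per thread, an arbitrary choice */
#define ARENA_ALIGN ((size_t)16)       /* alignment of every returned block */

/* Header stored at the start of each thread's mapping (hypothetical layout). */
typedef struct {
    size_t used;                       /* bytes consumed, header included */
} arena_t;

static pthread_key_t  arena_key;
static pthread_once_t arena_once = PTHREAD_ONCE_INIT;
atomic_int arenas_released;            /* counts successful munmap calls */

/* Key destructor: runs when a thread that owns an arena exits. */
static void arena_release(void *p)
{
    if (munmap(p, ARENA_SIZE) == 0)
        atomic_fetch_add(&arenas_released, 1);
}

static void arena_key_init(void)
{
    int rc = pthread_key_create(&arena_key, arena_release);
    assert(rc == 0);
    (void)rc;
}

/* Return n bytes from the calling thread's arena, or NULL if it is full. */
void *arena_alloc(size_t n)
{
    pthread_once(&arena_once, arena_key_init);

    arena_t *a = pthread_getspecific(arena_key);
    if (a == NULL) {
        /* First allocation on this thread: map a private anonymous region. */
        void *mem = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return NULL;
        a = mem;
        a->used = ARENA_ALIGN;         /* reserve an aligned slot for the header */
        if (pthread_setspecific(arena_key, a) != 0) {
            munmap(mem, ARENA_SIZE);
            return NULL;
        }
    }

    if (n > ARENA_SIZE)                /* also guards the rounding below */
        return NULL;
    n = (n + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    if (n > ARENA_SIZE - a->used)
        return NULL;

    void *p = (char *)a + a->used;
    a->used += n;
    return p;
}

/* Example thread body: allocate, touch the memory, then exit. */
void *demo_worker(void *arg)
{
    (void)arg;
    char *buf = arena_alloc(100);
    if (buf != NULL)
        buf[99] = 'x';
    return NULL;
}
```

A thread started with pthread_create(&t, NULL, demo_worker, NULL) gets its own mapping on its first arena_alloc call, and the mapping is unmapped by the key destructor once the thread returns. Note that key destructors do not run for the main thread when the process simply returns from main, so an arena created there stays mapped until process exit.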
Zoran,

Because I am quite biased here, and to avoid later being branded as biased, I want to explicitly state my bias up front: in my experience, very little good comes out of people writing their own memory allocators. There is a small number of people in this world for whom this privilege should be reserved (outside of a classroom exercise, of course), and the rest of us humble folk should help them when we can but generally stay out of the way - setting out to reinvent the wheel is not a good thing.

I downloaded the code in the previous mail. After some minor path adjustments, I was able to get the test program to compile and link under FreeBSD 6.1 running on a dual-processor PIII system, linked against a threaded tcl 8.5a. I could get this program to consistently do one of two things:

- dump core
- hang seemingly forever

but absolutely nothing else. Running this program under the latest version of valgrind (using the memcheck or helgrind tools) produces numerous errors, which I suspect (although I did not confirm) are the reason for the core dumps and infinite hangs when it is run on its own.

I have no time to debug this myself; however, in the interest of science and general progress, I'm happy to offer ssh access to a test box where you can reproduce these results. I strongly advise against using a benchmark with the above characteristics to make any decisions about speed or memory consumption improvements or problems.

---

After toying around with this briefly, I was able to run the test program under valgrind after specifying a -rec value of 1000 or less. Despite some errors reported by valgrind, the test program does run to completion and report its results in these cases:

standard allocator: This allocator achieves 43982 ops/sec under 4 threads
tcl allocator: This allocator achieves 21251 ops/sec under 4 threads
improved tcl allocator: This allocator achieves 21308 ops/sec under 4 threads

But again, I would not draw any serious conclusions from these numbers.
