I've been searching for an allocator that is fast enough
and less memory-hungry than the allocator built into Tcl.
Unfortunately, as is usually the case, it turned out that
I had to write my own.

Vlad has written an allocator that uses mmap to obtain
memory from the system and munmaps that memory on thread
exit, if possible.
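
For readers unfamiliar with the idea, here is a minimal sketch of the
general technique, not the vtmalloc code itself: each thread gets one
region from mmap, and a pthread key destructor returns it with munmap
when the thread exits. The names arena_alloc, arena_demo and ARENA_SIZE
are made up for illustration, and the bump-pointer scheme (no free, no
size classes) is an assumption chosen only to keep the example short.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD spelling */
#endif

#define ARENA_SIZE ((size_t)1 << 20)  /* 1 MiB per thread; arbitrary */
#define ARENA_HDR  ((size_t)16)       /* header padded to keep 16-byte alignment */

typedef struct {
    size_t used;                 /* bytes consumed so far, header included */
} arena_t;

static pthread_key_t  arena_key;
static pthread_once_t arena_once = PTHREAD_ONCE_INIT;

/* Key destructor: runs when a thread that owns an arena exits. */
static void arena_release(void *p)
{
    munmap(p, ARENA_SIZE);
}

static void arena_key_init(void)
{
    int rc = pthread_key_create(&arena_key, arena_release);
    assert(rc == 0);             /* the sketch cannot continue without a key */
    (void)rc;
}

/* Bump-allocate n bytes from the calling thread's arena.
 * Returns NULL if the arena cannot be mapped or is exhausted. */
void *arena_alloc(size_t n)
{
    pthread_once(&arena_once, arena_key_init);

    arena_t *a = pthread_getspecific(arena_key);
    if (a == NULL) {
        /* First allocation in this thread: map a private region. */
        void *m = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED)
            return NULL;
        a = m;
        a->used = ARENA_HDR;
        if (pthread_setspecific(arena_key, a) != 0) {
            munmap(m, ARENA_SIZE);
            return NULL;
        }
    }

    if (n > ARENA_SIZE)          /* also guards the rounding below */
        return NULL;
    n = (n + 15) & ~(size_t)15;  /* round up to a multiple of 16 */
    if (n > ARENA_SIZE - a->used)
        return NULL;

    void *p = (char *)a + a->used;
    a->used += n;
    return p;
}

/* Worker: allocate, write, exit; the destructor then unmaps the arena. */
static void *worker(void *arg)
{
    (void)arg;
    int *v = arena_alloc(sizeof *v);
    if (v == NULL)
        return (void *)1;
    *v = 42;
    return (*v == 42) ? NULL : (void *)1;
}

/* Runs one worker thread; returns 0 on success, -1 on failure. */
int arena_demo(void)
{
    pthread_t t;
    void *rc = NULL;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(t, &rc) != 0)
        return -1;
    return rc == NULL ? 0 : -1;
}
```

Note that key destructors run only for threads that actually exit, so
an arena created by the main thread stays mapped until the process
ends. That is one reason the memory can only be returned "if possible".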

I have spent more than 3 weeks fiddling with that and
discussing it with Vlad, and this is what we both came up with:

    http://www.archiware.com/downloads/vtmalloc-0.0.1.tar.gz

I believe we have met most of my needs. Below is an excerpt
from the README file for the curious.

Would anybody care to test it in his/her own environment?
If all goes well, I might TIP this to be included in the Tcl core
as a replacement for (or addition to) the zippy allocator.

Zoran,

Because I am quite biased here, and to avoid later being branded as
biased, I want to explicitly state my bias up front: In my experience,
very little good comes out of people writing their own memory
allocators.  There is a small number of people in this world for whom
this privilege should be reserved (outside of a classroom exercise,
of course), and the rest of us humble folk should help them when we
can but generally stay out of the way - setting out to reinvent the
wheel is not a good thing.

I downloaded the code in the previous mail.  After some minor path
adjustments, I was able to get the test program to compile and link
under FreeBSD 6.1 running on a dual-processor PIII system, linked
against a threaded tcl 8.5a.  I could get this program to consistently
do one of two things:
- dump core
- hang seemingly forever
but absolutely nothing else.

Running this program under the latest version of valgrind (using
the memcheck or helgrind tools) reveals numerous errors, which I
suspect (although I did not confirm) are the reason for the core
dumps and infinite hangs when it is run on its own.

I have no time to debug this myself; however, in the interest of
science and general progress, I'm happy to offer ssh access to a test
box where you can reproduce these results.  I strongly advise against
using a benchmark with the above characteristics to make any decisions
about speed or memory consumption improvements or problems.

---

After toying around with this briefly, I was able to run the test
program under valgrind after specifying a -rec value of 1000 or less.
Despite some errors reported by valgrind, the test program does run to
completion and report its results in these cases.

standard allocator:
This allocator achieves 43982 ops/sec under 4 threads
tcl allocator:
This allocator achieves 21251 ops/sec under 4 threads
improved tcl allocator:
This allocator achieves 21308 ops/sec under 4 threads

But again, I would not draw any serious conclusions from these numbers.
