Hello dsimcha,

== Quote from BCS (n...@anon.com)'s article

Hello dsimcha,

3.  This one is an order of magnitude less likely than the other two
to actually get implemented, at least by me, but how about
thread-local allocators so you can call malloc() without taking a
lock?  I vaguely remember Sean saying he was working on that a while
back, but I never heard anything about it again.  It's probably best
to wait for shared to be implemented for this so that unshared
objects can also be collected w/o stopping the world, but we should
start at least discussing this now.

I think there are general-purpose malloc implementations that are lock
free. On the same note, I've been toying with an idea for a memory heap
structure that would be totally lock free (think CAS) but has some
downsides (like it won't handle a non-contiguous memory space, and
expanding the memory could get very expensive). Both of these could be
non-issues in some cases: short-lived job-specific heaps, memory
mapped (shared?) data(base) files, resource-limited systems, etc.
I've been meaning to get around to implementing it but never had
time. I'd be willing to collaborate or, if even that doesn't get me
moving, just hand over the whole project in exchange for a "coauthor" line in the comments.
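
Roughly the shape of what I have in mind, as a minimal D sketch (the
names and layout are just illustrative, not a real implementation):

// sketch only: one contiguous region, allocation is a single CAS retry
// loop over a bump counter; nothing here is real code
import core.atomic;

struct LockFreeHeap   // hypothetical name
{
    ubyte* base;          // start of one contiguous region
    size_t capacity;      // size of that region
    shared size_t used;   // bytes handed out so far

    void* alloc(size_t size)
    {
        for (;;)
        {
            size_t old  = atomicLoad(used);
            size_t next = old + size;   // real code would align this
            if (next > capacity)
                return null;   // out of space; growing the region is the expensive part
            if (cas(&used, old, next))
                return base + old;      // we now own [old, old + size)
        }
    }
}

The only shared mutable state is that one counter, which is also why it
falls apart as soon as the region can't stay contiguous or has to grow.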

But x86 atomic ops, while not as slow as locks, are pretty slow.  Some
benchmarks I did a while back indicated that, on Windows, atomic
increment is only about twice as fast as an uncontested synchronized {
var++; }.

IIRC a mutex/lock has to have a CAS in it at some point, so if you can
get an operation to work with a single CAS, then at worst an uncontested
action will be as good as an uncontested mutex/lock. Once contention
starts happening, it depends on how much your retry costs. Another thing
to take into account is the overhead-vs-granularity tradeoff you get
with locks, which can sometimes be skipped by using CAS directly.
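
In code, the point is roughly this (just an illustration of the shape,
not something I've benchmarked):

// sketch: the uncontested path is exactly one CAS, which is also the
// floor for taking an uncontested lock; under contention you pay for
// however many times the loop has to retry
import core.atomic;

shared int counter;

void increment()
{
    int old;
    do
    {
        old = atomicLoad(counter);
    } while (!cas(&counter, old, old + 1));
}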

I think using thread-local storage and having each
thread-local heap interact with the global heap only occasionally, for
very large transactions, is a much better bet than using atomic CAS.

I got started thinking about this when I came across a comment that some
particular malloc didn't have issues with freeing memory from a
different thread than the one it was malloced from.

I'm also a little bit confused about what you're suggesting,
specifically about the non-contiguous memory space.  Does this mean
that things would have to be freed LIFO?  If so, what use cases would
you be covering that TempAlloc doesn't already cover?  If not, what
does it mean?

IIRC, when the current GC runs out of space, it uses mmap (on Linux) to
get more address space mapped to something. This wouldn't work for my
idea because you can't count on mmap returning a block of RAM at any
given location; it is free to leave gaping holes in the places you can
allocate at, and I wouldn't be able to handle that.
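
To show what I mean (a sketch of the failure mode using the POSIX mmap
bindings, not the GC's actual code):

// try to grow the heap by mapping memory right at the end of the
// current region; mmap treats the address only as a hint, so check
// whether the new block actually landed there
import core.sys.posix.sys.mman;

void* growRegion(void* regionEnd, size_t len)
{
    void* p = mmap(regionEnd, len,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return null;
    if (p != regionEnd)
    {
        // the kernel put it somewhere else, leaving a hole in the
        // address space, which the scheme above can't tolerate
        munmap(p, len);
        return null;
    }
    return p;
}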

