Fawzi Mohamed Wrote:

> On 2009-09-15 04:51:19 +0200, "Robert Jacques" <sandf...@jhu.edu> said:
>
> > On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmoha...@mac.com> wrote:
> >
> >> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandf...@jhu.edu> said:
> >>
> >>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella
> >>> <llu...@gmail.com> wrote:
> >>>> Jeremie Pelletier, on September 13 at 22:58 you wrote me:
> >>> [snip]
> >> [1) to allocate large objects that have a guard object it is a good
> >> idea to pass through the GC because if memory is tight a gc collection
> >> is triggered thereby possibly freeing some extra memory
> >> 2) using gc malloc is not faster than malloc, especially with several
> >> threads the single lock of the basic gc makes itself felt.
> >>
> >> for how I use D (not realtime) the two things I would like to see from
> >> new gc are:
> >> 1) multiple pools (at least one per cpu, with thread id hash to assign
> >> threads to a given pool).
> >> This to avoid the need of a global gc lock in the gc malloc, and if
> >> possible use memory close to the cpu when a thread is pinned, not to
> >> have really thread local memory, if you really need local memory
> >> different from the stack then maybe a separate process should be used.
> >> This is especially well doable with 64 bits, with 32 memory
> >> usage/fragmentation could become an issue.
> >> 2) multiple thread doing the collection (a main thread distributing the
> >> work to other threads (one per cpu), that do the mark phase using
> >> atomic ops).
> >>
> >> other better gc, less latency (but not at the cost of too much
> >> computation), would be nice to have, but are not a priority for my
> >> usage.
> >>
> >> Fawzi
> >>
> >
> > For what it's worth, the whole point of thread-local GC is to do 1) and
> > 2). For the purposes of clarity, thread-local GC refers to each thread
> > having its own GC for non-shared objects + a shared GC for shared
> > objects. Each thread's GC may allocate and collect independently of
> > each other (e.g. in parallel) without locking/atomics/etc.
>
> Well I want at least thread local pools (or almost, one can probably
> restrict it to the number of cpus, which will give most of the
> benefit), but not an extra partition of the memory in thread local and
> shared.
> Such a partition might be easier in D2 (I think it was discussed, but
> even then I am not fully sure about the benefit), because then you have
> to somehow be able to share and maybe even unshare an object, which
> will be cumbersome. Thread local things add a level in the memory
> hierarchy that I am not fully sure is worth having, in it you should
> have almost only low level plumbing.
> If you really want that much separation for many things then maybe a
> separate process + memmap might be better.
> The fast local storage for me is the stack, and one might think about
> being more aggressive in using it, the heap is potentially shared.
> Well at least that is my feeling.
>
> Note that on 64 bit one can easily use a few bits to subdivide the
> memory in parts, making finding the pool group very quick, and this
> discussion is orthogonal to being generational or not.
>
> Fawzi
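To make the pool idea above concrete, here is a rough sketch of hashing a thread id to one of a few pools and of recovering the pool from a few address bits. It is a toy model in Python rather than D, and everything in it is an assumption made for illustration: the names (`pool_for_thread`, `pool_for_address`, `allocate`), the pool count, the 2**40-byte slice per pool, and the bump allocation with no bounds check. It is not how any existing D GC works.

```python
import threading

# All constants are hypothetical, chosen only for this sketch.
POOL_BITS = 2                 # log2 of the pool count
NUM_POOLS = 1 << POOL_BITS    # e.g. one pool per CPU on a 4-core machine
POOL_SHIFT = 40               # each pool owns a 2**40-byte slice of the address space
MASK64 = 0xFFFFFFFFFFFFFFFF


def pool_for_thread(thread_id: int) -> int:
    """Hash a thread id to a pool index (multiplicative hash, top bits kept)."""
    h = (thread_id * 0x9E3779B97F4A7C15) & MASK64
    return h >> (64 - POOL_BITS)


def pool_base(pool: int) -> int:
    """First address of the slice owned by a pool."""
    return pool << POOL_SHIFT


def pool_for_address(addr: int) -> int:
    """Find the owning pool from a few address bits, without any lookup table."""
    return (addr >> POOL_SHIFT) & (NUM_POOLS - 1)


# One lock per pool instead of a single global GC lock.
pool_locks = [threading.Lock() for _ in range(NUM_POOLS)]
next_free = [pool_base(p) for p in range(NUM_POOLS)]


def allocate(size: int, thread_id=None) -> int:
    """Bump-allocate from the calling thread's pool; only that pool's lock is taken."""
    if thread_id is None:
        thread_id = threading.get_ident()
    pool = pool_for_thread(thread_id)
    with pool_locks[pool]:
        addr = next_free[pool]
        next_free[pool] += size
    return addr
```

Two threads that hash to different pools never contend on the same lock here, and a collector handed a raw address can find its pool with a shift and a mask. On 32 bits the slices would have to be far smaller, which is where the fragmentation worry comes from.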
I just posted my memory manager to pastebin: http://pastebin.com/f7459ba9d I gave up on the generational feature; it's indeed impossible without write barriers to keep track of pointers from old generations to newer ones. I had the whole tracing algorithm done, but without generations a naive scan and sweep is faster because it incurs far fewer cache misses. I'd like to get some feedback on it if possible.
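For anyone wondering why generations need a write barrier, here is a toy model of the problem. It is written in Python, not D, and it is not taken from the pastebin code: `Obj`, `ToyHeap`, `write_ref` and `minor_collect` are made-up names, and promoting every survivor after one collection is a simplifying assumption. The point is only that a minor collection skips old objects, so a pointer stored into an old object is invisible unless the store itself records it.

```python
class Obj:
    """A heap object with a name, a generation tag and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.generation = "young"
        self.refs = []


class ToyHeap:
    def __init__(self):
        self.young = set()
        # Old objects that may hold pointers into the young generation.
        self.remembered = set()

    def new(self, name):
        obj = Obj(name)
        self.young.add(obj)
        return obj

    def write_ref(self, src, dst, barrier=True):
        """Store a reference; the barrier records old -> young pointers."""
        src.refs.append(dst)
        if barrier and src.generation == "old" and dst.generation == "young":
            self.remembered.add(src)

    def minor_collect(self, roots):
        """Collect only the young generation; return the names that were freed."""
        # Old objects are not traced, except the fields of remembered ones.
        stack = [r for r in roots if r.generation == "young"]
        for old in self.remembered:
            stack.extend(r for r in old.refs if r.generation == "young")
        marked = set()
        while stack:
            obj = stack.pop()
            if obj in marked:
                continue
            marked.add(obj)
            stack.extend(r for r in obj.refs if r.generation == "young")
        freed = sorted(o.name for o in self.young - marked)
        for obj in marked:
            obj.generation = "old"  # assumption: survivors are promoted at once
        self.young = set()
        self.remembered.clear()     # no young objects remain after promotion
        return freed
```

If an object is promoted and then made to point at a fresh allocation with `barrier=False`, the next `minor_collect` frees the new object even though it is still reachable. With the barrier on, the old object lands in the remembered set and the new object survives. Without compiler support for emitting that barrier on every pointer store, the only safe fallback is to scan the old generation too, which removes the benefit of having generations.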