On Tue, Oct 22, 2013 at 12:01 AM, Jonathan S. Shapiro <[email protected]>wrote:
> On Sun, Oct 20, 2013 at 9:18 PM, Ben Kloosterman <[email protected]>wrote: > >> On Mon, Oct 21, 2013 at 12:40 AM, Jonathan S. Shapiro >> <[email protected]>wrote: >> >>> On Sun, Oct 20, 2013 at 4:53 AM, Bennie Kloosteman >>> <[email protected]>wrote: >>> >>>> > >> However with a hybrid model the cost of using the main heap is much >>>> greater than with a generational GC due to introducing ref counting >>>> especially ref counting on short lived objects many of which may not have >>>> been counted and allowing cycles which may not have escaped a larger >>>> nursery. >>>> >>> >>> You may be right, but this is one of those tempting statements that >>> needs to be confirmed with measurement. Intuitions on this sort of thing >>> can be deceptive. Adolescent objects (objects newly promoted to the main >>> heap) are recently written, and therefore have their M *and* N bits set >>> (RC-immix didn't address this optimization). As such, writes to these >>> objects are unimpeded and the "integrate" operation on the object is >>> deferred. This mitigates both the ref counting cost and the likelihood of >>> cycle creation. >>> >> >> Yes we do know the impact is significantly higher we do need to quantify >> it ( or some other research) , Not syncronizing the new count with the >> Nursery sweep like rc-Immix is a good option but the impact is poor >> performance from small heaps due to young objects getting mixed in .. >> > > I'm actually a bit confused about this part of the RC-immix paper. I > should go look for myself, but I'm short of time right at the moment. > Me too .. Sometimes I think there Nursery is just the 2 Block local thread allocator which throws it in the main heap.. I dislike papers with no source when i have source (even if partial/unoptomized) it becomes so much less time consuming to understand ( which i dont have a lot of ) and clear . At least the Immix source is there ..( though there are a few varients) > > The steady-state model in the RC-immix generational heap is that objects > are logged before writing, and then placed in a write-enabled state by > setting the M bit. While write-enabled, incrementing reference counts are > deferred. There are several points that the paper does not discuss: > > 1. There is no need to log the non-reference fields. > 2. The log storage can be implemented in the nursery. > 3. If the object is new since the last GC, it contains no references > that need to be logged, and can be copied into the GC heap in the > write-enabled state. > > On the face of things, [3] conflicts with [2]. But I don't think it really > does. I spoke at one point about trying to keep the "last 10%" of the young > objects in the nursery so that they get a chance to die. An interesting > alternative is to go ahead and copy the nursery objects into the GC heap, > and instead of trying to keep *objects* in the nursery, we record * > pointers* to those objects to an inc-buf. The inc-buf is thread-local, > and exclusively contains pointers to objects that were evacuated during the > last nursery collection. It is stored in reserved space at the *top* of > the nursery. > So the Set of N objects since the last collecion. IF all N objects are counted in between each collection than they will be the same and you dont need N bits. ( Which would only work with a larger nursery) > > I'm thinking that if I play some games with that inc-buf I can find a > marking strategy that will tell us which of those objects are no longer > live at the start of the next collection. > Possibly but this to me is the big diffirence between the RC-Immix and URC model ( besides age) . RC-Immix has a small Nursery and they complain this causes problems with locality on small heaps by admitting objects with low life. > > >> >> >>> Not saying you're wrong. What I'm mainly saying is that the RC-immix >>> encodings can be extended to buy you a kind of "shared nursery" in the >>> general heap that survives until the next major cycle. >>> >> >> Its just something we should be aware of that in generational GCs small >> nurseries only attract the cost of a more frequent mark/sweep. >> > > Yes. I'm starting to write up a design note on BitC GC. Which won't > survive contact with the implementation, but it's a way to gather my > thoughts. > Keenly awaited .. a plan is important even if not fully implemented it gives direction. > One of the points not discussed in the *immix papers is that it helps very > much if the nursery size is a multiple of the large page size. > > You can always track survival rates and voluntarily tune down the limit > pointer on the nursery using some kind of weighted history, but having the > nursery covered by a large page TLB entry is a big win. There are a number > of ways to achieve that, but it's not something to forget. > Good point. I think you should do this anyway even if the size is less , you just waste the memory there are also good arguments about matching L2 or L3 cache size ( or L1 and L2 for the thread local allocator) . Though some bum will complain hello_word doesnt take 3Kb. > > > Yes, allocators of that form really are that bad. A contended interlocked > operation (and those would be, because they would be contending for the > bump pointers) can take *hundreds* of cycles. The interlocked overhead > has gotten dramatically worse over the last decade. > > 10 years ago is 2003. I don't know about 2003 specifically, but by 2004 > the research literature pretty much seems to take it for granted that > nursery allocation should be thread-local. The *immix papers go further by > making *all* bump allocation thread-local. > Which would be about right, production lags Research by a good 5 years normally. And duel cores only started shipping as the default around then . > >> Anyway the .NET 4.5 one uses a shared nursery but blocks are writtent to >> it by a thread local allocator ( probably as you said by handing out >> chunks ) and when there are no new blocks left its swept. This is probably >> the highest performance Nursery ( and its simple) but at a cost in global >> pauses. >> > > For small-scale multicore, that's a pretty good design, and I've > considered something similar. > To me i would do something like that first as the cost ( development time) is low and performance is good .. A allocator for high core counts can come later. > > >> >>> I should perhaps explain that I'm concerned to have an approach that >>> will work on weakly ordered memories. That makes the cost of interlock even >>> higher than it is on Intel processors, which is already astonishingly high. >>> >> >> Weakly ordered memory systems (ARM) though rarely have multiple CPUs so >> the cache lock is lower but its a good point.. Anyway the Nursery is in >> blocks seems to cover everything with few negatives . >> > > You need to look at the state of the ARM parts again. Essentially *all* of > the Cortex-A ARM cores now shipping are multicore. A rapidly increasing > percentage of the Cortex-M processors are also multicore. Some of the more > interesting Cortex-M's are asymmetric multicore (e.g. the NXP LPC43xx > parts). > I didnt mean multi cores , i specifically meant multiple physical CPUs , raising the LOCK on memory is not pretty. Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
