On Mon, Nov 25, 2013 at 4:15 AM, Jonathan S. Shapiro <[email protected]>wrote:
> On Sun, Nov 24, 2013 at 1:45 AM, Ben Kloosterman <[email protected]>wrote: > >> People looking at Boehm / Mark Sweep and uncooperative environments >> should definetly look at non moving RC-Immix style collector. >> > > Unfortunately, the false retention rate for conservative collectors is > astonishingly high, and rises rapidly as a function of heap size (I speak > from *bitter* experience). This can easily turn that 10% stuck object > number into 30%, which is a different set of conditions entirely. > > Non-relocation and conservative retention are two different issues. > Conservative collectors, of necessity, can't relocate anything that has > been conservatively marked, but it's the fact that pointer identification > is conservative that leads to the high false retention rate. > Interesting , though a lot of lit. says these false references are very low i didnt think it was that bad when using boehm with mono (whi is the lit wrong does it require high virt memory address to avoid false references ?).. anyway im not advocating non precise but they are still better of with RC-Immix. > > >> The worst case is not a unsustained growth because new blocks are not >> needed you can reallocate holes at decreasing mutator efficiency until all >> space is used. We do have to be concerned about the fragmentation cost. >> > > I've been wondering about that. My version of this question is: "How many > islands can a 'free' block have and still reasonably be treated as free?" > The rate is configurable and they found 1 was best. > > If we ignore the question of nursery allocation, then I think free blocks > have two purposes: > > 1. They provide for mid-sized object allocation, and > 2. They avoid the need to leave holes in the current TLA's recycled > block. > > I suspect it might work out to define a free block as "a block having at > least *k* free regions suitable for mid-sized object allocation". > Allocating from the "free" block might now lead to unused subranges if > allocation requests come in some unfortunate order. When you fall off the > end of the free block, you stick that block in a queue, from which it will > be taken the next time the mutator's TLA requires a recycled block. In this > way the holes you left behind get filled in. > Yes that is the right way .. but you will favour groups of contiguous lines not number of free lines for mid size. > > >> For fragmentation the best case is every object is a multiple of the line >> size in which case there is zero fragmentation , the worst case is where >> you have lots of minimum sized objects ( say 32 bytes ) and exactly 1 >> survives for a 75 % fragmentation cost. Also its tunable a smaller line >> size has lower fragmention at a slight cost to mutator performance. I >> think the cost is about 10-15% which could even drop by half for say a 64 >> byte line size if memory is crucial eg on a mobile device run a >> SpaceEfficient allocator . Either way its not huge as many allocators have >> some fragmentation cost and in some cases 4 word headers. One thing to >> note is nursery fragmentation is a bigger issue due to the small size of >> most short lived objects. >> > > That's internal fragmentation, and it's not what worries me. What worries > me is *external* fragmentation, which comes from free ranges of lines > that don't get used because of bad luck in allocation sizes and orderings. > Unused portions of lines are external to the object ( so external fragmenation yes headers are internal ) , but lack of large enough free range is an issue . That said LOBs / malloc have the same issue , buddy derived collectors i think have over 30% fragmentation . Malloc could certainly relocate and defragment / compact but we dont as we think the pain is not worth the memory cost. We do have a worse issue because those collectors have size based lists , while we could build a sized based index in the background for mid sized blocks i think it will be counter productive outside the mobile space . > >> The much bigger issue is Rc-Immix has poor performance ( compared to >> cutting edge) for heaps up to 1.5* and it isnt great until about 2 * heap >> . The reasons is not so much related to copying but I will restate for >> emphasis. >> 1) Either copying cost is not as efficient as mark sweep or no copying >> has poor Gen 1 locality ( the diffirence between these is low) ( for >> 1-1.5* heap performance ) >> 2) More importantly because of the greater space availability there is >> far less work especially ref counting so cycles are more dispersed. >> Concurrency can help here depending on barrier cost .. but changes the >> whole nature of the collector . >> >> The fact most GC heaps require 1.5+* memory for really good performance >> is a bigger issue than 10% fragmentation. >> > > No no. Those 10% islands will likely lead to 15% or 20% fragmentation. > Dropping the heap requirement by 20% is the difference between a 1.5* heap > at a given performance level and a 1.2* heap at the same performance. > That's why this matters so much. > For really good performance you want extra blocks anyway guaranteeing enough space for medium objects ( though instead of free blocks we want mostly free which led me to sub blocks but mostly free will do) .. Im perfectly happy saying you need 20-25% more memory for heap until we get a Nursery happening and a mark sweep will do a lot of compaction and likely half that . With region analysis pushing things to the stack and regions and with objects in the LOB its quite possible the heap itself will be 1/3-1/2 the size so your looking at 7-12% of total data storage cost , and probably half that for a nursery . Note by having 8K blocks we push a lot of objects to the LOB ( compare CLR where its 80K) .... it is likely this has a higher fragmentation . > > I believe the best intial solutions is non relocating Rc-Immix >> instrument it , optomize it which is much easier to do when you dont need >> to worry about barrier costs / mutator non allocating behaviour . >> > > Yes. Associated with a precise marking capability. > > But in parallel with this we want to do a lame-ass relocating > implementation, because we want to keep ourselves honest at the runtime and > language semantics layers. > > Though I'd still do a copying nursery in gen1. > The copy nursery global pause could do a lame ass-copy ..of some blocks identified in the background. A proper region system with analysis is likely more valuable than non nursery copying . > > > > Also, the "1.5* heap" metric is a very peculiar metric, because nobody > runs comparable examinations of manually allocated heaps. > Agree 100% .. i nearly started a discussion on this. - There is no comparison to malloc and its hidden fragmentation (Which could be good or bad) - Min heap is the "No memory error" level , who decides the actual request of pages from the kernel eg do most CLR implimentations request 2* min heap ? ( RC-Immix has a high water mark scheme on the chunks when it reaches it , it requests more ) - Except for rarer cycle detection ,Hybrid Gcs should play nicer with the mm since they dont visit every page . Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
