Re: [bitc-dev] Non copying Rc-Immix was Putting the stack on an RC-Immix heap

Ben Kloosterman Sun, 24 Nov 2013 20:36:16 -0800

On Mon, Nov 25, 2013 at 4:15 AM, Jonathan S. Shapiro <[email protected]>wrote:


> On Sun, Nov 24, 2013 at 1:45 AM, Ben Kloosterman <[email protected]>wrote:
>
>> People looking at Boehm / Mark Sweep  and uncooperative environments
>> should definetly look at non moving RC-Immix style collector.
>>
>
> Unfortunately, the false retention rate for conservative collectors is
> astonishingly high, and rises rapidly as a function of heap size (I speak
> from *bitter* experience). This can easily turn that 10% stuck object
> number into 30%, which is a different set of conditions entirely.
>
> Non-relocation and conservative retention are two different issues.
> Conservative collectors, of necessity, can't relocate anything that has
> been conservatively marked, but it's the fact that pointer identification
> is conservative that leads to the high false retention rate.
>

Interesting , though a lot of lit. says these false references are very low
i didnt think it was that bad when using boehm with mono (whi is the lit
wrong does it require high virt memory address to avoid false references
?).. anyway im not advocating non precise but  they are still better of
with RC-Immix.


>
>
>> The worst case is not a unsustained growth because new blocks are not
>> needed you can reallocate holes at decreasing mutator efficiency until all
>> space is  used.  We do have to be concerned about the fragmentation cost.
>>
>
> I've been wondering about that. My version of this question is: "How many
> islands can a 'free' block have and still reasonably be treated as free?"
>

The rate is configurable and they found 1 was best.


>
> If we ignore the question of nursery allocation, then I think free blocks
> have two purposes:
>
>    1. They provide for mid-sized object allocation, and
>    2. They avoid the need to leave holes in the current TLA's recycled
>    block.
>
> I suspect it might work out to define a free block as "a block having at
> least *k* free regions suitable for mid-sized object allocation".
> Allocating from the "free" block might now lead to unused subranges if
> allocation requests come in some unfortunate order. When you fall off the
> end of the free block, you stick that block in a queue, from which it will
> be taken the next time the mutator's TLA requires a recycled block. In this
> way the holes you left behind get filled in.
>

Yes that is the right way .. but you will favour groups of contiguous lines
 not number of free lines for mid size.

>
>
>> For fragmentation the best case is every object is a multiple of the line
>> size  in which case there is zero fragmentation , the worst case is where
>> you have lots of minimum sized objects ( say 32 bytes ) and exactly 1
>> survives for a 75 % fragmentation cost.  Also its tunable a smaller line
>> size has lower fragmention  at a slight cost to mutator performance. I
>> think the cost is about 10-15%  which could even drop by half for say a 64
>> byte line size if memory is crucial eg on a mobile device run a
>> SpaceEfficient allocator . Either way its not huge as many allocators have
>> some fragmentation cost and in some cases 4 word headers.  One thing to
>> note is nursery fragmentation is a bigger issue  due to the small size of
>> most short lived objects.
>>
>
> That's internal fragmentation, and it's not what worries me. What worries
> me is *external* fragmentation, which comes from free ranges of lines
> that don't get used because of bad luck in allocation sizes and orderings.
>

Unused portions of lines are external to the object ( so external
fragmenation yes headers are internal ) , but lack of large enough free
range is an issue . That said  LOBs / malloc  have the same issue , buddy
derived collectors i think have over 30% fragmentation . Malloc could
certainly relocate and defragment / compact but we dont as we think the
pain  is not worth the memory cost.  We do have a worse issue because those
collectors have  size based lists , while we could build a sized based
index in the background for mid sized blocks i think it will be counter
productive outside the mobile space .


>
>> The much bigger issue is Rc-Immix has poor performance ( compared to
>> cutting edge) for  heaps up to 1.5* and it isnt great until about 2 * heap
>> . The reasons is  not so much related to copying but I will restate for
>> emphasis.
>>  1) Either copying cost is not as efficient as mark sweep or  no copying
>> has poor Gen 1 locality ( the diffirence  between these is low)  ( for
>> 1-1.5* heap performance )
>> 2) More importantly because of the greater space availability  there is
>> far less work especially ref counting  so cycles are more dispersed.
>> Concurrency can help here depending on barrier cost .. but changes the
>> whole nature of the collector .
>>
>> The fact most GC heaps require 1.5+* memory for really good performance
>> is a bigger issue than 10% fragmentation.
>>
>
> No no. Those 10% islands will likely lead to 15% or 20% fragmentation.
> Dropping the heap requirement by 20% is the difference between a 1.5* heap
> at a given performance level and a 1.2* heap at the same performance.
> That's why this matters so much.
>

For really good performance you want extra blocks anyway guaranteeing
enough space for medium objects ( though instead of free blocks we want
mostly free which led me to sub blocks but mostly free will do) .. Im
perfectly happy saying you need 20-25% more memory for heap until we get a
Nursery happening and a mark sweep will do a lot of compaction and likely
half that .   With region analysis pushing things to the stack and regions
and with objects in the LOB its quite possible the heap itself will be
1/3-1/2 the size  so your looking at 7-12% of total data storage cost , and
probably half that for a nursery .

Note by having 8K blocks we push a lot of objects to the LOB ( compare CLR
where its 80K)  .... it is likely this has a higher fragmentation .


>
> I believe the best intial solutions is non relocating Rc-Immix
>> instrument it , optomize it which is much easier to do when you dont need
>> to worry about barrier costs / mutator non allocating behaviour .
>>
>
> Yes. Associated with a precise marking capability.
>
> But in parallel with this we want to do a lame-ass relocating
> implementation, because we want to keep ourselves honest at the runtime and
> language semantics layers.
>
> Though I'd still do a copying nursery in gen1.
>

The copy nursery global pause could do a lame ass-copy ..of some blocks
identified in the background.   A proper region system with analysis is
likely more valuable than non nursery copying .

>
>
>
> Also, the "1.5* heap" metric is a very peculiar metric, because nobody
> runs comparable examinations of manually allocated heaps.
>

Agree 100% ..  i nearly started a discussion on this.
- There is no comparison to malloc and its hidden fragmentation  (Which
could be good or bad)
- Min heap is the "No memory error" level , who decides the actual request
of pages from the kernel eg do most CLR implimentations request 2* min heap
?  ( RC-Immix has a high water mark scheme on the chunks when it reaches it
, it requests more )
- Except for rarer cycle detection ,Hybrid Gcs should play nicer with the
mm since they dont visit every page .

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Re: [bitc-dev] Non copying Rc-Immix was Putting the stack on an RC-Immix heap

Reply via email to