>
>
>
> In the end, I'm pretty sure that Ben is right, and that relocation is only
> necessary in corner cases. For those "stuck" objects, I'd really like to
> know what the in-degree is, and I'd like some sense of how read-hot the
> containing blocks are once the other 90% of the objects are gone. That
> would tell us if a "mostly not copying" design is feasible. At the moment I
> don't feel like I have any basis for intuitions on that.
>

I dont disagree with your comments . A few comments .

People looking at Boehm / Mark Sweep  and uncooperative environments should
definetly look at non moving RC-Immix style collector.

The worst case is not a unsustained growth because new blocks are not
needed you can reallocate holes at decreasing mutator efficiency until all
space is  used.  We do have to be concerned about the fragmentation cost.

For fragmentation the best case is every object is a multiple of the line
size  in which case there is zero fragmentation , the worst case is where
you have lots of minimum sized objects ( say 32 bytes ) and exactly 1
survives for a 75 % fragmentation cost.  Also its tunable a smaller line
size has lower fragmention  at a slight cost to mutator performance. I
think the cost is about 10-15%  which could even drop by half for say a 64
byte line size if memory is crucial eg on a mobile device run a
SpaceEfficient allocator . Either way its not huge as many allocators have
some fragmentation cost and in some cases 4 word headers.  One thing to
note is nursery fragmentation is a bigger issue  due to the small size of
most short lived objects.

Potentially we could even fill fragmented lines  but the cost of
 allocating to these small holes is pretty big it would be an allocation
list .

The much bigger issue is Rc-Immix has poor performance ( compared to
cutting edge) for  heaps up to 1.5* and it isnt great until about 2 * heap
. The reasons is  not so much related to copying but I will restate for
emphasis.
1) Either copying cost is not as efficient as mark sweep or  no copying has
poor Gen 1 locality ( the diffirence  between these is low)  ( for 1-1.5*
heap performance )
2) More importantly because of the greater space availability  there is far
less work especially ref counting  so cycles are more dispersed.
Concurrency can help here depending on barrier cost .. but changes the
whole nature of the collector .

The fact most GC heaps require 1.5+* memory for really good performance is
a bigger issue than 10% fragmentation.

I believe the best intial solutions is non relocating Rc-Immix   instrument
it , optomize it which is much easier to do when you dont need to worry
about barrier costs / mutator non allocating behaviour .

For v2 I think this can be significantly improved by adding a  fixed
Nursery  ( eg N blocks which are always used first and swept) for 1 - 1.5 *
heap performance and it reduces a lot of our fragmentation this only
requires a cheap write barrier with a fast and slow path  ie  if a nursery
is the first N blocks all we need is a conditional on write to see if the
address is < x - still a cost but much cheaper than a card table actually
since we have a modbuf  we could avoid this and process the  mod buf when
the nursery is swept ) .

After this we can consider the high concurrency cases we can either resort
to non copying or  have individual mark / Sweep  nurseries for each
thread/core which need to be synced ( as discussed earlier possibly by
forcing sweep if a nursery references another nursery object or disallowing
it)  or more complex barriers which would also allow the main heap to copy.



>
> Unfortunately, the only way to get that data appears to be building an
> implementation we can measure. As nice and as clean as Jikes is, it is
> built to support a language with a qualitatively different set of objects
> and a significantly different BCVM semantics available to the compiler. And
> we're contemplating making further changes to the BCVM semantics.
>
>
Agree best way is to build it with instrumentation first . Unfortunately we
have no base line .. thats the nice part of Jikes so may need to write a
Mark sweep or ref count as a comparison .  Though we could just use malloc
as the baseline ..


> Do we know of any comparative data on *Java* programs between CLR and JVM
> that would tell us whether, for those programs, the behaviors of the
> respective BCVM machines are comparable?
>

There is nothing published im aware of .. I can only guess at things like
bigger but many fewer  objects.  ( which is an issue with the 8K block size
/ LOB size ) .

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to