Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Gil Tene


On Sunday, August 13, 2017 at 11:45:09 PM UTC-7, Kirk Pepperdine wrote:
>
> Hi, 
>
>
> > On Aug 14, 2017, at 7:47 AM, Peter Veentjer  > wrote: 
> > 
> > I have been improving my gc knowledge and there is some confusion on my 
> side regarding the 'mark-sweep-compact' algorithm I frequently see 
> mentioned in posts, articles and some not too formal books on the topic. 
> E.g. 
> > 
> > AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact where as part of the compacting process the dead objects are 
> removed. When I check more formal literature like "The Garbage Collection 
> Handbook: the Art of Automatic Memory Management", then there is no 
> mentioning of mark-sweep-compact. 
> > 
> > So I guess that when 'mark-sweep-compact' is mentioned, they are 
> actually referring to 'mark-compact'. Is this assumption correct? 
>
> The terminology around this is very loose so the very short answer is…. 
> sort of. So, a very terse but hopefully useful summary of collectors. 
>
> When we talk about mark-sweep we are generally referring to a class of 
> garbage collectors known as tracing collectors. The actual details of how 
> the mark and sweep for each of the different tracing collectors is 
> implemented depend upon a couple of factors. One of the most important 
> factors is how memory pool(s) (data structure in heap space) is organized. 
> I tend to label a collector as either “in-place” or “evacuation”. The most 
> trivial examples for both are; if you only have a single memory pool then 
> you will be performing an “in-place” collection as the data remains in the 
> that memory pool at the end of the collection. If you have two spaces you 
> can set this up as what is known has hemispheric collections. In this case 
> you mark the data in the currently active memory pool and then “evacuate” 
> it to the inactive pool. When complete active becomes inactive (thus empty) 
> and inactive becomes active. Wash, rinse, repeat…. 
>
> In-place collections will need a free-list and need to some how deal with 
> fragmentation (of the free-list). Thus they compact. When and how often is 
> an implementation detail. Evacuating collectors use a top of heap pointer 
> (and other more complex tricks) that act as a free list. Since they 
> evacuate to another pool, the data will naturally be compacted. Thus no 
> compaction will be necessary. As you can imagine, the time/space cost 
> models for these two general approaches are quite different. 
>
> In OpenJDK we have both generational and regional collectors. The 
> generational collector divides memory into 4 memory pools named Eden, 
> Survivor 0, Survivor 1, and Tenured space. The first three are grouped into 
> young and Tenured into old space. In young we can use mark-evacuation where 
> as in old we are forced to use mark-sweep-compact. I’m going to claim (I’m 
> sure Gil will want to chime in here) that G1, C4, Balance and Shenandoah 
> are regional collectors in that they divide heap into a large number of 
> memory pools (regions).


Agreed. These are all regional Mark-Compact evacuating collectors. But G1, 
Balanced and Shenandoah are olden-only regional collectors (or single 
generation in the current Shenandoah case), while C4 uses both a regional 
Mark-compact newgen collector and a regional Mark-Compact oldgen collector. 
As such, it gets all the nice-for-your-electron-budget benefits of 
generational collection along with all the nice benefits of regional 
collection.
 

> G1 aims for 2048 memory pools. In all 4 cases, the collectors are very 
> very different variants of mark-evacuation each with their own quirks. The 
> advantage of using regions is that you only need an evacuating collector as 
> you don’t end up with a terminal pool (tenured) where you are forced to use 
> an in-place collector. If you are careful in how you select regions to be 
> recovered, you can realize some nice wins over generational collectors. 
> However, these collectors tend to be far more complex and harder on your 
> electron budget than generational collectors are.


The harder-on-electron-budget thing is not algorithmically inherent to 
evacuating regional collectors. It usually has to do with other algorithmic 
choices that may be made in collectors that happen to be regional. E.g. 
some region-based incremental STW compactors (G1, Balanced) generally try 
to compact a small increment of the heap and fixup a hopefully-limited set 
of regions that refer to the compacted set, all in a single STW pause. 
These algorithms tend to rely on cross-region remembered-set tracking which 
involves some interesting electron budget implications, but more 
importantly, because the same regions will be involved in multiple fixup 
passes, the resulting N^2 complexity potential can also cause the electron 
budget to significantly inflate. Evacuating regional collector algorithms 
that do not rely on cross-region remembered sets, and 

Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Gil Tene
I've taken to using Mark-Sweep-Compact, Mark-Sweep (no Compact), and 
Mark-Compact (no sweep) to describe the variants, because you will find all 
of them in the wild these days.

The confusion starts with the fact that some algorithms that people often 
refer to as "Mark-Sweep" are actually Mark-Sweep-Compact, while other are 
purely Mark-
Sweep (with no compaction). Mark-Sweep with no compaction usually means 
that free memory is recycled in place, with no object movement. A good 
example of this in practice is the CMS collector in the HotSpot JVM, which 
tracks unused memory in the oldgen in some form of free lists, but does not 
move any objects in the oldgen to compact it. Mark-Sweep *with* compaction 
(Mark-Sweep-Compact) obviously does compact. It usually implies the use of 
the knowledge established during a sweep (e.g. a list of the empty ranges) 
to compact the space "in place", It also usually involves a complete fixup 
of all references in all live objects. E.g. a simplistic description of a 
common Mark-Sweep-Compact technique would involves: (A) mark the live 
objects in the heap, (B) sweep the heap to find the empty ranges, (C) move 
all live objects to one side of the heap (filling in the empty spaces), (D) 
linearly scan all live objects and fix up any references in them to point 
to the correct target location. A good example of this in practice is the 
ParallelGC and SerialGC collectors in the HotSpot JVM.

Similarly, more confusion is caused by the fact that some algorithms that 
people often refer to as "Mark-Compact" are actually "Mark-Sweep-Compact", 
while other are purely Mark-Compact (no sweep). Again a good example of the 
first in common use would be ParallelGC in the HotSpot JVM (and the 
Wikipedia entry https://en.wikipedia.org/wiki/Mark-compact_algorithm). Good 
examples for the latter in common use can be found in various regional 
evacuating collectors (like C4, G1, etc.), where there is no sweep per-se 
(live object are evacuated to outside of the originally marked heap, and 
there is some mechanism for finding/tracking them to do so, but that 
mechanism is not the traditional sweep used by Mark-Sweep style of 
collectors). Fixup behavior in such algorithms can also vary quite a bit 
(e.g. C4 normally uses the next cycle's Mark to perform fixups for the 
previous Compact's relocated objects, and as such does not require a 
scanning fixup pass).

Another key difference often pointed to is that with in-place compaction 
(Mark-Sweep-Compact), no additional empty memory is required to guaranteed 
full compaction, while regional evacuating collectors need to evacuate into 
some empty memory, so their ability to compact may be limited by the amount 
of empty memory available outside of the currently occupied heap regions. 
However,. several well documented regional collector algorithms (e.g. 
Pauseless 
, C4 
, 
Compressor ) 
perform hand-over-hand compaction to guarantee forward progress in 
evacuation, such that a single empty region is sufficient to achieve full 
compaction in a single pass.

On Sunday, August 13, 2017 at 10:47:01 PM UTC-7, Peter Veentjer wrote:
>
> I have been improving my gc knowledge and there is some confusion on my 
> side regarding the 'mark-sweep-compact' algorithm I frequently see 
> mentioned in posts, articles and some not too formal books on the topic. 
> E.g.
>
>
> https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objects/compact
> https://www.infoq.com/news/2015/12/java-garbage-collection-minibook
>
> AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact where as part of the compacting process the dead objects are 
> removed. When I check more formal literature like "The Garbage Collection 
> Handbook: the Art of Automatic Memory Management", then there is no 
> mentioning of mark-sweep-compact.
>
> So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
> referring to 'mark-compact'. Is this assumption correct?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Kirk Pepperdine
Hi,


> On Aug 14, 2017, at 7:47 AM, Peter Veentjer  wrote:
> 
> I have been improving my gc knowledge and there is some confusion on my side 
> regarding the 'mark-sweep-compact' algorithm I frequently see mentioned in 
> posts, articles and some not too formal books on the topic. E.g.
> 
> AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact where as part of the compacting process the dead objects are 
> removed. When I check more formal literature like "The Garbage Collection 
> Handbook: the Art of Automatic Memory Management", then there is no 
> mentioning of mark-sweep-compact.
> 
> So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
> referring to 'mark-compact'. Is this assumption correct?

The terminology around this is very loose so the very short answer is…. sort 
of. So, a very terse but hopefully useful summary of collectors.

When we talk about mark-sweep we are generally referring to a class of garbage 
collectors known as tracing collectors. The actual details of how the mark and 
sweep for each of the different tracing collectors is implemented depend upon a 
couple of factors. One of the most important factors is how memory pool(s) 
(data structure in heap space) is organized. I tend to label a collector as 
either “in-place” or “evacuation”. The most trivial examples for both are; if 
you only have a single memory pool then you will be performing an “in-place” 
collection as the data remains in the that memory pool at the end of the 
collection. If you have two spaces you can set this up as what is known has 
hemispheric collections. In this case you mark the data in the currently active 
memory pool and then “evacuate” it to the inactive pool. When complete active 
becomes inactive (thus empty) and inactive becomes active. Wash, rinse, repeat….

In-place collections will need a free-list and need to some how deal with 
fragmentation (of the free-list). Thus they compact. When and how often is an 
implementation detail. Evacuating collectors use a top of heap pointer (and 
other more complex tricks) that act as a free list. Since they evacuate to 
another pool, the data will naturally be compacted. Thus no compaction will be 
necessary. As you can imagine, the time/space cost models for these two general 
approaches are quite different.

In OpenJDK we have both generational and regional collectors. The generational 
collector divides memory into 4 memory pools named Eden, Survivor 0, Survivor 
1, and Tenured space. The first three are grouped into young and Tenured into 
old space. In young we can use mark-evacuation where as in old we are forced to 
use mark-sweep-compact. I’m going to claim (I’m sure Gil will want to chime in 
here) that G1, C4, Balance and Shenandoah are regional collectors in that they 
divide heap into a large number of memory pools (regions). G1 aims for 2048 
memory pools. In all 4 cases, the collectors are very very different variants 
of mark-evacuation each with their own quirks. The advantage of using regions 
is that you only need an evacuating collector as you don’t end up with a 
terminal pool (tenured) where you are forced to use an in-place collector. If 
you are careful in how you select regions to be recovered, you can realize some 
nice wins over generational collectors. However, these collectors tend to be 
far more complex and harder on your electron budget than generational 
collectors are.

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.