Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-19 Thread Steve Blackburn
What I said above was not as sharp as it should have been.

I see a crisp distinction between 'evacuation' and 'compaction'.

- Evacuation moves (evacuates) survivors into a pre-existing reserve 
(however small) of heap space.
- Compaction requires no such reserve; survivors are only moved into space 
freed in the process of performing the compaction.

Evacuation may be done at a very fine grain, requiring absolutely minimal 
reserves, or at the other extreme, it may require reserving half the heap 
(semi-space).   Still, I would say 'evacuation' so long as some free 
reserve (however small) is required in order for the algorithm to proceed. 
  Thus while "in-place compaction" is a good description, I'd say it is a 
tautology.

I know this crisp distinction has not been universally observed in the 
literature, but I think it is helpful because it is absolutely crisp, and 
cuts to the very heart of the underlying algorithm.

We use the term 'defragmentation' to describe a fine-grained evacuation 
strategy (for example, one that moves single objects into small holes in 
the heap).   Still, we'd call this an evacuating collector, not a 
compacting collector.   We thought long and hard about this at the time we 
did our work, and settled on the above for the above reasons.

So I'd ask the question: does this algorithm require a reserve of heap 
space (even a small one) before it can proceed?   If so, it's evacuation.

One more thing: obviously we can and do routinely build collectors that are 
hybrids, so there's absolutely no reason why a collector could not do both.
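
To make the reserve test concrete, here is a deliberately tiny sketch (my own 
toy model, with objects reduced to single heap slots; not code from any 
collector discussed here):

final class MoveSketch {
  // Compaction: survivors slide within the same space; the only free space
  // consumed is space reclaimed by the compaction itself, so no reserve is needed.
  static int compactInPlace(int[] heap, boolean[] live) {
    int free = 0;                      // next destination slot in the same heap
    for (int i = 0; i < heap.length; i++) {
      if (live[i]) heap[free++] = heap[i];
    }
    return free;                       // everything at or above 'free' is now reclaimed
  }

  // Evacuation: survivors move into a pre-existing reserve ('toSpace'), which
  // must already be available, however small, before the algorithm can proceed.
  static int evacuate(int[] fromSpace, boolean[] live, int[] toSpace) {
    int free = 0;
    for (int i = 0; i < fromSpace.length; i++) {
      if (live[i]) toSpace[free++] = fromSpace[i];
    }
    return free;                       // fromSpace is reclaimed wholesale afterwards
  }
}

The first variant can run with zero free space up front; the second cannot even 
start without 'toSpace' already existing.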

--Steve

On Saturday, 17 August 2019 03:08:47 UTC+10, Gil Tene wrote:
>
>
>
> Sent from my iPad
>
> On Aug 16, 2019, at 2:53 AM, Steve Blackburn wrote:
>
> For what it is worth, Kathryn and I have previously clarified some of this 
> nomenclature (see our Immix paper). 
>
> Marking: identifying liveness via a trace
> Sweeping: identifying free space while leaving live objects in place 
> (classic mark-sweep and immix do this)
> Evacuating: freeing space by moving survivors to another place
> Compacting: freeing space by moving survivors within the same place.
>
>
> :-)
>
> Now we have the “is it Compacting or Evacuating?” question...
>
> Because of Quick Release (C4 paper [1], section 2.5, and originally 
> described in 2005 in the Pauseless paper [2], section 4.2) C4 [1] , 
> Pauseless [2], Collie [3], Compressor [4], and ZGC [reference needed] are 
> all examples of collectors created in the past ~15 years that can be 
> classified as either/both Compacting or Evacuating under the nomenclature 
> above.
>
> At first glance, one might be tempted to say that they are simply 
> Evacuating collectors: from the point of view of each region, they clearly 
> appear to free space by moving survivors to another place (to outside of 
> the region), rather than moving them within the same place (the same 
> region).
>
> But from an actual memory space perspective (the actual memory storage 
> used by any set of regions being compacted) they are clearly Compacting, 
> since with the hand-over-hand compaction enabled by Quick Release, 
> surviving objects are clearly moved within the same “place”: the set of 
> actual storage being compacted in a given relocate phase, where the 
> survivors are before the moving started. These collectors exhibit a 
> tell-tale quality of “in place” compaction: they move survivors directly on 
> top of where other survivors were just before they themselves were moved, all 
> within the same pass.
>
> In virtual memory, these collectors have a clear separation between the 
> two places that are the “from space” and the “to space” for a given 
> relocation pass. Each is a distinct set of virtual memory ranges, and they 
> do not overlap.
>
> But in actual storage, that is not the case in any of these collectors, as 
> the “from space” and “to space” for a given relocation pass overlap with 
> each other in physical space.
>
> I obviously consider C4 to be a Compacting collector (it’s right there in 
> the name: “The Continuously Concurrent Compacting Collector”), and would 
> therefore classify Pauseless, Collie, Compressor, and ZGC as Compacting as 
> well.
>
> But I would also consider them all to be Evacuating regional collectors. 
> Because that behavior (in logical/virtual memory, which is where our 
> thoughts focus most of the time) clearly fits too. Region evacuation is 
> core to Pauseless and C4, and the C4 name was actually a nod at how it 
> frees memory by “blowing up” individual regions and moving all live objects 
> “elsewhere”.
>
> However, I wouldn’t call C4/Pauseless/Collie/Compressor/ZGC “in place 
> Compactors”, because even though at the physical storage level that is what 
> happens, my brain hurts when I try to keep that picture in my head too long.
>
> We too attempted to further clarify the nomenclature in the Collie paper 
> [3] by 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-19 Thread Steve Blackburn
It seems my last reply got lost, so I'll try again...

For me the distinction between evacuating and compacting is crisp: while 
both move objects, evacuation requires a reserve, compaction does not.

A compacting collector can use all available space because it moves objects 
into space regained in the process of its collection.

By contrast, an evacuating collector must put aside a reserve (however 
small).   So semispace is the poster-child evacuating collector.  More 
subtly, I describe immix's defragmentation as evacuation even though it 
does not need an explicit reserve---it moves objects into 'holes' generated 
by a previous collection.   This is an implicit reserve---space un-utilized 
by the mutator, not space scavenged by the same instance of the collection 
that performs the object movement.

This distinction between requiring a reserve and not requiring a reserve is 
algorithmically fundamental, which makes it a useful distinction in the 
nomenclature of collectors.

As an aside, sweeping need not visit dead objects.   For example, in a 
classic segregated fits free list, a sweep will identify entirely free 
blocks without visiting any of the constituents.   Likewise we describe 
immix as 'sweep to region', where 'holes' are identified, but the objects 
within them are not visited.   I'd describe sweeping as freeing memory by 
traversing the entire collected region (or metadata corresponding to that 
entire region), whether or not it visits the objects within that region.
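
As a rough illustration of that kind of block-granularity sweep, here is a 
minimal sketch that assumes the marker maintains a per-block count of live 
objects (an assumption of this sketch, not Immix's actual line/block metadata):

final class BlockSweepSketch {
  // Reclaims entirely free blocks without visiting any object, dead or alive,
  // inside them; the cost is linear in the number of blocks, not objects.
  static java.util.List<Integer> sweepFreeBlocks(int[] liveCountPerBlock) {
    java.util.List<Integer> freeBlocks = new java.util.ArrayList<>();
    for (int block = 0; block < liveCountPerBlock.length; block++) {
      if (liveCountPerBlock[block] == 0) {
        freeBlocks.add(block);   // whole block is free; its constituents are never visited
      }
    }
    return freeBlocks;
  }
}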

--Steve


On Monday, 19 August 2019 05:34:59 UTC+10, Gil Tene wrote:
>
>
>
> On Friday, August 16, 2019 at 1:30:10 AM UTC-7, Aleksey Shipilev wrote:
>>
>> On 8/16/19 10:07 AM, Gil Tene wrote: 
>> > Classification terms evolve as the art evolves. 
>>
>> What I want to emphasize here is that this discussion reinvents the meaning 
>> of "sweep". That definition 
>> is not used in the way you describe in the sources I know of. Granted, 
>> definitions drift over time, 
>> but we have to be careful to separate what is the "agreed on" definition, 
>> and what is whatever 
>> definition we want to be the emergent one. 
>>
>> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev 
>> wrote: 
>> > I am trying to understand what is your central argument here. This 
>> seems to be it. Are you saying 
>> > that "sweeping" is when you visit dead objects, and non-"sweeping" 
>> is when you do not? 
>> > 
>> > Yes. I think that's a very good summary. 
>>
>> Would be helpful if we started from this next time around! 
>>
>
> True. Maybe this discussion will help others start from there next time. 
> It's certainly helped me hone in
> on that specific summary.
>  
>
>>
>> > Is walking _dead_ objects the discriminator for "sweeping" then? So 
>> in your book, if we take the 
>> > same Lisp2 collector, and compute-new-addresses and adjust-pointers 
>> steps are walking the 
>> > self-parsable heap (visiting dead objects along the way), it is 
>> M-S-C? But if it uses marking 
>> > bitmap 
>> > (thus only visiting live objects), it becomes M-C? [This would be a 
>> weird flex, but okay]. 
>> > 
>> > 
>> > Exactly. "In my book" adding an efficient livemark bit vector with 
>> possible summary layers would 
>> > convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
>> (with no sweep). 
>>
>> So this is what irks me here. In my Epsilon's mark-compact toy collector 
>> (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to 
>> self-parsable heap traversal is 
>> about 5 lines of code. This does not feel like passing the bar for 
>> reclassification of the collector 
>> algo between major classes.
>
>
> Self-parseable heaps are expensive to traverse, and that expense grows 
> with the size of the non-live
> part of the heap. Being sensitive to live-occupancy % (or not) is a 
> fundamental quality in
> a collector.
>
> If a "5 lines of code" change allows your collector to no longer be 
> sensitive to heap size, and
> stay sensitive only to the live set, it is well worth it.  And it also 
> changes its classification.
>
> E.g. sometimes changing from a linked list to an array list touches only 
> one line of code, but
> that change can dramatically alter the classification from a complexity 
> and fit-for-use point
> of view.
>  
>
>> It feels more like the implementation detail that can swing both ways, 
>> depending on circumstances. 
>>
>
> ParallelGC probably wouldn't call the file that implements its
> Mark-Compact (with sweep) algorithm "MarkSweep.cpp", if the authors
> of that code thought of their implementation as a Mark Compact that
> does not include a sweep, or if they thought of the sweeping as a
> small implementation detail.
>
> If it were just an implementation detail, there wouldn't be scores of 
> academic
> papers that seem to classify the "Mark Compact" (the historic kind that 
> visits all
> dead objects, so implies sweeping) as inherently less CPU-efficient, and 
> back
> 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-18 Thread Gil Tene


On Friday, August 16, 2019 at 1:30:10 AM UTC-7, Aleksey Shipilev wrote:
>
> On 8/16/19 10:07 AM, Gil Tene wrote: 
> > Classification terms evolve as the art evolves. 
>
> What I want to emphasize here is that this discussion reinvents the meaning 
> of "sweep". That definition 
> is not used in the way you describe in the sources I know of. Granted, 
> definitions drift over time, 
> but we have to be careful to separate what is the "agreed on" definition, 
> and what is whatever 
> definition we want to be the emergent one. 
>
> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev 
> wrote: 
> > I am trying to understand what is your central argument here. This 
> seems to be it. Are you saying 
> > that "sweeping" is when you visit dead objects, and non-"sweeping" 
> is when you do not? 
> > 
> > Yes. I think that's a very good summary. 
>
> Would be helpful if we started from this next time around! 
>

True. Maybe this discussion will help others start from there next time. 
It's certainly helped me hone in
on that specific summary.
 

>
> > Is walking _dead_ objects the discriminator for "sweeping" then? So 
> in your book, if we take the 
> > same Lisp2 collector, and compute-new-addresses and adjust-pointers 
> steps are walking the 
> > self-parsable heap (visiting dead objects along the way), it is 
> M-S-C? But if it uses marking 
> > bitmap 
> > (thus only visiting live objects), it becomes M-C? [This would be a 
> weird flex, but okay]. 
> > 
> > 
> > Exactly. "In my book" adding an efficient livemark bit vector with 
> possible summary layers would 
> > convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
> (with no sweep). 
>
> So this is what irks me here. In my Epsilon's mark-compact toy collector 
> (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to 
> self-parsable heap traversal is 
> about 5 lines of code. This does not feel like passing the bar for 
> reclassification of the collector 
> algo between major classes.


Self-parseable heaps are expensive to traverse, and that expense grows with 
the size of the non-live
part of the heap. Being sensitive to live-occupancy % (or not) is a 
fundamental quality in
a collector.

If a "5 lines of code" change allows your collector to no longer be 
sensitive to heap size, and
stay sensitive only to the live set, it is well worth it.  And it also 
changes its classification.

E.g. sometimes changing from a linked list to an array list touches only 
one line of code, but
that change can dramatically alter the classification from a complexity and 
fit-for-use point
of view.
 

> It feels more like the implementation detail that can swing both ways, 
> depending on circumstances. 
>

ParallelGC probably wouldn't call the file that implements its
Mark-Compact (with sweep) algorithm "MarkSweep.cpp", if the authors
of that code thought of their implementation as a Mark Compact that
does not include a sweep, or if they thought of the sweeping as a
small implementation detail.

If it were just an implementation detail, there wouldn't be scores of 
academic
papers that seem to classify the "Mark Compact" (the historic kind that 
visits all
dead objects, so implies sweeping) as inherently less CPU-efficient, and 
back
that category-wide conclusion with analyses and measurements that all use
sweeping in their Mark-Compact implementations.

And there wouldn't be similar analyses that seem to classify all 
non-Mark-Compact
moving collectors (putting them into the only remaining categories of 
semi-space or 
regional evacuating) as less memory-efficient based on their assumed need
for space outside of the "in place" compaction that canonical Mark-Compact
(with implied sweeping) is known to be able to do.

These classifications (historically assuming that Mark-Compact includes
visiting all dead objects, and/or that anything else can't do in-place 
compaction)
both lead to historical conclusions that no longer apply to more modern
implementations that still clearly belong in the Mark-Compact category.

Here are two classifications of implementation-orthogonal characteristics
of any moving collector; both are fundamental to analyzing fitness for use in
various common scenarios:

1. Do you visit all objects, or just the live ones?
[which I call "do you sweep?"]

2. Do you need extra space in order to compact, or can you
perform the compaction within the memory previously occupied
by dead objects?
[which I don't have a great name for, but it is demonstrably not 
 "do you do in place compaction"?]

E.g. for an algorithm used in a young generation collector, which 
fundamentally 
hopes to leverage the observed quality that the generation's live set is
much smaller than the generation's heap size, the first quality is 
critical. 

And e.g. for an algorithm used for the oldest generation, which is looking to
be able to handle large live occupancy % (e.g. in the 40%-90%), the second
quality matters quite a 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-16 Thread Gil Tene


Sent from my iPad

On Aug 16, 2019, at 2:53 AM, Steve Blackburn wrote:

For what it is worth, Kathryn and I have previously clarified some of this 
nomenclature (see our Immix 
paper).

Marking: identifying liveness via a trace
Sweeping: identifying free space while leaving live objects in place (classic 
mark-sweep and immix do this)
Evacuating: freeing space by moving survivors to another place
Compacting: freeing space by moving survivors within the same place.

:-)

Now we have the “is it Compacting or Evacuating?” question...

Because of Quick Release (C4 paper [1], section 2.5, and originally described 
in 2005 in the Pauseless paper [2], section 4.2) C4 [1] , Pauseless [2], Collie 
[3], Compressor [4], and ZGC [reference needed] are all examples of collectors 
created in the past ~15 years that can be classified as either/both Compacting 
or Evacuating under the nomenclature above.

At first glance, one might be tempted to say that they are simply Evacuating 
collectors: from the point of view of each region, they clearly appear to free 
space by moving survivors to another place (to outside of the region), rather 
than moving them within the same place (the same region).

But from an actual memory space perspective (the actual memory storage used by 
any set of regions being compacted) they are clearly Compacting, since with the 
hand-over-hand compaction enabled by Quick Release, surviving objects are 
clearly moved within the same “place”: the set of actual storage being 
compacted in a given relocate phase, where the survivors are before the moving 
started. These collectors exhibit a tell-tale quality of “in place” compaction: 
they move survivors directly on top of where other survivors were just before they 
themselves were moved, all within the same pass.
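
For readers who have not seen Quick Release, a schematic sketch of the 
page-recycling idea follows (toy interfaces of my own, not C4/Pauseless/ZGC 
source): each evacuated region's physical pages are handed back immediately 
and become the backing store for the next to-space, while the region's virtual 
range is kept around for the later pointer fixup.

import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class QuickReleaseSketch {
  interface PhysicalPage {}

  interface Region {
    int survivorPageCount();                           // pages needed to hold this region's survivors
    void copySurvivorsTo(List<PhysicalPage> toPages);  // evacuate this region's live objects
    List<PhysicalPage> quickRelease();                 // keep the virtual range for later fixup,
                                                       // but hand back all physical pages right away
  }

  // Hand-over-hand relocation: each region's just-released physical pages back
  // the next region's to-space, so the pass as a whole moves survivors within
  // (roughly) the physical storage it started with, plus a small initial reserve.
  static void relocate(List<Region> fromRegions, Deque<PhysicalPage> reserve) {
    for (Region from : fromRegions) {
      List<PhysicalPage> toPages = new ArrayList<>();
      for (int i = 0; i < from.survivorPageCount(); i++) {
        toPages.add(reserve.removeFirst());            // only a small reserve is needed up front
      }
      from.copySurvivorsTo(toPages);
      reserve.addAll(from.quickRelease());             // storage is reusable before the pass finishes
    }
  }
}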

In virtual memory, these collectors have a clear separation between the two 
places that are the “from space” and the “to space” for a given relocation 
pass. Each is a distinct set of virtual memory ranges, and they do not overlap.

But in actual storage, that is not the case in any of these collectors, as the 
“from space” and “to space” for a given relocation pass overlap with each other 
in physical space.

I obviously consider C4 to be a Compacting collector (it’s right there in the 
name: “The Continuously Concurrent Compacting Collector”), and would therefore 
classify Pauseless, Collie, Compressor, and ZGC as Compacting as well.

But I would also consider them all to be Evacuating regional collectors. 
Because that behavior (in logical/virtual memory, which is where our thoughts 
focus most of the time) clearly fits too. Region evacuation is core to 
Pauseless and C4, and the C4 name was actually a nod at how it frees memory by 
“blowing up” individual regions and moving all live objects “elsewhere”.

However, I wouldn’t call C4/Pauseless/Collie/Compressor/ZGC “in place 
Compactors”, because even though at the physical storage level that is what 
happens, my brain hurts when I try to keep that picture in my head too long.

We too attempted to further clarify the nomenclature in the Collie paper [3] by 
differentiating different forms of Compaction in terms of whether or not the 
virtual address ranges of the from space and to space overlap, with “in-place 
compactors” being an example of a subset of compactors that exhibit such 
overlap:

“... Compaction of objects involves copying the reachable objects from a 
from-space into mostly contiguous memory locations in a to-space, replacing all 
pointers to from-space locations with corresponding pointers to to-space 
locations and reclaiming from-space memory for re-use. While the “from” and the 
“to” spaces could technically overlap in some collectors (such as the case in 
in-place compactors), this paper is mostly concerned with compactors that use 
non-overlapping from-space and to-space address ranges.”

This description is useful in the Collie paper, because it then delves into the 
ways that  consistency of object state is maintained during compaction, in any 
concurrent compacting collector, and introduces new terms useful in analysis: 
e.g. referrer set, transplantation, the constraints needed for general 
concurrent transplantation, and the key distinction between individually 
transplantable and non-individually transplantable objects that the paper then 
builds on to describe a wait-free compactor.

[1] “C4: The Continuously Concurrent Compacting Collector” 
https://www.azul.com/files/c4_paper_acm1.pdf
[2] “The Pauseless GC Algorithm” 
https://www.usenix.net/legacy/events/vee05/full_papers/p46-click.pdf
[3] “The Collie: A Wait-Free Compacting Collector” 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.406.7321=rep1=pdf
[4] “The Compressor: Concurrent, Incremental, and Parallel Compaction” 
https://www.cs.utexas.edu/~speedway/fp031-kermany.pdf


We use the term 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-16 Thread Steve Blackburn
For what it is worth, Kathryn and I have previously clarified some of this 
nomenclature (see our Immix paper).

Marking: identifying liveness via a trace
Sweeping: identifying free space while leaving live objects in place 
(classic mark-sweep and immix do this)
Evacuating: freeing space by moving survivors to another place
Compacting: freeing space by moving survivors within the same place.

We use the term 'defragmentation' to refer to the use of evacuation to free 
up space in a regional collector.

--Steve

On Friday, 16 August 2019 18:30:10 UTC+10, Aleksey Shipilev wrote:
>
> On 8/16/19 10:07 AM, Gil Tene wrote: 
> > Classification terms evolve as the art evolves. 
>
> What I want to emphasize here is that this discussion reinvents the meaning 
> of "sweep". That definition 
> is not used in the way you describe in the sources I know of. Granted, 
> definitions drift over time, 
> but we have to be careful to separate what is the "agreed on" definition, 
> and what is whatever 
> definition we want to be the emergent one. 
>
> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev 
> wrote: 
> > I am trying to understand what is your central argument here. This 
> seems to be it. Are you saying 
> > that "sweeping" is when you visit dead objects, and non-"sweeping" 
> is when you do not? 
> > 
> > Yes. I think that's a very good summary. 
>
> Would be helpful if we started from this next time around! 
>
> > Is walking _dead_ objects the discriminator for "sweeping" then? So 
> in your book, if we take the 
> > same Lisp2 collector, and compute-new-addresses and adjust-pointers 
> steps are walking the 
> > self-parsable heap (visiting dead objects along the way), it is 
> M-S-C? But if it uses marking 
> > bitmap 
> > (thus only visiting live objects), it becomes M-C? [This would be a 
> weird flex, but okay]. 
> > 
> > 
> > Exactly. "In my book" adding an efficient livemark bit vector with 
> possible summary layers would 
> > convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
> (with no sweep). 
>
> So this is what irks me here. In my Epsilon's mark-compact toy collector 
> (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to 
> self-parsable heap traversal is 
> about 5 lines of code. This does not feel like passing the bar for 
> reclassification of the collector 
> algo between major classes. It feels more like the implementation detail 
> that can swing both ways, 
> depending on circumstances. 
>
> -Aleksey 
>
>



Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-16 Thread Aleksey Shipilev
On 8/16/19 10:07 AM, Gil Tene wrote:
> Classification terms evolve as the art evolves.

What I want to emphasize here is that this discussion reinvents the meaning of 
"sweep". That definition
is not used in the way you describe in the sources I know of. Granted, 
definitions drift over time,
but we have to be careful to separate what is the "agreed on" definition, and 
what is whatever
definition we want to be the emergent one.

> On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote:
> I am trying to understand what is your central argument here. This seems 
> to be it. Are you saying
> that "sweeping" is when you visit dead objects, and non-"sweeping" is 
> when you do not?
> 
> Yes. I think that's a very good summary.

Would be helpful if we started from this next time around!

> Is walking _dead_ objects the discriminator for "sweeping" then? So in 
> your book, if we take the
> same Lisp2 collector, and compute-new-addresses and adjust-pointers steps 
> are walking the
> self-parsable heap (visiting dead objects along the way), it is M-S-C? 
> But if it uses marking
> bitmap
> (thus only visiting live objects), it becomes M-C? [This would be a weird 
> flex, but okay].
> 
> 
> Exactly. "In my book" adding an efficient livemark bit vector with possible 
> summary layers would
> convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact (with 
> no sweep).

So this is what irks me here. In my Epsilon's mark-compact toy collector
(https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to self-parsable 
heap traversal is
about 5 lines of code. This does not feel like passing the bar for 
reclassification of the collector
algo between major classes. It feels more like the implementation detail that 
can swing both ways,
depending on circumstances.

-Aleksey



Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-16 Thread Gil Tene
You make a good point about historical terminology classifying "mark/sweep" 
and
"mark/compact". But that terminology predates, and could not have 
anticipated, the
multitude of (sometimes fairly dramatic) variants within the universe of 
compacting
collectors that came in the decades that followed.

I'll remind folks that back when those terms were used to mean those 
specific
things, the term "parallel" was also used to describe what we've evolved to
call "concurrent" collectors for the past 25 years or so.

Classification terms evolve as the art evolves.

On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev wrote:
>
> On 8/14/19 9:21 PM, Gil Tene wrote: 
> > The fact that they visit and parse dead objects in order to identify 
> recycle-able memory is what makes them 
> > sweepers. Other techniques that don't visit dead objects do not perform 
> a sweep. 
>
> I am trying to understand what is your central argument here. This seems 
> to be it. Are you saying 
> that "sweeping" is when you visit dead objects, and non-"sweeping" is when 
> you do not? 
>

Yes. I think that's a very good summary.

To draw a specific line: If you end up visiting each individual dead 
object, you are sweeping.
If you only visit each "hole" once (which may comprise 1 or more dead 
objects, and covers
everything between one live object and the next), the number of visits you 
make will be no
larger than the number of live objects, and you would not be sweeping.
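
A minimal sketch of that hole-at-a-time bookkeeping, assuming the mark phase 
produced the live objects in address order (hypothetical types of mine, not 
any particular collector's data structures):

import java.util.List;

final class HoleWalkSketch {
  // Hypothetical mark-phase output: one entry per live object, in address order.
  static final class LiveObject {
    final long start, size;
    LiveObject(long start, long size) { this.start = start; this.size = size; }
  }

  // Visits each hole (the gap between two consecutive live objects) exactly once,
  // no matter how many dead objects it spans, so visits <= live objects + 1.
  static void visitHoles(List<LiveObject> liveInAddressOrder, long heapStart, long heapEnd) {
    long cursor = heapStart;
    for (LiveObject obj : liveInAddressOrder) {
      if (obj.start > cursor) reclaim(cursor, obj.start);  // one visit per hole, never per dead object
      cursor = obj.start + obj.size;
    }
    if (cursor < heapEnd) reclaim(cursor, heapEnd);
  }

  static void reclaim(long fromAddress, long toAddress) {
    // hand [fromAddress, toAddress) back to the allocator (free list, region map, etc.)
  }
}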
 

>
> >> I would leave "sweep" as something that reclaims the storage of the 
> objects (i.e. "sweeps away" the 
> >> garbage), which makes M-C and M-S classes more clearly defined and 
> easier to reason about. 
> > 
> > When M-C implies "also does a full linear traversal that visits all dead 
> objects" (aka "a sweep"), that terminology fails. Examples like C4, G1GC, 
> Shenandoah, and ZGC are all Mark-Compact (NO Sweep) 
> > collectors, while ParallelGC and Serial GC are Mark-Compact (WITH Sweep) 
> collectors. Since the math 
> behind those varies dramatically in ways that affect everyday choices, 
> using "M-C" as a classification 
> > that covers both leads people to misunderstand the interactions between 
> e.g. heap size and GC work. 
>
> M-C does not imply "full linear traversal that visits all dead objects". 


We agree on that.

I am saying that M-C (with sweep) does imply that, and M-C (with no sweep) 
does not.

And I am pointing out that the difference between the two is profound 
enough that
use of "M-C" with no qualification makes the following misquotation of 
Stanislaw Ulam
a good fit:

"Describing all non-copying relocating collectors as 'Mark Compact' is like 
referring to the
bulk of zoology as the study of non-elephant animals."
 

> As the counter-example, it is very practical to use off-side marking 
> bitmaps to walk only live objects of the heap. Careful design of such a 
> marking bitmap would yield dramatically better results than using whatever 
> self-parsable-heap walk. You can even make it multi-level and quite dense 
> to skip over large chunks of sparse heap. 


Yes. Both straight livemark bit vectors and livemark bit vectors with 
summaries allow you to
avoid visiting individual dead objects, by efficiently jumping from one live 
object to another
without ever visiting or parsing a dead object. And neither one would be 
"sweeping".


> When the marking phase is a separate phase, it stands to reason that the 
> subsequent phases would 
> have to process the marking results somehow. That would naturally do some 
> sort of walk over the data 
> structure that handles marks. If we call that walk "sweeping", then every 
> Mark-* algorithm is 
> necessarily "sweeping" too. So we have to break this somehow to make the 
> definition viable. 
>

Sweeping is not the processing of marking results, or the visiting or 
processing of the live objects.
It is the visiting and processing of dead objects.
 

>
> Is walking _dead_ objects the discriminator for "sweeping" then? So in 
> your book, if we take the 
> same Lisp2 collector, and compute-new-addresses and adjust-pointers steps 
> are walking the 
> self-parsable heap (visiting dead objects along the way), it is M-S-C? But 
> if it uses marking bitmap 
> (thus only visiting live objects), it becomes M-C? [This would be a weird 
> flex, but okay]. 
>

Exactly. "In my book" adding an efficient livemark bit vector with possible 
summary layers would
convert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
(with no sweep).

ParallelGC doesn't do that, and sticks with the classic Mark-Sweep-Compact. 
But if it did
add that mechanism, it would significantly improve in efficiency for 
low-occupancy oldgens,
and significantly reduce its sensitivity to heap size (for a fixed live 
set).

The reason I am keenly aware of the difference is that we use a concurrent 
Mark-Compact
(no sweep) collector not only for the oldgen in C4, but also for the young 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-15 Thread Aleksey Shipilev
On 8/14/19 9:21 PM, Gil Tene wrote:
> The fact that they visit and parse dead objects in order to identify 
> recycle-able memory is what makes them
> sweepers. Other techniques that don't visit dead objects do not perform a 
> sweep.

I am trying to understand what is your central argument here. This seems to be 
it. Are you saying
that "sweeping" is when you visit dead objects, and non-"sweeping" is when you 
do not?

>> I would leave "sweep" as something that reclaims the storage of the objects 
>> (i.e. "sweeps away" the
>> garbage), which makes M-C and M-S classes more clearly defined and easier to 
>> reason about.
> 
> When M-C implies "also does a full linear traversal that visits all dead 
> objects" (aka "a sweep"), that terminology fails. Examples like C4, G1GC, 
> Shenandoah, and ZGC are all Mark-Compact (NO Sweep)
> collectors, while ParallelGC and Serial GC are Mark-Compact (WITH Sweep) 
> collectors. Since the math
> behind those varies dramatically in ways that affect everyday choices, using 
> "M-C" as a classification
> that covers both leads people to misunderstand the interactions between e.g. 
> heap size and GC work.

M-C does not imply "full linear traversal that visits all dead objects". As the 
counter-example, it
is very practical to use off-side marking bitmaps to walk only live objects of 
the heap. Careful
design of such a marking bitmap would yield dramatically better results than 
using whatever
self-parsable-heap walk. You can even make it multi-level and quite dense to 
skip over large chunks
of sparse heap.

When the marking phase is a separate phase, it stands to reason that the 
subsequent phases would
have to process the marking results somehow. That would naturally do some sort 
of walk over the data
structure that handles marks. If we call that walk "sweeping", then every 
Mark-* algorithm is
necessarily "sweeping" too. So we have to break this somehow to make the 
definition viable.

Is walking _dead_ objects the discriminator for "sweeping" then? So in your 
book, if we take the
same Lisp2 collector, and compute-new-addresses and adjust-pointers steps are 
walking the
self-parsable heap (visiting dead objects along the way), it is M-S-C? But if 
it uses marking bitmap
(thus only visiting live objects), it becomes M-C? [This would be a weird flex, 
but okay].

-Aleksey




Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-14 Thread Gil Tene


> On Aug 14, 2019, at 10:40 AM, Aleksey Shipilev wrote:
> 
> On 8/14/19 11:30 AM, Gil Tene wrote:
>> For an easy to follow code example of Mark, Sweep, and Compact, look at the 
>> (ahem...) psMarkSweep.cpp
>> code for the compacting OpenJDK ParallelGC oldgen implementation, which does 
>> 4 separate passes (
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/7576bbd5a03c/src/share/vm/gc_implementation/parallelScavenge/psMarkSweep.cpp#l201
>>  ):
>> 
>> 1. A  tracing, pointer chasing pass that marks the live set.
>> Line 514: // Recursively traverse all live objects and mark them
>> 
>> 2. A sweeping pass that computes the new object addresses
>> Line 579: // Now all live objects are marked, compute the new object 
>> addresses.
>> (new addresses are computed to ”shift everything to one side”,
>>  which, when actually done in pass 4, will compact the heap in place)
>> 
>> 3. A sweeping pass to adjust references to point to the new object addresses:
>> Line 598: // Adjust the pointers to reflect the new locations
>> (this is sometimes referred to as a fixup pass)
>> 
>> 4. A compact pass that moves the objects to their previously chosen target 
>> locations.
>> Line 645: // All pointers are now adjusted, move objects accordingly
>> 
>> This is a classic Mark, Sweep, and Compact collector. Pass 1 only visits  
>> live objects, while passes
>> 2-4 linearly traverse the entire heap (live or dead), going forward one 
>> object at a time, and
>> skipping over the dead objects one by one.
> 
> I think the descriptions of steps (2) and (3) seriously stretch the 
> definition of "sweep". If we
> creatively redefine it to mean any (linear) heap walk, then most (all, except 
> for copying?)
> collectors should be marked as "sweep", which defies the purpose of the 
> definition.
> 
> This describes Lisp2-style collector, which is listed as (*drumroll*) 
> "Mark-Compact":
>  https://en.wikipedia.org/wiki/Mark-compact_algorithm#LISP2_algorithm
> 
> GC handbook mentions only "Mark-Compact" and "Mark-Sweep" styles, no 
> "Mark-Sweep-Compact".

Yup. Those parts were written before Quick Release was commonly used to make 
pure
Mark/Compact (no sweep) practical and profitable (over the other two variants).
I think that the G1GC evacuation stuff was also "too young" at the time of the 
writing to
consider the implications on terminology there, as G1GC-style collectors, and 
(I think Shenandoah
as well, since they share the same SATB marker logic), tend to keep separate 
live marks in
bits outside of object header for the same avoid-the-sweep reasons, and don't 
visit dead
objects when evacuating regions.

> The MM
> Glossary [https://www.memorymanagement.org/glossary/s.html#sweeping] says: 
> "Sweeping is the second
> phase (“the sweep phase”) of the mark-sweep algorithm. It performs a 
> sequential (address-order) pass
> over memory ***to recycle unmarked blocks.***" (emphasis mine)

Pass 2 above does exactly the "sequential (address-order) pass over memory to 
recycle unmarked blocks"
thing, and that makes it a classic sweep. It does a classic "find, visit, 
parse, and track all the dead stuff"
pass, and will use that information to recycle all of them (by compacting those 
holes). The way it tracks
the "free stuff" is just one of many possible ways to track the recycle-able 
holes identified in a sweep:
Pass 2 tracks them by reflecting the information about where the dead holes 
are in the newly computed
address for the next live object it finds (the "how much to shift the next live 
object's address" amount
grows by the size of each dead object encountered and parsed to determine that 
size). If you wanted to,
you could directly deduce or reconstruct the list of dead hole addresses and 
sizes by walking the newly
computed object addresses backwards. It's just that the way this "list" is 
stored is optimized for the next
passes in this collector.
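
A small sketch of that pass-2 bookkeeping, with the heap flattened into 
parallel arrays of object sizes and mark flags (my illustration, not the 
psMarkSweep.cpp code itself); the accumulated shift is exactly the dead space 
swept over so far:

final class ComputeNewAddressesSketch {
  // sizes[i] / marked[i] describe the i-th object in address order.
  // Returns each live object's new address after sliding everything to one side.
  static long[] computeNewAddresses(long[] sizes, boolean[] marked) {
    long[] newAddress = new long[sizes.length];
    long address = 0, shift = 0;
    for (int i = 0; i < sizes.length; i++) {
      if (marked[i]) {
        newAddress[i] = address - shift;  // live object slides left by the total dead space seen so far
      } else {
        shift += sizes[i];                // dead object visited and parsed just to learn its size
      }
      address += sizes[i];
    }
    return newAddress;
  }
}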

The way different sweepers track the recycle-able holes they find varies 
dramatically (lists, trees, bucketed
lists, buddy lists, some magic encoding in other data) but the work done to 
find them is fundamentally the
same in all sweepers: they linearly parse the heap, one object at a time, 
figuring out where the next object
starts by looking at the material within the object (dead or alive) being 
visited, and store (or embed) the
information they find about where the holes are into some data structure.

The fact that they visit and parse dead objects in order to identify 
recycle-able memory is what makes them
sweepers. Other techniques that don't visit dead objects do not perform a 
sweep. Those usually take the
approach of "take all the live objects somewhere else, and the whole thing 
becomes one big free blob"
over the "sweep through the individual dead things and track them for some 
recycling operation that fills
them in individually". Examples of those clearly-non-sweepers include copying 
collectors that evacuate and
fixup an 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-14 Thread Gil Tene

On Aug 13, 2019, at 11:34 PM, Peter Veentjer wrote:

Thanks for your answer Aleksey,

comments inline.

On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote:
On 8/6/19 7:38 AM, Peter Veentjer wrote:
> There is still some confusion on my side. This time it is regarding the 
> algorithmic complexity of
> mark-sweep-compact.
>
> The complexity of the mark phase is linear to the live set.
> The complexity of the sweep phase is linear to the heap size.
> The complexity of the compact phase is linear to the live set.

Well...

The multiplicative factors in either case might make one linearity perform 
better than the other on
wildly different heap sizes and live sets. The adversarial example of this is N 
large byte arrays of
size M. Compact might have to copy N*M bytes (especially when doing sliding 
compaction), and sweep
might only need to visit N locations to check on object metadata.

There are second-order heap size effects for both mark and compact. Take two 
heaps with the same
live set, one very dense and one very sparse, and the mere fact you have to 
walk memory far away
adds up the secondary effect of heap size. It would not linger for compacting 
collectors, though, as
first few compaction cycles would collapse the heap to denser variant.

That is all to say that algorithmic complexity is a seriously bad way to reason 
about (GC) performance.

I understand. I created a bunch of flashcards to get a better conceptual 
understanding of the different GC algorithms;
I'll add some additional flashcards to refine my understanding.


> Why would you ever want to include a sweep phase because the complexity 
> of mark-sweep-compact is
> now linear to the heap size; a lot worse compared to 'linear to the live 
> set'. Get rid of the sweep
> and complexity goes down to 'linear to the live set'. What is the added value 
> of such an inefficient
> GC implementation?

I think m-s-c nomenclature is confusing, as the actual implementations are 
doing either m-s or m-c
during their individual phases.

Exactly. And this was the source of my confusion, caused by an abuse of 
names.

There are very real and popular collectors that do Mark, Sweep, *and* Compact, 
e.g. the OpenJDK oldgen collectors in both ParallelGC and SerialGC. There are 
very real and popular collectors that do only Mark and Sweep, e.g. the 
non-compacting oldgen of the CMS collector (as long as it does not fall back to 
the STW compacting fullGC). And there are very real and popular collectors that 
do only Mark and Compact (no sweep), e.g. C4, Pauseless, Compressor, and 
recently ZGC.

For an easy to follow code example of Mark, Sweep, and Compact, look at the 
(ahem...) psMarkSweep.cpp code for the compacting OpenJDK ParallelGC oldgen 
implementation, which does 4 separate passes (
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/7576bbd5a03c/src/share/vm/gc_implementation/parallelScavenge/psMarkSweep.cpp#l201
 ):

1. A  tracing, pointer chasing pass that marks the live set.
Line 514: // Recursively traverse all live objects and mark them

2. A sweeping pass that computes the new object addresses
Line 579: // Now all live objects are marked, compute the new object 
addresses.
(new addresses are computed to ”shift everything to one side”,
 which, when actually done in pass 4, will compact the heap in place)

3. A sweeping pass to adjust references to point to the new object addresses:
Line 598: // Adjust the pointers to reflect the new locations
(this is sometimes referred to as a fixup pass)

4. A compact pass that moves the objects to their previously chosen target 
locations.
Line 645: // All pointers are now adjusted, move objects accordingly

This is a classic Mark, Sweep, and Compact collector. Pass 1 only visits live 
objects, while passes 2-4 linearly traverse the entire heap (live or dead), 
going forward one object at a time, and skipping over the dead objects one by 
one. For at least the first sweeping pass, when you arrive at a dead object’s 
header, you still need to follow its type indicator to determine the object 
size in order to know how far to skip ahead to the next object. So you visit 
and follow pointer information in each and every object header in the heap, 
dead or alive.
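
A sketch of what that linear parse looks like, with a hypothetical 
header-reading interface standing in for the real heap layout; the loop 
advances by parsing every object's header, dead or alive, so its cost tracks 
the heap size rather than the live set:

final class HeapParseSketch {
  interface ToyHeap {
    long end();                        // address just past the last allocated object
    long objectSizeAt(long address);   // parse the header at 'address' (via its type info) to learn the size
    boolean isMarkedLive(long address);
  }

  static long countDeadBytes(ToyHeap heap) {
    long deadBytes = 0;
    for (long addr = 0; addr < heap.end(); ) {
      long size = heap.objectSizeAt(addr);   // even a dead object must be parsed to know how far to skip
      if (!heap.isMarkedLive(addr)) deadBytes += size;
      addr += size;
    }
    return deadBytes;
  }
}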

In contrast, a pure mark-compact can be built to never do a “move from one 
object to another, including the dead ones” pass. E.g. an evacuating mark 
compact can use the mark pass to produce an efficient way to move from one live 
object to the next when evacuating and doing pointer fixup.

There are various schemes for doing this efficiently without ever visiting a 
dead object. E.g. in C4 we keep live marks in a bit vector with one bit per 
64-bit word of heap (we mark in this bit vector, not in the object headers). When 
it comes time to evacuate e.g. a 2MB region, we just scan through the 
associated 32KB of memory that contains the live bits for that 2MB region, 

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-14 Thread Peter Veentjer
Thanks for your answer Aleksey, 

comments inline.

On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote:
>
> On 8/6/19 7:38 AM, Peter Veentjer wrote: 
> > There is still some confusion on my side. This time it is regarding the 
> algorithmic complexity of 
> > mark-sweep-compact. 
> > 
> > The complexity of the mark phase is linear to the live set. 
> > The complexity of the sweep phase is linear to the heap size. 
> > The complexity of the compact phase is linear to the live set. 
>
> Well... 
>
> The multiplicative factors in either case might make one linearity perform 
> better than the other on 
> wildly different heap sizes and live sets. The adversarial example of this 
> is N large byte arrays of 
> size M. Compact might have to copy N*M bytes (especially when doing 
> sliding compaction), and sweep 
> might only need to visit N locations to check on object metadata. 
>
> There are second-order heap size effects for both mark and compact. Take 
> two heaps with the same 
> live set, one very dense and one very sparse, and the mere fact you have 
> to walk memory far away 
> adds up the secondary effect of heap size. It would not linger for 
> compacting collectors, though, as 
> first few compaction cycles would collapse the heap to denser variant. 
>
> That is all to say that algorithmic complexity is a seriously bad way to 
> reason about (GC) performance. 
>

I understand. I created a bunch of flashcards to get a better conceptual 
understanding of the different GC algorithms;
I'll add some additional flashcards to refine my understanding.
 

>
> > Why would you ever want to include a sweep phase because the 
> complexity of mark-sweep-compact is 
> > now linear to the heap size; a lot worse compared to 'linear to the live 
> set'. Get rid of the sweep 
> > and complexity goes down to 'linear to the live set'. What is the added 
> value of such an inefficient 
> > GC implementation? 
>
> I think m-s-c nomenclature is confusing, as the actual implementations are 
> doing either m-s or m-c 
> during their individual phases. 


Exactly. And this was the source of my confusion, caused by an abuse of 
names.

 

> The benefit of doing either is well covered. 
>
> The hard naming question is how to properly label the collector where 
> different phases employ 
> different algos. For example, in CMS the young collection is copying, old 
> collection is mark-sweep, 
> fall-back full GC is mark-compact. How would you label CMS then? 


It would be easier/simpler to label the young and old collectors 
independently; but I understand your point.

 

> I believe marking it m-s-c is 
> borderline tolerable, until users get confused thinking both sweep and 
> compact happen in the same phase. 
>

> Also with implementations that manage backing storage in fine-grained 
> chunks (for example, free 
> lists), "sweep" might mean dealing with them on a per-dead-object basis. For 
> example, this might mean 
>  preparing the storage for subsequent compaction by "sweeping" out old 
> objects before compaction 
> kicks in, which requires free space. Practical implementations I know opt 
> to instead reconstruct the 
> backing storage metadata after massive compaction moves, though. 
>
> My $0.02c, anyway. 
>
> -Aleksey 
>
>



Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-06 Thread Aleksey Shipilev
On 8/6/19 7:38 AM, Peter Veentjer wrote:
> There is still some confusion on my side. This time it is regarding the 
> algorithmic complexity of
> mark-sweep-compact.
> 
> The complexity of the mark phase is linear to the live set.
> The complexity of the sweep phase is linear to the heap size.
> The complexity of the compact phase is linear to the live set.

Well...

The multiplicative factors in either case might make one linearity perform 
better than the other on
wildly different heap sizes and live sets. The adversarial example of this is N 
large byte arrays of
size M. Compact might have to copy N*M bytes (especially when doing sliding 
compaction), and sweep
might only need to visit N locations to check on object metadata.
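
(To put illustrative, made-up numbers on that: with N = 1,000 live byte arrays 
of M = 1 MB each, a sliding compaction may copy on the order of N*M = 1 GB of 
payload, while a sweep over the same heap mostly just visits those ~1,000 
object headers. The phase with the "worse" heap-size-linear complexity can 
easily be the cheaper one in wall-clock terms.)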

There are second-order heap size effects for both mark and compact. Take two 
heaps with the same
live set, one very dense and one very sparse, and the mere fact you have to 
walk memory far away
adds up the secondary effect of heap size. It would not linger for compacting 
collectors, though, as
first few compaction cycles would collapse the heap to denser variant.

That is all to say that algorithmic complexity is a seriously bad way to reason 
about (GC) performance.

> Why would you ever want to include a sweep phase because the complexity 
> of mark-sweep-compact is
> now linear to the heap size; a lot worse compared to 'linear to the live 
> set'. Get rid of the sweep
> and complexity goes down to 'linear to the live set'. What is the added value 
> of such an inefficient
> GC implementation?

I think m-s-c nomenclature is confusing, as the actual implementations are 
doing either m-s or m-c
during their individual phases. The benefit of doing either is well covered.

The hard naming question is how to properly label the collector where different 
phases employ
different algos. For example, in CMS the young collection is copying, old 
collection is mark-sweep,
fall-back full GC is mark-compact. How would you label CMS then? I believe 
marking it m-s-c is
borderline tolerable, until users get confused thinking both sweep and compact 
happen in the same phase.

Also with implementations that manage backing storage in fine-grained chunks 
(for example, free
lists), "sweep" might mean dealing with them on a per-dead-object basis. For 
example, this might mean
 preparing the storage for subsequent compaction by "sweeping" out old objects 
before compaction
kicks in, which requires free space. Practical implementations I know opt to 
instead reconstruct the
backing storage metadata after massive compaction moves, though.

My $0.02c, anyway.

-Aleksey



Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-05 Thread Peter Veentjer
There is still some confusion on my side. This time it is regarding the 
algorithmic complexity of mark-sweep-compact.

The complexity of the mark phase is linear to the live set.
The complexity of the sweep phase is linear to the heap size.
The complexity of the compact phase is linear to the live set.

So the complexity of the mark-compact would be linear to the live set since 
you only deal with live objects.

Why would you ever want to include a sweep phase because the complexity 
of mark-sweep-compact is now linear to the heap size; a lot worse compared 
to 'linear to the live set'. Get rid of the sweep and complexity goes down 
to 'linear to the live set'. What is the added value of such an inefficient 
GC implementation?

I'm sure there is value; but currently it doesn't compute.


On Monday, August 14, 2017 at 8:47:01 AM UTC+3, Peter Veentjer wrote:
>
> I have been improving my gc knowledge and there is some confusion on my 
> side regarding the 'mark-sweep-compact' algorithm I frequently see 
> mentioned in posts, articles and some not too formal books on the topic. 
> E.g.
>
>
> https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objects/compact
> https://www.infoq.com/news/2015/12/java-garbage-collection-minibook
>
> AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact where as part of the compacting process the dead objects are 
> removed. When I check more formal literature like "The Garbage Collection 
> Handbook: the Art of Automatic Memory Management", then there is no 
> mentioning of mark-sweep-compact.
>
> So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
> referring to 'mark-compact'. Is this assumption correct?
>
>



Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Gil Tene


On Sunday, August 13, 2017 at 11:45:09 PM UTC-7, Kirk Pepperdine wrote:
>
> Hi, 
>
>
> > On Aug 14, 2017, at 7:47 AM, Peter Veentjer wrote: 
> > 
> > I have been improving my gc knowledge and there is some confusion on my 
> side regarding the 'mark-sweep-compact' algorithm I frequently see 
> mentioned in posts, articles and some not too formal books on the topic. 
> E.g. 
> > 
> > AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact where as part of the compacting process the dead objects are 
> removed. When I check more formal literature like "The Garbage Collection 
> Handbook: the Art of Automatic Memory Management", then there is no 
> mention of mark-sweep-compact. 
> > 
> > So I guess that when 'mark-sweep-compact' is mentioned, they are 
> actually referring to 'mark-compact'. Is this assumption correct? 
>
> The terminology around this is very loose so the very short answer is…. 
> sort of. So, a very terse but hopefully useful summary of collectors. 
>
> When we talk about mark-sweep we are generally referring to a class of 
> garbage collectors known as tracing collectors. The actual details of how 
> the mark and sweep for each of the different tracing collectors are 
> implemented depend upon a couple of factors. One of the most important 
> factors is how memory pool(s) (data structure in heap space) is organized. 
> I tend to label a collector as either “in-place” or “evacuation”. The most 
> trivial examples for both are; if you only have a single memory pool then 
> you will be performing an “in-place” collection as the data remains in the 
> that memory pool at the end of the collection. If you have two spaces you 
> can set this up as what is known has hemispheric collections. In this case 
> you mark the data in the currently active memory pool and then “evacuate” 
> it to the inactive pool. When complete, active becomes inactive (thus empty) 
> and inactive becomes active. Wash, rinse, repeat…. 
>
> In-place collections will need a free-list and need to somehow deal with 
> fragmentation (of the free-list). Thus they compact. When and how often is 
> an implementation detail. Evacuating collectors use a top of heap pointer 
> (and other more complex tricks) that act as a free list. Since they 
> evacuate to another pool, the data will naturally be compacted. Thus no 
> compaction will be necessary. As you can imagine, the time/space cost 
> models for these two general approaches are quite different. 
>
> In OpenJDK we have both generational and regional collectors. The 
> generational collector divides memory into 4 memory pools named Eden, 
> Survivor 0, Survivor 1, and Tenured space. The first three are grouped into 
> young and Tenured into old space. In young we can use mark-evacuation 
> whereas in old we are forced to use mark-sweep-compact. I’m going to claim (I’m 
> sure Gil will want to chime in here) that G1, C4, Balance and Shenandoah 
> are regional collectors in that they divide heap into a large number of 
> memory pools (regions).


Agreed. These are all regional Mark-Compact evacuating collectors. But G1, 
Balanced and Shenandoah are oldgen-only regional collectors (or single 
generation in the current Shenandoah case), while C4 uses both a regional 
Mark-compact newgen collector and a regional Mark-Compact oldgen collector. 
As such, it gets all the nice-for-your-electron-budget benefits of 
generational collection along with all the nice benefits of regional 
collection.
 

> G1 aims for 2048 memory pools. In all 4 cases, the collectors are very 
> very different variants of mark-evacuation each with their own quirks. The 
> advantage of using regions is that you only need an evacuating collector as 
> you don’t end up with a terminal pool (tenured) where you are forced to use 
> an in-place collector. If you are careful in how you select regions to be 
> recovered, you can realize some nice wins over generational collectors. 
> However, these collectors tend to be far more complex and harder on your 
> electron budget than generational collectors are.


The harder-on-electron-budget thing is not algorithmically inherent to 
evacuating regional collectors. It usually has to do with other algorithmic 
choices that may be made in collectors that happen to be regional. E.g. 
some region-based incremental STW compactors (G1, Balanced) generally try 
to compact a small increment of the heap and fix up a hopefully limited set 
of regions that refer to the compacted set, all in a single STW pause. 
These algorithms tend to rely on cross-region remembered-set tracking, 
which carries some interesting electron-budget implications, but more 
importantly, because the same regions will be involved in multiple fixup 
passes, the resulting N^2 complexity potential can also cause the electron 
budget to inflate significantly. Evacuating regional collector algorithms 
that do not rely on cross-region remembered sets, and 
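
(For readers who haven't met them: below is a minimal sketch of the 
cross-region remembered-set bookkeeping referred to above. The names and the 
per-region list-of-slots representation are invented for illustration; real 
collectors such as G1 use card tables and far denser encodings.)

    import java.util.*;

    // Minimal sketch of cross-region remembered sets (names invented; real collectors
    // such as G1 use card tables and much denser encodings): each region records the
    // locations of references pointing into it from other regions, so evacuating it
    // only requires fixing up those recorded slots rather than scanning the heap.
    final class RememberedSets {
        static final class Slot {
            final int region, offset;
            Slot(int region, int offset) { this.region = region; this.offset = offset; }
        }

        // remembered.get(r) = slots in other regions that point into region r
        private final Map<Integer, List<Slot>> remembered = new HashMap<>();

        // Write barrier: called when the reference at (srcRegion, srcOffset) is set
        // to point at an object living in dstRegion.
        void recordWrite(int srcRegion, int srcOffset, int dstRegion) {
            if (srcRegion != dstRegion) {
                remembered.computeIfAbsent(dstRegion, k -> new ArrayList<>())
                          .add(new Slot(srcRegion, srcOffset));
            }
        }

        // At evacuation time, only these slots need visiting to fix up references
        // into the condemned region.
        List<Slot> slotsToFixUp(int condemnedRegion) {
            return remembered.getOrDefault(condemnedRegion, Collections.emptyList());
        }
    }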

Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Gil Tene
I've taken to using Mark-Sweep-Compact, Mark-Sweep (no Compact), and 
Mark-Compact (no sweep) to describe the variants, because you will find all 
of them in the wild these days.

The confusion starts with the fact that some algorithms that people often 
refer to as "Mark-Sweep" are actually Mark-Sweep-Compact, while others are 
purely Mark-Sweep (with no compaction). Mark-Sweep with no compaction 
usually means that free memory is recycled in place, with no object 
movement. A good example of this in practice is the CMS collector in the 
HotSpot JVM, which tracks unused memory in the oldgen in some form of free 
lists, but does not move any objects in the oldgen to compact it. 
Mark-Sweep *with* compaction (Mark-Sweep-Compact) obviously does compact. 
It usually implies the use of the knowledge established during a sweep 
(e.g. a list of the empty ranges) to compact the space "in place". It also 
usually involves a complete fixup of all references in all live objects. 
E.g. a simplistic description of a common Mark-Sweep-Compact technique 
would involve: (A) mark the live objects in the heap, (B) sweep the heap to 
find the empty ranges, (C) move all live objects to one side of the heap 
(filling in the empty spaces), (D) linearly scan all live objects and fix 
up any references in them to point to the correct target location. Good 
examples of this in practice are the ParallelGC and SerialGC collectors in 
the HotSpot JVM.
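
To make those four steps concrete, here is a minimal, illustrative sketch of 
that simplistic sliding Mark-Sweep-Compact over a toy slot-array heap. The 
ToyHeap/Obj types are invented for illustration and this is not how HotSpot's 
ParallelGC/SerialGC are actually implemented; a real collector would also fix 
up root references through the same forwarding information.

    import java.util.*;

    // Toy sliding Mark-Sweep-Compact over an array-of-slots "heap".
    // Illustrative only: real collectors work on raw memory, not object arrays.
    final class ToyHeap {
        static final class Obj {
            boolean marked = false;
            int forwardedTo = -1;                         // slot this object will slide to
            final List<Integer> refs = new ArrayList<>(); // slots of referenced objects
        }

        Obj[] slots = new Obj[16];                        // null slot == free

        void collect(int[] rootSlots) {
            // (A) Mark: trace the live objects from the roots.
            Deque<Integer> work = new ArrayDeque<>();
            for (int r : rootSlots) work.push(r);
            while (!work.isEmpty()) {
                Obj o = slots[work.pop()];
                if (o == null || o.marked) continue;
                o.marked = true;
                for (int ref : o.refs) work.push(ref);
            }
            // (B)+(C) Sweep the slots in address order and assign each survivor
            // its new slot at the low end of the heap (the empty ranges close up).
            int next = 0;
            for (Obj o : slots) {
                if (o != null && o.marked) o.forwardedTo = next++;
            }
            // (D) Fixup: rewrite every reference in every live object to point at
            // the referent's new slot (live objects only reference live objects).
            for (Obj o : slots) {
                if (o == null || !o.marked) continue;
                for (int j = 0; j < o.refs.size(); j++) {
                    o.refs.set(j, slots[o.refs.get(j)].forwardedTo);
                }
            }
            // Slide the survivors to their new slots and clear the rest of the heap.
            Obj[] compacted = new Obj[slots.length];
            for (Obj o : slots) {
                if (o != null && o.marked) {
                    compacted[o.forwardedTo] = o;
                    o.marked = false;
                    o.forwardedTo = -1;
                }
            }
            slots = compacted;
        }
    }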

Similarly, more confusion is caused by the fact that some algorithms that 
people often refer to as "Mark-Compact" are actually "Mark-Sweep-Compact", 
while others are purely Mark-Compact (no sweep). Again, a good example of 
the first in common use would be ParallelGC in the HotSpot JVM (and the 
Wikipedia entry https://en.wikipedia.org/wiki/Mark-compact_algorithm). Good 
examples of the latter in common use can be found in various regional 
evacuating collectors (like C4, G1, etc.), where there is no sweep per se 
(live objects are evacuated to outside of the originally marked heap, and 
there is some mechanism for finding/tracking them to do so, but that 
mechanism is not the traditional sweep used by Mark-Sweep style 
collectors). Fixup behavior in such algorithms can also vary quite a bit 
(e.g. C4 normally uses the next cycle's Mark to perform fixups for the 
previous Compact's relocated objects, and as such does not require a 
scanning fixup pass).
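
As a hedged sketch of that evacuating style (all names are invented; this is 
not C4's or G1's code): survivors of a condemned region are copied into free 
space, a forwarding table records old-to-new locations, and reference fixup 
can either run as its own pass or be deferred and piggybacked on a later 
trace, as described above for C4.

    import java.util.*;

    // Illustrative regional evacuation with deferred fixup (not any real collector's
    // code): survivors of a condemned region are copied elsewhere, a forwarding table
    // remembers old -> new, and references are rewritten later through that table.
    final class RegionEvacuation {
        // A "region" here is just a list holding that region's live objects.
        static Map<Object, Object> evacuate(List<Object> condemnedRegion,
                                            List<Object> freeRegion) {
            Map<Object, Object> forwarding = new IdentityHashMap<>();
            for (Object survivor : condemnedRegion) {
                Object copy = copyOf(survivor);   // copy the survivor into free space
                freeRegion.add(copy);
                forwarding.put(survivor, copy);   // remember where it went
            }
            condemnedRegion.clear();              // the whole region is now reclaimable
            return forwarding;                    // fixup can run now, or be deferred
        }

        // Deferred fixup: whenever a later pass (e.g. the next cycle's mark) visits a
        // reference, it forwards it if it points at an evacuated object.
        static Object maybeForward(Object ref, Map<Object, Object> forwarding) {
            return forwarding.getOrDefault(ref, ref);
        }

        private static Object copyOf(Object o) {  // stand-in for copying the object's bits
            return new Object();
        }
    }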

Another key difference often pointed to is that with in-place compaction 
(Mark-Sweep-Compact), no additional empty memory is required to guarantee 
full compaction, while regional evacuating collectors need to evacuate into 
some empty memory, so their ability to compact may be limited by the amount 
of empty memory available outside of the currently occupied heap regions. 
However, several well-documented regional collector algorithms (e.g. 
Pauseless, C4, Compressor) perform hand-over-hand compaction to guarantee 
forward progress in evacuation, such that a single empty region is 
sufficient to achieve full compaction in a single pass.
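
A minimal sketch of the hand-over-hand idea (types and names are assumptions 
for illustration, not Pauseless/C4/Compressor code): with a single empty 
region as the initial target, each occupied region's survivors are evacuated 
in turn, and the region just emptied immediately becomes the target for the 
next one, so one free region is enough to compact them all.

    import java.util.*;

    // Illustrative hand-over-hand compaction: each region list holds only that
    // region's survivors, as established by the preceding mark. Starting with one
    // empty region as the target, every region is evacuated in turn, and the region
    // just emptied becomes the next target.
    final class HandOverHand {
        static void compact(List<List<Object>> occupiedRegions, List<Object> freeRegion) {
            List<Object> target = freeRegion;
            for (List<Object> region : occupiedRegions) {
                target.addAll(region);   // evacuate this region's survivors (they fit,
                                         // since survivors never exceed one region)
                region.clear();          // this region's storage is now free...
                target = region;         // ...and becomes the target for the next region
            }
        }
    }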

On Sunday, August 13, 2017 at 10:47:01 PM UTC-7, Peter Veentjer wrote:
>
> I have been improving my gc knowledge and there is some confusion on my 
> side regarding the 'mark-sweep-compact' algorithm I frequently see 
> mentioned in posts, articles and some not too formal books on the topic. 
> E.g.
>
>
> https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objects/compact
> https://www.infoq.com/news/2015/12/java-garbage-collection-minibook
>
> AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact, where, as part of the compacting process, the dead objects 
> are removed. When I check more formal literature like "The Garbage 
> Collection Handbook: the Art of Automatic Memory Management", there is no 
> mention of mark-sweep-compact.
>
> So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
> referring to 'mark-compact'. Is this assumption correct?
>
>



Re: Confusion regarding 'mark-sweep-compact' naming

2017-08-14 Thread Kirk Pepperdine
Hi,


> On Aug 14, 2017, at 7:47 AM, Peter Veentjer  wrote:
> 
> I have been improving my gc knowledge and there is some confusion on my side 
> regarding the 'mark-sweep-compact' algorithm I frequently see mentioned in 
> posts, articles and some not too formal books on the topic. E.g.
> 
> AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
> mark-compact, where, as part of the compacting process, the dead objects 
> are removed. When I check more formal literature like "The Garbage 
> Collection Handbook: the Art of Automatic Memory Management", there is no 
> mention of mark-sweep-compact.
> 
> So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
> referring to 'mark-compact'. Is this assumption correct?

The terminology around this is very loose, so the very short answer is… sort 
of. So, a very terse but hopefully useful summary of collectors.

When we talk about mark-sweep we are generally referring to a class of garbage 
collectors known as tracing collectors. The actual details of how the mark and 
the sweep are implemented for each of the different tracing collectors depend 
upon a couple of factors. One of the most important factors is how the memory 
pool(s) (data structures in heap space) are organized. I tend to label a 
collector as either “in-place” or “evacuating”. The most trivial examples of 
both are: if you only have a single memory pool then you will be performing an 
“in-place” collection, as the data remains in that memory pool at the end of 
the collection. If you have two spaces you can set this up as what is known as 
a hemispheric collection. In this case you mark the data in the currently 
active memory pool and then “evacuate” it to the inactive pool. When complete, 
the active pool becomes inactive (thus empty) and the inactive pool becomes 
active. Wash, rinse, repeat…
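
A minimal hemispheric sketch of that flip, assuming the set of reachable 
objects has already been established by the mark (types are invented for 
illustration; a real semi-space collector copies during the trace itself 
rather than from a precomputed set):

    import java.util.*;

    // Minimal hemispheric (semi-space) sketch: survivors in the active space are
    // copied to the inactive space, then the two spaces flip. The reachable set is
    // taken as already computed; types are stand-ins, not any JVM's implementation.
    final class SemiSpace {
        private List<Object> active = new ArrayList<>();
        private List<Object> inactive = new ArrayList<>();

        void allocate(Object obj) { active.add(obj); }      // new objects go to the active pool

        void collect(Set<Object> reachable) {
            for (Object obj : active) {
                if (reachable.contains(obj)) {               // "mark": found live by the trace
                    inactive.add(obj);                       // "evacuate" the survivor
                }
            }
            active.clear();                                  // whatever is left behind is garbage
            List<Object> tmp = active;                       // flip: inactive becomes active,
            active = inactive;                               // and the emptied pool becomes the
            inactive = tmp;                                  // reserve for the next cycle
        }
    }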

In-place collections will need a free-list and need to somehow deal with 
fragmentation (of the free-list). Thus they compact. When and how often is an 
implementation detail. Evacuating collectors use a top-of-heap pointer (and 
other more complex tricks) that acts as the free list. Since they evacuate to 
another pool, the data will naturally be compacted, so no separate compaction 
will be necessary. As you can imagine, the time/space cost models for these two 
general approaches are quite different.
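
To illustrate the difference in allocation paths (both allocators below are 
invented sketches, not real JVM code): the in-place style searches a free list 
and is exposed to fragmentation, while the evacuating style just bumps a 
top-of-heap pointer.

    import java.util.*;

    // Illustrative contrast between the two allocation paths (neither is a real JVM
    // allocator): the free-list style searches for a hole and is exposed to
    // fragmentation; the bump-pointer style just advances "top of heap".
    final class Allocators {
        // In-place / free-list style allocation.
        static final class FreeListAllocator {
            static final class Hole {
                int start, size;
                Hole(int start, int size) { this.start = start; this.size = size; }
            }
            final List<Hole> holes = new ArrayList<>();

            int allocate(int size) {
                for (Hole h : holes) {
                    if (h.size >= size) {          // first-fit search
                        int addr = h.start;
                        h.start += size;
                        h.size -= size;
                        return addr;
                    }
                }
                return -1;                         // nothing fits: fragmentation, time to compact
            }
        }

        // Evacuating / bump-pointer style allocation.
        static final class BumpAllocator {
            private int top = 0;
            private final int limit;
            BumpAllocator(int limit) { this.limit = limit; }

            int allocate(int size) {
                if (top + size > limit) return -1; // space exhausted: evacuate and flip
                int addr = top;
                top += size;
                return addr;
            }
        }
    }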

In OpenJDK we have both generational and regional collectors. The generational 
collector divides memory into 4 memory pools named Eden, Survivor 0, Survivor 
1, and Tenured space. The first three are grouped into young space and Tenured 
into old space. In young we can use mark-evacuation, whereas in old we are 
forced to use mark-sweep-compact. I’m going to claim (I’m sure Gil will want to 
chime in here) that G1, C4, Balanced, and Shenandoah are regional collectors in 
that they divide the heap into a large number of memory pools (regions). G1 
aims for 2048 memory pools. In all 4 cases, the collectors are very different 
variants of mark-evacuation, each with their own quirks. The advantage of using 
regions is that you only need an evacuating collector, as you don’t end up with 
a terminal pool (tenured) where you are forced to use an in-place collector. If 
you are careful in how you select regions to be recovered, you can realize some 
nice wins over generational collectors. However, these collectors tend to be 
far more complex and harder on your electron budget than generational 
collectors are.
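
As a sketch of the kind of region selection meant here (field names and the 
copy-budget model are assumptions for illustration, loosely in the spirit of a 
garbage-first heuristic): prefer the regions with the least live data, up to a 
budget on how much copying you are willing to do.

    import java.util.*;

    // Illustrative region-selection heuristic for a regional evacuating collector
    // (field names and the copy-budget model are assumptions): prefer the regions
    // with the least live data, i.e. the most reclaimable garbage per byte copied.
    final class RegionSelection {
        static final class Region {
            final int liveBytes;                   // measured during the preceding mark
            Region(int liveBytes) { this.liveBytes = liveBytes; }
        }

        static List<Region> chooseCollectionSet(List<Region> candidates, int copyBudgetBytes) {
            List<Region> sorted = new ArrayList<>(candidates);
            sorted.sort(Comparator.comparingInt((Region r) -> r.liveBytes)); // cheapest first
            List<Region> chosen = new ArrayList<>();
            int cost = 0;
            for (Region r : sorted) {
                if (cost + r.liveBytes > copyBudgetBytes) break;  // stay within the copy budget
                chosen.add(r);
                cost += r.liveBytes;
            }
            return chosen;
        }
    }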



Confusion regarding 'mark-sweep-compact' naming

2017-08-13 Thread Peter Veentjer
I have been improving my gc knowledge and there is some confusion on my 
side regarding the 'mark-sweep-compact' algorithm I frequently see 
mentioned in posts, articles and some not too formal books on the topic. 
E.g.

https://plumbr.eu/handbook/garbage-collection-algorithms/removing-unused-objects/compact
https://www.infoq.com/news/2015/12/java-garbage-collection-minibook

AFAIK mark-sweep-compact is not correct. It is either mark-sweep or 
mark-compact, where, as part of the compacting process, the dead objects are 
removed. When I check more formal literature like "The Garbage Collection 
Handbook: the Art of Automatic Memory Management", there is no mention of 
mark-sweep-compact.

So I guess that when 'mark-sweep-compact' is mentioned, they are actually 
referring to 'mark-compact'. Is this assumption correct?
