Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-19 Thread Steve Blackburn
What I said above was not as sharp as it should have been.

I see a crisp distinction between 'evacuation' and 'compaction'.

- Evacuation moves (evacuates) survivors into a pre-existing reserve 
(however small) of heap space.
- Compaction requires no such reserve; survivors are only moved into space 
freed in the process of performing the compaction.

Evacuation may be done at a very fine grain, requiring absolutely minimal 
reserves, or in the other extreme, it may require reserving half the heap 
(semi-space).   Still, I would say 'evacuation' so long as some free 
reserve (however small) is required in order for the algorithm to proceed. 
  Thus while "in-place compaction" is a good description, I'd say it is a 
tautology.

I know this crisp distinction has not been universally observed in the 
literature, but I think it is helpful because it is absolutely crisp, and 
cuts to the very heart of the underlying algorithm.

We use the term 'defragmentation' to describe a fine-grained evacuation 
strategy (for example, one that moves single objects into small holes in 
the heap).   Still, we'd call this an evacuating collector, not a 
compacting collector.   We thought long and hard about this at the time we 
did our work, and settled on the above for the above reasons.

So I'd ask the question: does this algorithm require a reserve of heap 
space (even a small one) before it can proceed.   If so, it's evacuation.

One more thing: obviously we can and do routinely build collectors that are 
hybrids, so there's absolutely no reason why a collector could not do both.

--Steve

On Saturday, 17 August 2019 03:08:47 UTC+10, Gil Tene wrote:
>
>
>
> Sent from my iPad
>
> On Aug 16, 2019, at 2:53 AM, Steve Blackburn  > wrote:
>
> For what it is worth, Kathryn and I have previously clarified some of this 
> nomenclature (see our Immix paper 
> <http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf>). 
>
> Marking: identifying liveness via a trace
> Sweeping: identifying free space while leaving live objects in place 
> (classic mark-sweep and immix do this)
> Evacuating: freeing space by moving survivors to another place
> Compacting:  freeing space moving survivors within the same place.
>
>
> :-)
>
> Now we have the “is it Compacting or Evacuating?” question...
>
> Because of Quick Release (C4 paper [1], section 2.5, and originally 
> described in 2005 in the Pauseless paper [2], section 4.2) C4 [1] , 
> Pauseless [2], Collie [3], Compressor [4], and ZGC [reference needed] are 
> all examples of collectors created in the past ~15 years that can be 
> classified as either/both Compacting or Evacuating under the nomenclature 
> above.
>
> At first glance, one might be tempted to say that they are simply 
> Evacuating collectors: from the point of view of each region, they clearly 
> appear to free space by moving survivors to another place (to outside of 
> the region), rather then moving them within the same place (the same 
> region).
>
> But from an actual memory space perspective (the actual memory storage 
> used by any set of regions being compacted) they are clearly Compacting, 
> since with the hand-over-hand compaction enabled by Quick Release, 
> surviving objects are clearly moved within the same “place”: the set of 
> actual storage being compacted in a given relocate phase, where the 
> survivors are before the moving started. These collectors exhibit a 
> tell-tale quality of “in place” compaction: they move survivors directly on 
> top where other survivors were just before they themselves were moved, all 
> within the same pass.
>
> In virtual memory, these collectors have a clear separation between the 
> two places that are the “from space” and the “to space” for a given 
> relocation pass. Each is a distinct set of virtual memory ranges, and they 
> do not overlap.
>
> But in actual storage, that is not the case in any of these collectors, as 
> the “from space” and “to space” for a given relocation pass overlap with 
> each other in physical space.
>
> I obviously consider C4 to be a Compacting collector (it’s right there in 
> the name: “The Continuously Concurrent Compacting Collector”), and would 
> therefore classify Pauseless, Collie, Compressor, and ZGC as Compacting as 
> well.
>
> But I would also consider them all to be Evacuating regional collectors. 
> Because that behavior (in logical/virtual memory, which is where our 
> thoughts focus most of the time) clearly fits too. Region evacuation is 
> core to Pauseless and C4, and the C4 name was actually a nod at how it 
> frees memory by “blowing up” individual regions and moving all live objects 
> “elsewhere”.
>
> However, I wouldn’t call C4

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-19 Thread Steve Blackburn
It seems my last reply got lost, so I'll try again...

For me the distinction between evacuating and compacting is crisp: while 
both move objects, evacuation requires a reserve, compaction does not.

A compacting collector can use all available space because it moves objects 
into space regained in the process of its collection.

By contrast, an evacuating collector must put aside a reserve (however 
small).   So semispace is the poster-child evacuating collector.  More 
subtly, I describe immix's defragmentation as evacuation even though it 
does not need an explicit reserve---it moves objects into 'holes' generated 
by a previous collection.   This is an implicit reserve---space un-utilized 
by the mutator, not space scavenged by the same instance of the collection 
that performs the object movement.

This distinction between requiring a reserve and not requiring a reserve is 
algorithmically fundamental, which makes it a useful distinction in the 
nomenclature of collectors.

As an aside, sweeping need not visit dead objects.   For example, in a 
classic segregated fits free list, a sweep will identify entirely free 
blocks without visiting any of the constituents.   Likewise we describe 
immix as 'sweep to region', where 'holes' are identified, but the objects 
within it are not visited.   I'd describe sweeping as freeing memory by 
traversing the entire collected region (or metadata corresponding to that 
entire region), whether or not visiting objects within that region.

--Steve


On Monday, 19 August 2019 05:34:59 UTC+10, Gil Tene wrote:
>
>
>
> On Friday, August 16, 2019 at 1:30:10 AM UTC-7, Aleksey Shipilev wrote:
>>
>> On 8/16/19 10:07 AM, Gil Tene wrote: 
>> > Classification terms evolve as the art evolves. 
>>
>> What I want to emphasize here that this discussion reinvents the meaning 
>> of "sweep". That definition 
>> is not used in the way you describe in the sources I know of. Granted, 
>> definitions drift over time, 
>> but we have to be careful to separate what is the "agreed on" definition, 
>> and what is whatever 
>> definition we want to be the emergent one. 
>>
>> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev 
>> wrote: 
>> > I am trying to understand what is your central argument here. This 
>> seems to be it. Are you saying 
>> > that "sweeping" is when you visit dead objects, and non-"sweeping" 
>> is when you do not? 
>> > 
>> > Yes. I think that's a very good summary. 
>>
>> Would be helpful if we started from this next time around! 
>>
>
> True. Maybe this discussion will help others start from there next time. 
> It's certainly helped me hone
> on that specific summary.
>  
>
>>
>> > Is walking _dead_ objects the discriminator for "sweeping" then? So 
>> in your book, if we take the 
>> > same Lisp2 collector, and compute-new-addresses and adjust-pointers 
>> steps are walking the 
>> > self-parsable heap (visiting dead objects along the way), it is 
>> M-S-C? But if it uses marking 
>> > bitmap 
>> > (thus only visiting live objects), it becomes M-C? [This would be a 
>> weird flex, but okay]. 
>> > 
>> > 
>> > Exactly. "In my book" adding an efficient livemark bit vector with 
>> possible summary layers would 
>> > covert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
>> (with no sweep). 
>>
>> So this is what irks me here. In my Epsilon's mark-compact toy collector 
>> (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to 
>> self-parsable heap traversal is 
>> about 5 lines of code. This does not feel like passing the bar for 
>> reclassification of the collector 
>> algo between major classes.
>
>
> Self-parseable heaps are expensive to traverse, and that expense grows 
> with the size non-live
> part of the heap. Being sensitive to live-occupancy % (or not) is a 
> fundamental quality in
> a collector.
>
> If a "5 lines of code" change allows your collector to no longer be 
> sensitive to heap size, and
> stay sensitive only to the live set, it is well worth it.  And it also 
> changes its classification.
>
> E.g. sometimes changing from a linked list to an array list touches only 
> one line of code, but
> that change can dramatically alter the classification from a complexity 
> and fit-for-use point
> of view.
>  
>
>> It feels more like the implementation detail that can swing both ways, 
>> depending on circumstances. 
>>
>
> ParalellGC probably wouldn't call the file that implements it's
> Mark-Compact (with sweep) algorithm "MarkSweep.cpp", if the authors
> of that code thought of their implementation as a Mark Compact that
> does not include a sweep, or if they thouight of the sweeping as a
> small implementation detail.
>
> If it were just an implementation detail, there wouldn't be scores of 
> academic
> papers that seems to classify the "Mark Compact" (the historic kind that 
> visits all
> dead objects, so implies sweeping) as inherently less CPU-efficient, and 
> back
> tha

Re: Confusion regarding 'mark-sweep-compact' naming

2019-08-16 Thread Steve Blackburn
For what it is worth, Kathryn and I have previously clarified some of this 
nomenclature (see our Immix paper 
).

Marking: identifying liveness via a trace
Sweeping: identifying free space while leaving live objects in place 
(classic mark-sweep and immix do this)
Evacuating: freeing space by moving survivors to another place
Compacting:  freeing space moving survivors within the same place.

We use the term 'defragmentation' to refer to the use of evacuation to free 
up space in a regional collector.

--Steve

On Friday, 16 August 2019 18:30:10 UTC+10, Aleksey Shipilev wrote:
>
> On 8/16/19 10:07 AM, Gil Tene wrote: 
> > Classification terms evolve as the art evolves. 
>
> What I want to emphasize here that this discussion reinvents the meaning 
> of "sweep". That definition 
> is not used in the way you describe in the sources I know of. Granted, 
> definitions drift over time, 
> but we have to be careful to separate what is the "agreed on" definition, 
> and what is whatever 
> definition we want to be the emergent one. 
>
> > On Thursday, August 15, 2019 at 1:42:23 AM UTC-7, Aleksey Shipilev 
> wrote: 
> > I am trying to understand what is your central argument here. This 
> seems to be it. Are you saying 
> > that "sweeping" is when you visit dead objects, and non-"sweeping" 
> is when you do not? 
> > 
> > Yes. I think that's a very good summary. 
>
> Would be helpful if we started from this next time around! 
>
> > Is walking _dead_ objects the discriminator for "sweeping" then? So 
> in your book, if we take the 
> > same Lisp2 collector, and compute-new-addresses and adjust-pointers 
> steps are walking the 
> > self-parsable heap (visiting dead objects along the way), it is 
> M-S-C? But if it uses marking 
> > bitmap 
> > (thus only visiting live objects), it becomes M-C? [This would be a 
> weird flex, but okay]. 
> > 
> > 
> > Exactly. "In my book" adding an efficient livemark bit vector with 
> possible summary layers would 
> > covert the classic LISP2 GC from a Mark-Sweep-Compact to a Mark-Compact 
> (with no sweep). 
>
> So this is what irks me here. In my Epsilon's mark-compact toy collector 
> (https://shipilev.net/jvm/diy-gc/), changing from mark bitmaps to 
> self-parsable heap traversal is 
> about 5 lines of code. This does not feel like passing the bar for 
> reclassification of the collector 
> algo between major classes. It feels more like the implementation detail 
> that can swing both ways, 
> depending on circumstances. 
>
> -Aleksey 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/mechanical-sympathy/869e1ac0-7edd-43ca-afe1-973b0410ff76%40googlegroups.com.