Re: Confusion regarding 'mark-sweep-compact' naming

Aleksey Shipilev Tue, 06 Aug 2019 02:41:28 -0700

On 8/6/19 7:38 AM, Peter Veentjer wrote:
> There is still some confusion on my side. This time it is regarding the 
> algorithmic complexity of
> mark-sweep-compact.
> 
> The complexity of the mark phase is linear to the live set.
> The complexity of the sweep phase is linear to the heap size.
> The complexity of the compact phase is linear to the live set.


Well...

The multiplicative factors in either case might make one linearity perform 
better than the other on
wildly different heap sizes and live sets. The adversarial example of this is N 
large byte arrays of
size M. Compact might have to copy N*M bytes (especially when doing sliding 
compaction), and sweep
might only need to visit N locations to check on object metadata.

There are second-order heap size effects for both mark and compact. Take two 
heaps with the same
live set, one very dense and one very sparse, and the mere fact you have to 
walk memory far away
adds up the secondary effect of heap size. It would not linger for compacting 
collectors, though, as
first few compaction cycles would collapse the heap to denser variant.

That is all to say that algorithmic complexity is a seriously bad way to reason 
about (GC) performance.

> Why would you ever want to include the a sweep phase because the complexity 
> of mark-sweep-compact is
> now linear to the heap size; a lot worse compared to 'linear to the live 
> set'. Get rid of the sweep
> and complexity goes down to 'linear to the live set'. What is the added value 
> of such an inefficient
> GC implementation?

I think m-s-c nomenclature is confusing, as the actual implementations are 
doing either m-s or m-c
during their individual phases. The benefit of doing either is well covered.

The hard naming question is how to properly label the collector where different 
phases employ
different algos. For example, in CMS the young collection is copying, old 
collection is mark-sweep,
fall-back full GC is mark-compact. How would you label CMS then? I believe 
marking it m-s-c is
borderline tolerable, until users get confused thinking both sweep and compact 
happen in the same phase.

Also with implementations that manage backing storage in fine-grained chunks 
(for example, free
lists), "sweep" might mean dealing with them on per-dead-object basis. For 
example, this might mean
 preparing the storage for subsequent compaction by "sweeping" out old objects 
before compaction
kicks in, which requires free space. Practical implementations I know opt to 
instead reconstruct the
backing storage metadata after massive compaction moves, though.

My $0.02c, anyway.

-Aleksey

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/mechanical-sympathy/e1755e01-58d6-3db4-bb65-ee7bc72b0ead%40gmail.com.

signature.asc
Description: OpenPGP digital signature

Re: Confusion regarding 'mark-sweep-compact' naming

Reply via email to