Thanks for your answer Aleksey, comments inline.
On Tuesday, August 6, 2019 at 12:41:28 PM UTC+3, Aleksey Shipilev wrote: > > On 8/6/19 7:38 AM, Peter Veentjer wrote: > > There is still some confusion on my side. This time it is regarding the > algorithmic complexity of > > mark-sweep-compact. > > > > The complexity of the mark phase is linear to the live set. > > The complexity of the sweep phase is linear to the heap size. > > The complexity of the compact phase is linear to the live set. > > Well... > > The multiplicative factors in either case might make one linearity perform > better than the other on > wildly different heap sizes and live sets. The adversarial example of this > is N large byte arrays of > size M. Compact might have to copy N*M bytes (especially when doing > sliding compaction), and sweep > might only need to visit N locations to check on object metadata. > > There are second-order heap size effects for both mark and compact. Take > two heaps with the same > live set, one very dense and one very sparse, and the mere fact you have > to walk memory far away > adds up the secondary effect of heap size. It would not linger for > compacting collectors, though, as > first few compaction cycles would collapse the heap to denser variant. > > That is all to say that algorithmic complexity is a seriously bad way to > reason about (GC) performance. > I understand. I created a bunch of flashcard to get a better conceptual understanding of the different gc algorithms; I'll add some additional flashcard to refine my understanding. > > > Why would you ever want to include the a sweep phase because the > complexity of mark-sweep-compact is > > now linear to the heap size; a lot worse compared to 'linear to the live > set'. Get rid of the sweep > > and complexity goes down to 'linear to the live set'. What is the added > value of such an inefficient > > GC implementation? > > I think m-s-c nomenclature is confusing, as the actual implementations are > doing either m-s or m-c > during their individual phases. Exactly. And this was the source of my confusion and caused by an abuse of names. > The benefit of doing either is well covered. > > The hard naming question is how to properly label the collector where > different phases employ > different algos. For example, in CMS the young collection is copying, old > collection is mark-sweep, > fall-back full GC is mark-compact. How would you label CMS then? It would be easier/simpler to label the young and old collector independently; but I understand your point. > I believe marking it m-s-c is > borderline tolerable, until users get confused thinking both sweep and > compact happen in the same phase. > > Also with implementations that manage backing storage in fine-grained > chunks (for example, free > lists), "sweep" might mean dealing with them on per-dead-object basis. For > example, this might mean > preparing the storage for subsequent compaction by "sweeping" out old > objects before compaction > kicks in, which requires free space. Practical implementations I know opt > to instead reconstruct the > backing storage metadata after massive compaction moves, though. > > My $0.02c, anyway. > > -Aleksey > > -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/2a29baf3-2027-4eb6-a7f1-4737397f9f0f%40googlegroups.com.