Apologies, Matthew, if you know this all already.  This is really more for Paulo's benefit, but your wish list provided a convenient framework.


On 9/13/2019 9:28 AM, Matthew Flatt wrote:
At Thu, 12 Sep 2019 19:20:08 -0400, Philip McGrath wrote:
> "Flonum unboxing: accept mismatch, for now" [...] I could imagine
> that this might not be what you're looking for, since it seems like
> it would be much more about creating garbage than collecting it per
> se

Yes, flonum unboxing is entirely a compiler project.


Some things I'd like to have from the GC, but these are large projects:

  * GC implemented/generated in Scheme.

    A common problem with adding new things to the GC (including
    scalable object locking, ordered finalization, and vfasl saving) is
    repeating/copying the code that traverses objects. Each copy is a
    little different, and it's difficult to abstract over the variation
    and get good performance, especially in C. Implementing the GC
    itself in Scheme may be too tricky to be worthwhile, but generating
    a C implementation in Scheme might be the way to go. The pointer and
    object layouts in "cmacro.ss" are already a lot of the needed
    information.

Working in C (or similar languages), it is best not to abstract very much with respect to pointer identification.  Each object type should have a custom pointer-finding function generated by the compiler directly from the object's layout.  You then somehow associate the function (or functions, see below) with the object - using BiBOP page metadata, an additional pointer in the object header, etc.  Stack frames (assuming the language has a notion of a contiguous stack) can be handled in very much the same manner as heap objects.
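To make the idea concrete, here is a minimal sketch of what such a generated function might look like, for a hypothetical two-pointer layout (names and the `trace_fn` callback shape are illustrative, not from any real runtime):

```c
/* Hypothetical sketch: a compiler-generated pointer-finding function
   for a two-pointer object layout (e.g. a pair).  The generator emits
   one such function per object type, straight from the layout. */
#include <stddef.h>

typedef void (*trace_fn)(void **slot, void *ctx);

struct pair {
    void *header;   /* type tag / metadata - not a heap pointer */
    void *car;      /* pointer field */
    void *cdr;      /* pointer field */
};

/* Generated directly from the layout of `struct pair`: visit each
   pointer-bearing slot, skipping the non-pointer header. */
static void trace_pair(void *obj, trace_fn visit, void *ctx) {
    struct pair *p = obj;
    visit(&p->car, ctx);
    visit(&p->cdr, ctx);
}
```

The collector then dispatches to `trace_pair` (via page metadata or a header word) without ever interpreting the layout at runtime.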

For a non-moving collector, the pointer finding function can simply queue all the pointers for later examination and then mark the object as scanned.  In the case of a union, you would queue the "potential" pointer and figure out later whether it really points to a heap object (mistakes due to conservative pointer finding are fairly rare if true heap allocations can be positively identified).

For a moving collector, it's a bit more complex - the pointer finding function needs to be able to enumerate pointers one-by-one, and you also need a corresponding pointer setting function.
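One way to get both the enumeration and the write-back is to generate, per type, a table of pointer-slot offsets; the sketch below assumes that representation (all names are illustrative):

```c
/* Hypothetical sketch: for a moving collector, a generated per-type
   table of pointer-slot offsets lets the collector read each pointer,
   relocate the referent, and write the new address back. */
#include <stddef.h>
#include <stdint.h>

struct layout {
    size_t nslots;
    const size_t *offsets;   /* byte offsets of the pointer fields */
};

static void *slot_get(void *obj, size_t off) {
    return *(void **)((char *)obj + off);
}

static void slot_set(void *obj, size_t off, void *val) {
    *(void **)((char *)obj + off) = val;
}

/* Collector-side loop: forward() is whatever relocation the collector
   performs (copy to TO space, return the new address). */
static void scan_object(void *obj, const struct layout *l,
                        void *(*forward)(void *)) {
    for (size_t i = 0; i < l->nslots; i++)
        slot_set(obj, l->offsets[i],
                 forward(slot_get(obj, l->offsets[i])));
}
```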

But these functions tend to be rather simple to generate - the majority of objects are not very complex, and in any case it is all done mechanically from the object's layout.

For generational (or parallel, below) collection, you'll need to implement moving anyway.


  * Support for non-moving objects that are collected when unreferenced
    but not moved while referenced.

Best handled with one or more separate logical heaps.  Relatively easy in a BiBOP system if you base page residence on object size rather than on type.   Quite a bit more complicated if you demand heaps be contiguous.
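For Paulo's benefit, the BiBOP bookkeeping amounts to a side table indexed by page number; a toy sketch (page size, table bounds, and field names all illustrative):

```c
/* Hypothetical BiBOP sketch: residence is per page, and a side table
   keyed by page number records each page's object size and whether
   the page belongs to the non-moving logical heap. */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12     /* 4 KiB pages */
#define NPAGES     1024   /* toy address space for the sketch */

struct page_meta {
    size_t obj_size;    /* every object on the page has this size */
    int    non_moving;  /* nonzero => objects here are never moved */
};

static struct page_meta page_table[NPAGES];

static struct page_meta *meta_for(uintptr_t addr) {
    return &page_table[(addr >> PAGE_SHIFT) % NPAGES];
}
```

Deciding "is this object pinned?" or "how big is this object?" is then a shift and a table lookup, with no per-object header cost.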


    Racket CS implements non-moving objects by locking objects and using
    a finalizer to unlock. Although I made changes to Chez Scheme so
    that object locking is more scalable, it would be nicer and likely
    perform better to have an allocation arena that is GCed but not
    compacted/moved.

Ouch!  "Pinning" objects in a moving system is a PITA ... it doesn't matter whether you are compacting or (semi-heap) copying.  It would be better if "pinning" an object relocated it to the non-moving heap.


  * Parallel GC

Generally involves a complete rethink of the heap structure. Parallel collection is one thing that actually IS made easier by contiguous (semi) heaps.   In a BiBOP system a "heap" is a logical construct of (maybe) discontiguous pages, and the extra chasing around makes collecting in parallel more complicated.

Note that the mutator(s) typically are stopped in a parallel collection (but see below).


  * Incremental GC

Depends on what you mean by "incremental".

One possible "incremental" system would use the same N (semi) heaps as the parallel collector, but rotate the pair of heaps used for the current (FROM) and TO spaces.  The mutator is stopped just long enough to collect the filled FROM space, emptying it completely so it can become a TO-space target for the next collection.  [I believe the GC handbook calls this "train" allocation.]
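The role rotation itself is trivial bookkeeping; a sketch of just the index juggling (the copying is elided, and the representation is mine, not from any particular collector):

```c
/* Hypothetical sketch of the rotating-roles idea: with N semi-heaps,
   one heap is always empty and serves as the TO space; each collection
   empties the current FROM heap, which then becomes the next TO. */
struct rotation { int n, from, to; };

static void collect_step(struct rotation *r) {
    r->to = r->from;                 /* the emptied FROM is the next TO */
    r->from = (r->from + 1) % r->n;  /* next filled heap to collect */
}
```

With N heaps, only one heap's worth of space sits idle at any time, rather than half the address space as in classic two-space copying.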

Another would leverage the VMM system, collecting page by page as FROM-space pages are touched.  This can be done moving or non-moving, but it is complicated by the need to positively identify object boundaries ... because the identity of the object being written to has been lost when the VMM-assisted collector gets control ... and by objects that span multiple VMM pages.


Collecting object by object while the mutator continues requires much more complexity and generally requires more space - regardless of whether GC actually is interleaved with or done partly in parallel with the mutator.  It requires, for each object, one of:

   A) an extra metadata pointer in the header  [kiss of death for small objects: boxes, pairs, flonums, etc.], or
   B) a minimum allocation size at least equal to that of a cons pair.

Either case requires access barriers on both read and write that can recognize that an object has been forwarded and then perform the operation on the TO-space copy instead of the original in FROM-space.  The barrier may be required to copy other objects because FROM-space originals generally can't be modified after having been copied [at least not without introducing a lot of extra complexity].
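The forwarding check at the heart of such a barrier is small; a sketch assuming a low-bit-tagged header word (the encoding is illustrative - real systems vary):

```c
/* Hypothetical sketch: a read barrier for a collector running
   concurrently with the mutator.  A FROM-space object that has been
   copied stores a forwarding pointer in its header, with the low bit
   set to mark it as forwarded; every access must follow it. */
#include <stdint.h>

#define FORWARDED_BIT 0x1u

struct object {
    uintptr_t header;   /* low bit set => rest is a forwarding pointer */
    /* ... fields ... */
};

static struct object *read_barrier(struct object *obj) {
    if (obj->header & FORWARDED_BIT)
        return (struct object *)(obj->header & ~(uintptr_t)FORWARDED_BIT);
    return obj;
}
```

The write barrier is the same check plus the complication described above: the store must go to the TO-space copy, never the stale original.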


If you put a reasonable limit on the size of thread (Racket place) private heaps, and limit the number of generations in the private heaps to just 1 or 2, then you can get good performance with a simpler Stop-the-Thread collection - done asynchronously whenever the thread pleases - and leave more complex techniques like incremental collection to the shared heap, which is likely to be large (maybe last generation sized) and could be collected in parallel with the mutator(s), often at a slower rate.


One thing to remember about GC is that useful real-world systems tend to be hybrids that make use of multiple allocation/collection techniques for different situations.  Trying to use the same technique everywhere might make for a good research paper, but it is unlikely to make for a highly performant runtime system.  No one technique is optimal everywhere (if it even is optimal anywhere), so some amount of tailoring to the specifics generally will be better.

George

--
You received this message because you are subscribed to the Google Groups "Racket 
Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-dev/1f4bc89d-d79f-aeec-6dc4-22bcfc70379a%40comcast.net.
