Apologies, Matthew, if you know this all already.  This is really more for Paulo's benefit, but your wish list provided a convenient framework.


On 9/13/2019 9:28 AM, Matthew Flatt wrote:
At Thu, 12 Sep 2019 19:20:08 -0400, Philip McGrath wrote:
> "Flonum unboxing: accept mismatch, for now" [...] I could imagine
> that this might not be what you're looking for, since it seems like
> it would be much more about creating garbage than collecting it per
> se

Yes, flonum unboxing is entirely a compiler project.


Some things I'd like to have from the GC, but these are large projects:

  * GC implemented/generated in Scheme.

    A common problem with adding new things to the GC (including
    scalable object locking, ordered finalization, and vfasl saving) is
    repeating/copying the code that traverses objects. Each copy is a
    little different, and it's difficult to abstract over the variation
    and get good performance, especially in C. Implementing the GC
    itself in Scheme may be too tricky to be worthwhile, but generating
    a C implementation in Scheme might be the way to go. The pointer and
    object layouts in "cmacro.ss" are already a lot of the needed
    information.

Working in C (or similar languages), it is best not to abstract very much with respect to pointer identification.  Each object type should have a custom pointer-finding function generated by the compiler directly from the object's layout.  You then somehow associate the function (or functions, see below) with the object - using BiBOP page metadata, an additional pointer in the object header, etc.  Stack frames (assuming the language has a notion of a contiguous stack) can be handled in very much the same manner as heap objects.
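To make the idea concrete, here is a minimal sketch of what such a generated function might look like, for a hypothetical two-pointer layout (names and the `trace_fn` callback shape are illustrative, not from any real runtime):

```c
/* Hypothetical sketch: a compiler-generated pointer-finding function
   for a two-pointer object layout (e.g. a pair).  The generator emits
   one such function per object type, straight from the layout. */
#include <stddef.h>

typedef void (*trace_fn)(void **slot, void *ctx);

struct pair {
    void *header;   /* type tag / metadata - not a heap pointer */
    void *car;      /* pointer field */
    void *cdr;      /* pointer field */
};

/* Generated directly from the layout of `struct pair`: visit each
   pointer-bearing slot, skipping the non-pointer header. */
static void trace_pair(void *obj, trace_fn visit, void *ctx) {
    struct pair *p = obj;
    visit(&p->car, ctx);
    visit(&p->cdr, ctx);
}
```

The collector then dispatches to `trace_pair` (via page metadata or a header word) without ever interpreting the layout at runtime.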

For a non-moving collector, the pointer finding function can simply queue all the pointers for later examination and then mark the object as scanned.  In the case of a union, you would queue the "potential" pointer and figure out later whether it really points to a heap object (mistakes due to conservative pointer finding are fairly rare if true heap allocations can be positively identified).

For a moving collector, it's a bit more complex - the pointer finding function needs to be able to enumerate pointers one-by-one, and you also need a corresponding pointer setting function.
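One way to get both the enumeration and the write-back is to generate, per type, a table of pointer-slot offsets; the sketch below assumes that representation (all names are illustrative):

```c
/* Hypothetical sketch: for a moving collector, a generated per-type
   table of pointer-slot offsets lets the collector read each pointer,
   relocate the referent, and write the new address back. */
#include <stddef.h>
#include <stdint.h>

struct layout {
    size_t nslots;
    const size_t *offsets;   /* byte offsets of the pointer fields */
};

static void *slot_get(void *obj, size_t off) {
    return *(void **)((char *)obj + off);
}

static void slot_set(void *obj, size_t off, void *val) {
    *(void **)((char *)obj + off) = val;
}

/* Collector-side loop: forward() is whatever relocation the collector
   performs (copy to TO space, return the new address). */
static void scan_object(void *obj, const struct layout *l,
                        void *(*forward)(void *)) {
    for (size_t i = 0; i < l->nslots; i++)
        slot_set(obj, l->offsets[i],
                 forward(slot_get(obj, l->offsets[i])));
}
```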

But these functions tend to be rather simple to generate - the majority of objects are not very complex, and in any case it is all done mechanically from the object's layout.

For generational (or parallel, below) collection, you'll need to implement moving anyway.


  * Support for non-moving objects that are collected when unreferenced
    but not moved while referenced.

Best handled with one or more separate logical heaps.  Relatively easy in a BiBOP system if you base page residence on object size rather than on type.   Quite a bit more complicated if you demand heaps be contiguous.
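For Paulo's benefit, the BiBOP bookkeeping amounts to a side table indexed by page number; a toy sketch (page size, table bounds, and field names all illustrative):

```c
/* Hypothetical BiBOP sketch: residence is per page, and a side table
   keyed by page number records each page's object size and whether
   the page belongs to the non-moving logical heap. */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12     /* 4 KiB pages */
#define NPAGES     1024   /* toy address space for the sketch */

struct page_meta {
    size_t obj_size;    /* every object on the page has this size */
    int    non_moving;  /* nonzero => objects here are never moved */
};

static struct page_meta page_table[NPAGES];

static struct page_meta *meta_for(uintptr_t addr) {
    return &page_table[(addr >> PAGE_SHIFT) % NPAGES];
}
```

Deciding "is this object pinned?" or "how big is this object?" is then a shift and a table lookup, with no per-object header cost.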


    Racket CS implements non-moving objects by locking objects and using
    a finalizer to unlock. Although I made changes to Chez Scheme so
    that object locking is more scalable, it would be nicer and likely
    perform better to have an allocation arena that is GCed but not
    compacted/moved.

Ouch!  "Pinning" objects in a moving system is a PITA ... it doesn't matter whether you are compacting or (semi-heap) copying.  It would be better if "pinning" an object relocated it to the non-moving heap.


  * Parallel GC

Generally involves a complete rethink of the heap structure. Parallel collection is one thing that actually IS made easier by contiguous (semi) heaps.   In a BiBOP system a "heap" is a logical construct of (maybe) discontiguous pages, and the extra chasing around makes collecting in parallel more complicated.

Note that the mutator(s) typically are stopped in a parallel collection (but see below).


  * Incremental GC

Depends on what you mean by "incremental".

One possible "incremental" system would use the same N (semi) heaps as the parallel collector, but rotate the pair of heaps used for the current (FROM) and TO spaces.  The mutator is stopped just long enough to collect the filled FROM space, emptying it completely so it can become a TO-space target for the next collection.  [I believe the GC handbook calls this "train" allocation.]
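The role rotation itself is trivial bookkeeping; a sketch of just the index juggling (the copying is elided, and the representation is mine, not from any particular collector):

```c
/* Hypothetical sketch of the rotating-roles idea: with N semi-heaps,
   one heap is always empty and serves as the TO space; each collection
   empties the current FROM heap, which then becomes the next TO. */
struct rotation { int n, from, to; };

static void collect_step(struct rotation *r) {
    r->to = r->from;                 /* the emptied FROM is the next TO */
    r->from = (r->from + 1) % r->n;  /* next filled heap to collect */
}
```

With N heaps, only one heap's worth of space sits idle at any time, rather than half the address space as in classic two-space copying.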

Another would leverage the VMM system, collecting page by page as FROM-space pages are touched.  This can be done moving or non-moving, but it is complicated by the need to positively identify object boundaries ... because the identity of the object being written to has been lost when the VMM-assisted collector gets control ... and by objects that span multiple VMM pages.


Collecting object by object while the mutator continues requires much more complexity and generally requires more space - regardless of whether GC actually is interleaved with or done partly in parallel with the mutator.  It requires, for each object, one of:

   A) an extra metadata pointer in the header  [kiss of death for small objects: boxes, pairs, flonums, etc.], or
   B) a minimum allocation size at least equal to that of a cons pair.

Either case requires access barriers on both read and write that can recognize that an object has been forwarded and then perform the operation on the TO-space copy instead of the original in FROM-space.  The barrier may be required to copy other objects because FROM-space originals generally can't be modified after having been copied [at least not without introducing a lot of extra complexity].
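The forwarding check at the heart of such a barrier is small; a sketch assuming a low-bit-tagged header word (the encoding is illustrative - real systems vary):

```c
/* Hypothetical sketch: a read barrier for a collector running
   concurrently with the mutator.  A FROM-space object that has been
   copied stores a forwarding pointer in its header, with the low bit
   set to mark it as forwarded; every access must follow it. */
#include <stdint.h>

#define FORWARDED_BIT 0x1u

struct object {
    uintptr_t header;   /* low bit set => rest is a forwarding pointer */
    /* ... fields ... */
};

static struct object *read_barrier(struct object *obj) {
    if (obj->header & FORWARDED_BIT)
        return (struct object *)(obj->header & ~(uintptr_t)FORWARDED_BIT);
    return obj;
}
```

The write barrier is the same check plus the complication described above: the store must go to the TO-space copy, never the stale original.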


If you put a reasonable limit on the size of thread (Racket place) private heaps, and limit the number of generations in the private heaps to just 1 or 2, then you can get good performance with a simpler Stop-the-Thread collection - done asynchronously whenever the thread pleases - and leave more complex techniques like incremental collection to the shared heap, which is likely to be large (maybe last generation sized) and could be collected in parallel with the mutator(s), often at a slower rate.


One thing to remember about GC is that useful real-world systems tend to be hybrids that make use of multiple allocation/collection techniques for different situations.  Trying to use the same technique everywhere might make for a good research paper, but it is unlikely to make for a highly performant runtime system.  No one technique is optimal everywhere (if it even is optimal anywhere), so some amount of tailoring to the specifics generally will be better.

George

--
You received this message because you are subscribed to the Google Groups "Racket 
Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-dev/1f4bc89d-d79f-aeec-6dc4-22bcfc70379a%40comcast.net.
