Re: [bitc-dev] GC tradeoffs and meeting user-centered design targets

Jonathan S. Shapiro Fri, 02 Aug 2013 12:37:15 -0700

On Fri, Aug 2, 2013 at 11:49 AM, David Jeske <[email protected]> wrote:

> On Fri, Aug 2, 2013 at 11:12 AM, Jonathan S. Shapiro <[email protected]>wrote:
>
>> It seems that I often have large, durable DAGs with internal pointers.
>> Often, there is some "master" pointer (or small set of such pointers) that
>> keeps the structure alive.
>>
>
> I do typically see this pattern, but not for frozen DAGs. They change over
> time (sometimes a little, sometimes a lot), which means they need
> collection themselves or they could consume all memory with non-referenced
> elements.
>
> Once collections of these regions is allowed while sub-regions exist, the
> "borrowed" pointer no longer works, because holding the master-pointer no
> longer implies liveliness for dynamic data within the region.
>

It's the other way around. In a slowly changing DAG, you wouldn't have been
able to use borrowed pointers; they would fail to type check, because they
aren't actually borrowed.

For example, consider borrowed internal references between large value
> types, made safe by master-pointers...
>

That's essentially the pattern I was outlining. But the if the lifetimes
don't have a partial order, you can't type check them.

> If I make a large array-of-structs, any of the structs can point to any
> other struct in the same array using borrowed pointers without GC tracing,
> because if the whole array becomes unreachable, so do all of the elements
> of the array. This could not only dramatically reduce the number of GC
> traced pointers, but would also eliminate indirection and bounds checking
> costs because the borrowed pointers were validated when they were created.
>

Yes. This is *exactly* the case in the Coyotos/EROS kernels. Though
strictly speaking we have four or five master pointer [globals] that point
into a self-connected region containing four or five vectors. But I don't
think the presence of four or five masters makes the type analysis problem
any harder.

> However, the utility of this alone is a bit suspect, as our array is fixed
> size. When we run out of entries, there is no means to get more, other than
> pausing to copy/rebuild the entire array into a new larger array. It would
> be nice if we could at least make a second array, and mutually link the
> lifetimes of the arrays.
>

There are three sub-cases here:

1. The array is not *statically* fixed size, but a durable fixed size can
be selected at application startup. For that case, the idea still works.
2. We truly need to be able to grow the space dynamically, but it will
never shrink. Actually, I think *that* case also works. *Adding* things to
a region doesn't reduce the liveness of the region.
3. At some point we want to shrink the region. *That* case is a pain in the
ass, and GC doesn't help. The issue is that we've effectively been building
a hand-managed heap of some [set of] type[s] T, using vectors of T[] as our
malloc regions. When we want to shrink, we don't get any benefit unless we
shrink a *region*, and that implies coalescing all of the surviving objects
into the surviving region vectors. GC won't help us do that.

> So now we have two value-type arrays-of-structs, and we join them by
> having their boxes hold pointers to each-other.
>

No. We have two vectors/arrays. Whether they are value types is not
important to the proposition at hand.

> Now borrowed pointers in either value-type-region can point to the other.
> None of them need to be GC traced, because the master-pointers guarantee
> liveliness of the other array. Further, both arrays don't need to be of the
> same type.
>

Yes.

> This seems to have some advantages over general garbage-collected
> heap-allocation, because we are able to replicate C's ability to do
> cache-efficient packed arrays, with long-lived internal pointers to
> elements -- without the overhead of generally handling internal references
> in our GC. This also seems to have some advantages over value-type-arrays
> with array-index references, since we don't need to dereference or
> bounds-check the pointers between elements.
>

I hope so, and I believe that in useful cases it is so.

> Effectively, this is a mechanism to implement typesafe-managed-manual
> memory techniques within a GC system while hinting to the GC that doesn't
> need to frequently concern itself with the details inside the boxes (see
> below).
>

Almost, modulo my comment about coalescing ranges within the allocator.

> In order for this to be truly useful, I think we still need a way to free
> one of the sub-arrays. There are two interesting possibilities.
>
> (1) is simply to use a GC tracing method on the borrowed pointers of
> connected boxes, to find out which boxes have no more borrowed pointers..
> and then free that box. However, because the main point of this entire
> mechanism is to enable low-overhead alternate memory-management techniques,
> I think there is a more interesting option...
>
> (2) to allow forced free of a box, which causes the system to prevent any
> more borrowed pointers and actively null-out any existing references to the
> inside of that box. This is similar to 1, but it's guaranteed to free the
> memory in a single pass.
>

Neither is enough, because of the coalescing problem, which we would really
like *not* to do by hand. It's tempting to imagine using something like the
SmallTalk *become* mechanism, in which we *explicitly* migrate an object
from one slot to another in such a way that the GC subsystem will clean up
the pointers for us. Offhand, I'm not sure how to make that work
efficiently, but it can be done.

Jonathan

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Re: [bitc-dev] GC tradeoffs and meeting user-centered design targets

Reply via email to