On Sat, Aug 3, 2013 at 12:38 AM, William ML Leslie <[email protected]> wrote:

> On 3 August 2013 03:52, Jonathan S. Shapiro <[email protected]> wrote:
> > On Fri, Aug 2, 2013 at 8:20 AM, Matt Rice <[email protected]> wrote:
> >>
> >> On Sat, Jul 27, 2013 at 1:29 PM, Jonathan S. Shapiro <[email protected]>
> >> wrote:
> > However when region polymorphism is introduced at procedure parameters and
> > return values this becomes less clear. Objects in a caller region can now be
> > manipulated by a callee. The lifetime of a region is still bounded by the
> > stack frame in which it is allocated [*], but it's no longer easy to point at
> > a single stack frame and say "that's the [single] frame in which this region
> > can be touched".
>
> You didn't add the footnote you mentioned there...


So sorry. It's a pretty boring footnote, though. The lifetime of a region
is bounded by the stack frame in which it [the region] is allocated, *except
for the "bottom" region, which is the general GC heap, and is not
associated with any particular frame*. It's a minor point about how the
region ordering induction grounds out, that's all.

Shap said:
>
> > The lifetime *of a region* is bounded by the stack frame in which *it*
> > is allocated
>
> This need not be the stack frame that the *value* was allocated in...
>

Indeed. The model is that a region is allocated in a let-like binding
somewhere (therefore in a stack frame somewhere), and then objects are
allocated in/from that region. Assuming we do not choose to GC the region,
the lifetime of those objects is exactly the lifetime of the region. If we
*do* choose to GC the region, we can exploit the region induction to reduce
what has to be traced. I've said that second part several times, but a more
unpacked explanation might be helpful:

At any given moment, we have an inductive chain of regions that have a
stack-like ordering discipline:

R0 (the GC Heap) > R1 > R2 > R3 > R4 ... > Rmax (the current frame)

where '>' should be read as "has longer lifespan than". The usual use of
regions imposes a subtype relation, such that an object in region Ri can
only be pointed at from a pointer in Region Rj <= Ri. This is in contrast
to generational collection, where pointers are permitted from tenured space
to younger generations, and must be tracked specially.
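
To make that ordering concrete, here is a minimal bump-allocator sketch in C.
The names (region_t, region_push, region_alloc, region_pop) are hypothetical;
they just stand in for whatever primitives the language actually provides, and
the Rj <= Ri rule shows up only as comments, since the point in a language
like BitC is that the compiler enforces it statically.

#include <stdlib.h>
#include <stddef.h>

typedef struct region {
    struct region *parent;   /* the next-longer-lived region in the chain */
    char          *base;     /* backing storage for this region           */
    size_t         used, cap;
} region_t;

/* Enter a new, shorter-lived region nested inside 'parent'. */
static region_t *region_push(region_t *parent, size_t cap) {
    region_t *r = malloc(sizeof *r);
    r->parent = parent;
    r->base   = malloc(cap);
    r->used   = 0;
    r->cap    = cap;
    return r;
}

/* Bump-allocate an object in region r (alignment ignored for brevity). */
static void *region_alloc(region_t *r, size_t n) {
    if (r->used + n > r->cap)
        return NULL;                 /* a real allocator would grow or GC */
    void *p = r->base + r->used;
    r->used += n;
    return p;
}

/* Leave the region; everything allocated in it dies at once. */
static void region_pop(region_t *r) {
    free(r->base);
    free(r);
}

int main(void) {
    region_t *r0 = region_push(NULL, 1 << 16); /* stands in for R0, the GC heap */
    region_t *r1 = region_push(r0,   1 << 12); /* opened by some let-like binding */

    long  *outer = region_alloc(r0, sizeof *outer);  /* object in R0 */
    long **inner = region_alloc(r1, sizeof *inner);  /* pointer cell in R1 */
    *inner = outer;  /* legal: a pointer in R1 (Rj) may refer into R0 (Ri) */
    /* The reverse, a cell in R0 holding a pointer into R1, is what the
     * static rule forbids: R1 dies when its frame exits, and unlike a
     * generational heap there is no remembered set to patch things up.  */

    region_pop(r1);  /* regions pop in stack order: R1 first, then R0 */
    region_pop(r0);
    return 0;
}

Popping r1 releases everything allocated in it at once, which is exactly the
"lifetime of those objects is the lifetime of the region" property above.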

The length of the region relation chain |R| is of the same order as the
depth of the stack in frames |S|. More precisely, |R| is at most the total
number of let bindings |LB| on the stack, and since there is a constant
bound on let bindings per frame, |LB| ~ |S|. But pragmatically, most
"temporary" garbage is allocated in procedures near the innermost end of the
call chain (the most recently entered frames). What this means is that
regions worth the bother of collecting tend to be very near Rmax, tend to be
small, and tend to require a very limited amount of tracing.

One way to think about it is that region-driven collection is a form of
multi-generational collection in which the liveness of a GC generation is
related to the liveness of stack frames, but there is an added benefit that
we can often tell, statically, when a given region cannot usefully be
collected. By "not usefully", I mean that we can show statically that
collection will not meaningfully reduce the size of that region.



> The return case is illustrative, because a function can return a
> reference to a fresh list of dynamically determined length and still
> have accurate lifetimes inferred by the region analysis!  Clearly the
> list itself can't be allocated on the stack, so what is going on?
>

Actually, it *can* be allocated on the stack. Or at least it *often* can.

First, a side observation: you can think of stack frames themselves, and
the unboxed regions they contain, as a "poor man's region system". The
usual cross-region reference rule is enforced by prohibiting address-of on
unboxed stack-allocated objects, or alternatively by allowing address-of
but only when stored in borrowed pointer cells. So borrowed pointers are an
extended form of "poor man's region system".

If you can statically estimate the size of a region, or get a decently
conservative bound, you can quite literally put the region on the stack.
When you know that an object is non-escaping, you simply allocate that
object from the stack frame using alloca().
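
To make that last point concrete (in C rather than BitC, and with a made-up
function and bound): when the bound is static and escape analysis shows that
nothing leaks, the whole region is just an alloca() carve-out of the current
frame.

#include <alloca.h>
#include <stddef.h>

/* A scratch region with a static size bound (n <= 64 here) whose contents
 * provably do not escape this frame: the "region" is simply carved out of
 * the stack with alloca() and vanishes when the frame does. */
double mean_of_squares(const double *xs, size_t n) {
    if (n == 0 || n > 64)
        return 0.0;                        /* outside the static bound */

    double *tmp = alloca(n * sizeof *tmp); /* the entire region, on the stack */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        tmp[i] = xs[i] * xs[i];            /* objects allocated "in" the region */
        sum += tmp[i];
    }
    return sum / (double)n;                /* only a scalar escapes; tmp needs
                                              neither free() nor GC           */
}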

That allocation approach breaks down if the region size is dynamic. At that
point you either have to allocate the dynamically sized portion of the
region from the GC heap, or leave yourself a conservatively sized arena, or
fall back on some other form of storage management.
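
For the "conservatively sized arena" option, one common shape looks like this
(names made up, with malloc standing in for the GC heap): reserve a fixed,
frame-local buffer for the expected case and spill to the heap only when the
guess turns out to be too small, so only the spilled objects ever concern the
collector.

#include <stdlib.h>
#include <stddef.h>

typedef struct hybrid_region {
    char  *fixed;                 /* frame-local storage, e.g. a local array */
    size_t fixed_cap, fixed_used;
} hybrid_region_t;

static void *hybrid_alloc(hybrid_region_t *r, size_t n) {
    if (r->fixed_used + n <= r->fixed_cap) { /* common case: no heap, no GC */
        void *p = r->fixed + r->fixed_used;
        r->fixed_used += n;
        return p;
    }
    return malloc(n);             /* rare case: the dynamic remainder spills
                                     to the heap, the collector's problem   */
}

/* Typical use: char buf[512]; hybrid_region_t r = { buf, sizeof buf, 0 }; */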

The most interesting case is the one where you are returning a dynamically
allocated object. The problem is that you would like to make an alloca()
call in the caller's call frame, but the callee's call frame is "in the
way".  For the same reason, it's hard to fuse regions with generational
scavenging in this case, though there are some "partial compaction"
techniques that you can use.
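
The usual way around the "frame in the way" problem, sketched here with the
same hypothetical region API as above (none of this is actual BitC): make the
callee region-polymorphic by passing the caller's region in explicitly, so
that the dynamically sized result is allocated where it will actually live
and nothing has to be copied past the callee's frame.

typedef struct cell {
    int          value;
    struct cell *next;
} cell_t;

/* Region-polymorphic in spirit: 'r' is whichever caller region should own
 * the result (possibly R0, the GC heap, if the list escapes further out). */
static cell_t *iota(region_t *r, int n) {
    cell_t *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        cell_t *c = region_alloc(r, sizeof *c);
        if (c == NULL)
            break;               /* a real allocator would grow or GC */
        c->value = i;
        c->next  = head;
        head = c;
    }
    return head;
}

/* Caller: the list's lifetime is the lifetime of 'scratch', bounded by this
 * frame's let-like binding, and it is the caller that disposes of it. */
static int sum_iota(int n) {
    region_t *scratch = region_push(NULL, 4096);
    int total = 0;
    for (cell_t *c = iota(scratch, n); c != NULL; c = c->next)
        total += c->value;
    region_pop(scratch);
    return total;
}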

> A region system can tell us that references into a region do not
> escape, may escape, or clearly escape via the return value.  When a
> region escapes via return, region inference requires that the caller
> take responsibility for disposal of the region (usually, specifically
> the value).


Yes.


> It is perfectly acceptable for the value itself to live
> on the heap.
>

From a "how to think about regions" perspective, I agree. From a "why are
regions pragmatically useful" perspective, not so much. If you put the
region-governed objects into the GC heap, then they are still subject to
GC, and the main practical benefit of regions was GC reduction.



> I find the most thorough introduction to regions to be Mads Tofte and
> Lars Birkedal: "A Region Inference Algorithm", ACM Transactions on
> Programming Languages and Systems, Vol 20, No 5, July 1998.
>

Yes. This is one of the better papers on the subject. One thing I *haven't*
stumbled across is a paper or writeup that covers the same material in a way
that is friendly to non-researchers.


shap
