I want to open a discussion about GC, but before I do, I want to give a
bit of background.

In spite of the very impressive work by Demers-Boehm-Weiser,
conservative GC doesn't work in the absence of compiler support. Using
current library-only conservative GC techniques, long-running
applications leak heap space badly. This was our experience in OpenCM,
and subsequent reading led me to realize that this property was actually
well known -- various experiments had been done with Scheme and other
systems that demonstrated the leakiness. I really wish that I had
understood this better at the time, because this problem effectively
killed OpenCM, and it has led other people (quite rightly, IMHO) to
ignore conservative GC as a viable memory management technique. I also
suspect that it will turn out to be a major issue impeding the success
of applications using the Mono environment and the GCJ environment.

There are a variety of things that compilers could do to help. In C,
it's not generally possible to have precise information, but one could
at least have much less conservative information. Small language changes
could help further. In practice, untagged unions always turn out to have
a tag, in the sense that there are always some fields somewhere that you
could write a predicate about that would tell you which variant of the
union you should be looking at. One problem with C is that these tags
cannot be expressed within the language even when the programmer knows
what they are. Both the compiler support and the language changes could
be done without altering the efficiency of C. Think of them as a kind of
specialized debugging information. Unfortunately, at the end of the day
the C community doesn't seem to feel any pressure to support automatic
memory management -- not even in this sort of very minimalist way.
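
To make the "untagged unions always have a tag" point concrete, here is a
minimal sketch in C. The type and function names (struct node, node_kind,
node_union_holds_pointer, node_traceable_child) are hypothetical and exist
only for illustration. The idea is that the predicate is something the
programmer already knows; a compiler that could record it would let a
collector skip the non-pointer variant instead of guessing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type. The 'kind' field sits outside the union,
 * so C itself has no idea that it selects the live variant. */
enum node_kind { NODE_LEAF, NODE_INNER };

struct node {
    enum node_kind kind;          /* the implicit tag */
    union {
        uintptr_t    leaf_value;  /* live when kind == NODE_LEAF; an integer */
        struct node *child;       /* live when kind == NODE_INNER; a pointer */
    } u;
};

/* The predicate the programmer knows but cannot state in the language.
 * Emitted as metadata, it would tell a collector whether the union
 * currently holds a pointer. */
static bool node_union_holds_pointer(const struct node *n)
{
    assert(n != NULL);
    return n->kind == NODE_INNER;
}

/* What a less conservative collector could do with that predicate:
 * return the pointer to trace, or NULL when the word is just an integer
 * that happens to look like an address. */
static struct node *node_traceable_child(const struct node *n)
{
    return node_union_holds_pointer(n) ? n->u.child : NULL;
}
```

A purely conservative collector has to treat leaf_value as a possible
pointer; with the predicate available it would not.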

This leads to three possible outcomes:

  1. Managed storage is doomed outside of "all or nothing" solutions
     like .NET.

  2. C compilers get augmented so as to emit a bit more information
     in support of managed storage. The most promising work here is
     probably the Ivy infrastructure, but I basically don't think that
     this is going to happen.

  3. The current body of unsafe libraries gets incrementally recoded
     using safe languages. This is the possibility that I want to
     explore for a moment.

I am wondering if we cannot find a way to limit the impact of
conservative GC to objects whose references (transitively) cross the
interface boundary between managed and unmanaged worlds, and if we can
then augment the interface specifications to give some information about
pointer lifespans.

As a starting point, let's consider the case of a library coded in BitC
that is being called by C. It seems to me that there are three types of
pointers that might be returned from BitC to C:

1. Pointers that must be explicitly freed by C. The associated objects
can be annotated as such.

2. Pointers that have relative lifespan constraints. E.g. a pointer to
internal state is not supposed to survive its containing object.

3. Pointers where we explicitly don't know what the heck is going on,
which will have to be managed conservatively, but at least we can limit
the damage a bit -- even if we don't control the C compiler.
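
As a rough illustration of what such annotations might look like from the
C side, here is a sketch of an annotated interface. Everything in it is an
assumption made for the example: the macro names (CALLER_FREES,
LIVES_WITHIN, CONSERVATIVE), the struct buf type, and the buf_* functions.
The macros expand to nothing, so they are only meaningful to a tool that
reads the header. The function bodies are plain C stand-ins for what would
be BitC code in the scenario above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical annotations. They expand to nothing, so they have no
 * run-time cost; they only carry information for an interface tool. */
#define CALLER_FREES          /* category 1: C must free explicitly   */
#define LIVES_WITHIN(owner)   /* category 2: must not outlive 'owner' */
#define CONSERVATIVE          /* category 3: lifespan unknown         */

struct buf {
    size_t len;
    char  *data;
};

/* Category 1: the caller owns the result and must call buf_free(). */
CALLER_FREES struct buf *buf_new(size_t len)
{
    struct buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = calloc(len ? len : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = len;
    return b;
}

/* Category 2: a pointer to internal state, valid only while 'b' lives. */
LIVES_WITHIN(b) char *buf_data(struct buf *b)
{
    assert(b != NULL);
    return b->data;
}

/* Category 3: C may stash this anywhere, so a collector would have to
 * treat references to it conservatively. */
CONSERVATIVE struct buf *buf_shared(void)
{
    static struct buf shared = { 0, NULL };
    return &shared;
}

void buf_free(struct buf *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```

Only the result of buf_shared() would need conservative treatment here;
the other two carry enough lifespan information to be handled precisely.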

So here are my questions:

How useful would these annotations be in practice?

To what degree can they be inferred automatically from the C code?

Am I simply being silly here? Is there a more optimistic way out than
rewriting the current body of libraries?


shap
