On Sat, Aug 2, 2014 at 11:20 PM, David Jeske <[email protected]> wrote:
> On Sat, Aug 2, 2014 at 3:23 PM, Matt Oliveri <[email protected]> wrote:
>> 5. Any borrowed pointer appears on a stack frame (as in Rust) or (now)
>> in some ComplexStructure rooted right under a stack frame. So when we
>> deallocate something like ListA, we can easily find all the
>> ComplexStructures to call releaseObject on, in bulk.
>
> I'm not very concerned with short-lived data-structures on stack frames as
> stack-based borrowing already works quite well for these scenarios.
>
> I'm concerned with long lived heap data-structures.

Sure, but for anything to be reachable, it must be rooted in the stack
or some other simple thing. So I assumed (incorrectly, it seems)
that you'd use releaseObject on the whole heap structure between these
simple roots and the actual borrowed pointers to remove.

If not, then the question is how do you keep track of the borrowing
structures to call releaseObject on?

>> (Or maybe the programmer writes releaseObject. But in that case,
>> automatically proving it correct is probably too hard.)
>
> The programmer must write releaseObject, because the idea is for
> releaseObject to operate as efficiently as possible. For example, an object
> indexed in a sorted-structure should be efficiently found and then removed.
>
> HOWEVER, the language to write releaseObject can be severely constrained,
> possibly not even turing complete -- for the purpose of making it provable.

Hmm, I'm not sure that would be easy. If you want releaseObject to
take full advantage of the data structure in order to update it, then
the algorithms could get arbitrarily fancy for fancier data
structures. It seems like the proof would need to be derived from the
structure of the code, probably using some extra annotations. It might
turn out to be the same as, or similar to, existing ideas that use
types for memory management.

>> (How does it know how this should be done? Is it always just nulling
>> the places where the pointer occurs? That sounds kind of like weak
>> references.)
>
> Hopefully it is not just nulling the pointer, but is cleanly removing that
> object entry from the data-structure. For example, if a borrowed object was
> added to a linked list, the removal would remove that linked list node
> entirely, not just leave a null pointer behind. However, it would be
> acceptable to merely null the reference and defer the cleanup -- like a
> weak-reference.

OK. I mean, either way, the programmer will shoot themselves in the
foot if they thought there would be no dangling references, but were
wrong.
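
The "clean removal" behaviour from the quote can be sketched like this (hypothetical names; the non-indexed case, so the scan is O(borrower-size)): releaseObject unlinks the list node entirely rather than leaving a null slot behind the way a weak reference would.

```python
# Sketch of releaseObject on a non-indexed singly linked list:
# the node is spliced out, not nulled.

class Node:
    def __init__(self, obj, next=None):
        self.obj, self.next = obj, next

class BorrowingList:
    def __init__(self):
        self.head = None

    def add(self, obj):
        self.head = Node(obj, self.head)

    def release_object(self, obj):
        # O(borrower-size) scan, since this structure keeps no index.
        prev, node = None, self.head
        while node:
            if node.obj is obj:
                if prev:
                    prev.next = node.next  # splice the node out entirely
                else:
                    self.head = node.next
                return
            prev, node = node, node.next

    def items(self):
        out, node = [], self.head
        while node:
            out.append(node.obj)
            node = node.next
        return out
```

The weak-reference alternative would instead set `node.obj = None` and defer the splice, trading dangling-slot checks for cheaper release.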

>> (Would you call releaseObject for each borrowed object, or just for
>> the whole data structure, with other objects it owns being implicitly
>> released too. Either way, it looks like in general, you're doing a
>> nested traversal over both data structures, which would be rather
>> slow.)
>
> If an entire data-structure is released (such as releasing ListA), then any
> borrowers can get a "releaseAllFrom(ListA)". If they only hold references
> from ListA, then they can be fully released quickly. If they borrow
> references from other sources, then the ListA entries will need to be
> removed one by one.

Maybe I over-generalized in what I thought you were proposing. ListA
was just an example; the owning data structure could be anything, not
just a List. How would you take advantage of releaseAllFrom-like
operations generically for any data type? Well, I guess the data
structure implementer only needs to handle owning structures they
actually borrow from, but the runtime needs to handle anything
borrowing from anything.
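
One way the runtime could stay generic while borrowers specialize (the protocol and all names here are my assumptions, not from the thread): every borrowing structure implements a common interface with a one-by-one default, and single-source borrowers override it with a bulk path.

```python
# Hypothetical protocol the runtime dispatches through.

class Borrower:
    def release_object(self, obj):
        raise NotImplementedError

    def borrowed_objects(self, owner):
        raise NotImplementedError

    def release_all_from(self, owner):
        # Generic fallback: remove the owner's entries one by one
        # (the multi-source case).
        for obj in list(self.borrowed_objects(owner)):
            self.release_object(obj)

class SingleSourceSet(Borrower):
    """Borrows from exactly one owner, so bulk release is a single clear()."""
    def __init__(self, owner):
        self.owner, self.objs = owner, set()

    def borrow(self, obj):
        self.objs.add(obj)

    def borrowed_objects(self, owner):
        return self.objs if owner is self.owner else set()

    def release_object(self, obj):
        self.objs.discard(obj)

    def release_all_from(self, owner):
        if owner is self.owner:
            self.objs.clear()  # bulk path instead of the per-object scan
```

The data-structure author only writes the specialized override; the runtime only ever calls the uniform interface.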

> releaseObject() can be O(1) for indexed structures and O(borrower-size) for
> non-indexed structures. releaseAllFrom(ListA) can be O(1) for single-source
> borrowers, and O(min(borrower-size, ListA-size)) for multi-source borrowers.
>
> When heapsize is large, all of these seem much better than a tracing
> collector to me, because a tracing collector can only collect these through
> work proportional to the entire live heap.

Well O(borrower-size) is essentially a trace of the subheap the data
structure takes up. So it wins if you can find the data structures to
call releaseObject on more efficiently than a trace could. But you
haven't told me how you'll do that.
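
To make the quoted complexity claims concrete (toy classes, my names): an indexed borrower gets O(1) releaseObject from a hash index, while an unindexed one pays an O(borrower-size) scan of its own entries.

```python
class IndexedBorrower:
    """Keeps a hash index, so releaseObject is O(1) amortized."""
    def __init__(self):
        self.index = {}  # id(obj) -> associated entry

    def borrow(self, obj, entry=None):
        self.index[id(obj)] = entry

    def release_object(self, obj):
        self.index.pop(id(obj), None)  # O(1) removal

class UnindexedBorrower:
    """No index: releaseObject must scan everything it holds."""
    def __init__(self):
        self.entries = []

    def borrow(self, obj):
        self.entries.append(obj)

    def release_object(self, obj):
        # O(borrower-size): effectively a trace, but only of this
        # structure, never of the whole live heap.
        self.entries = [e for e in self.entries if e is not obj]
```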

>> Well, you have to wonder whether it should really count as
>> non-tracing, if it has to traverse ComplexStructureB completely. But I
>> do see that there would be cases where the programmer could point out
>> a smarter way.
>
> The goal is to use a smarter way to remove references. However, even if we
> are tracing, we are only tracing the borrowing-container, not the entire
> live heap!

Right.

> ------
>
> Moving beyond the simple "single owner borrowing", this can also be a method
> of handling cycles without tracing in a reference-counting collector.
>
> If it is easy to write provably correct programs in a constrained language
> to manipulate (possibly cyclic) data-structures, then we can essentially ignore
> their references.
>
> ComplexStructureB, which contains internal cycles, can reference-count its
> "ownership" of an object as a single reference, ignoring its own internal
> cyclic references -- since they are manipulated by a provably correct
> container-manipulation sub-language.

I suspect I'm not understanding you here, but I wasn't worried about
cycles _within_ borrowing structures. We assume releaseObject will
handle that. I might be worried about borrowing cycles _between_
structures, depending on how you're finding the structures when
automatically calling releaseObject. By "borrowing cycle" I mean a
cycle in the "B borrows some objects that A owns" relation.
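
To make that definition concrete: treat borrows-from as a directed graph over structures and look for a cycle in it. A toy depth-first detector (the dict-of-sets encoding of the relation is my assumption):

```python
# borrows_from: structure -> set of structures it borrows objects from.
# Returns True iff the borrows-from relation contains a cycle.

def has_borrow_cycle(borrows_from):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {s: WHITE for s in borrows_from}

    def visit(s):
        color[s] = GRAY  # on the current DFS path
        for t in borrows_from.get(s, ()):
            if color.get(t, WHITE) == GRAY:
                return True  # back edge: a borrowing cycle
            if color.get(t, WHITE) == WHITE and visit(t):
                return True
        color[s] = BLACK  # fully explored
        return False

    return any(color[s] == WHITE and visit(s) for s in borrows_from)
```

Whether such cycles matter depends on how the runtime finds structures to call releaseObject on; a registry-style lookup would not loop, but a scheme that chases borrows-from edges could.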

>> > B) However, IF we can prove that it's *unprovable* whether or not a
>> > release-event removes all references to an object, then I think we've
>> > proven
>> > it's not possible to have typesafe, memory-safe, non-tracing, precise heap
>> > collection for that container
>>
>> Proving something unprovable requires a counter-model. And you didn't
>> say which logic it should be unprovable in. Did you just mean "prove
>> that releaseObject is unimplementable"?
>
> The halting problem says every Turing-complete program can't be proven to
> finish, which means a Turing-complete implementation of releaseObject can't
> necessarily be proven to even finish, let alone release the object.
>
> Therefore, if we can prove that some reasonably interesting data-structure
> (say a kd-tree) requires a Turing-complete language to insert into, then
> we have proven that it is impossible to have a provably correct
> releaseObject implementation.

I think you might be misunderstanding something. Whether a proof
_exists_ is a separate issue from whether you have an _algorithm_ to
find a proof or refutation. This is again different from whether you
have a conservative algorithm that will find some proofs but possibly
give up.

"Unprovable" is saying a proof doesn't exist, which is much stronger
than saying there's no good way to decide a problem in general. A
little bit of human insight can easily lead to many proofs that answer
special cases of undecidable problems. This is (part of) why type
systems work.

> However, I suspect there is a non-turing complete subset language which
> could provably manipulate many many common data-structures.

Technically there's a Turing-complete language which could provably do
just about anything possible: proof-carrying code. So the real reason,
if any, to use a non-Turing-complete language would be to see that
things are provable without actually needing to see the proof. But I
think if you restrict the language that much, you'd rule out many
efficient algorithms too. I don't think you need a non-Turing-complete
language. You probably need some kind of extra type structure, like I
said above, which might require annotations, and might be a little
restrictive, but not as much.

One thing I can say for sure is that if you demand that the
releaseObject implementations be guaranteed to terminate, then you are
stuck with a non-Turing-complete language. You get to pick how close
to Turing-complete it is, though this would still be an unusual-looking
language, unless you separate the termination proofs. (Unusual like
monads and structural recursion. But some people are OK with these
things.)
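
As an illustration of the structural-recursion flavour (hypothetical tree shape, my names): a releaseObject that only ever recurses into strict sub-structures terminates for any finite structure, which is the kind of property a restricted language could check syntactically, without a separate proof.

```python
class Tree:
    def __init__(self, obj, children=()):
        self.obj = obj
        self.children = list(children)

def release_object(tree, obj):
    # Each recursive call is on a strictly smaller subtree, so
    # termination is evident from the shape of the code alone.
    tree.children = [c for c in tree.children if c.obj is not obj]
    for child in tree.children:
        release_object(child, obj)
```
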
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
