Re: Finalization and dead references: another proposal

Peter Levart Thu, 07 Dec 2017 07:51:06 -0800

Hi,

I'm trying to read the rules and imagine cases where they would not work...

/A field declared to be reachability-sensitive is reachability-safe. Anaccess a to a reachability-safe field of object p inside a (possiblystatic) method in p’s class, results in the introduction ofreachabilityFence()s according to the following rules://

//

//For every local reference variable v declared immediately insidelexical scope s, if s contains such an access a, where p is obtained bydereferencing v, then Reference.reachabilityFence(v) will be executedjust before either (1) the exit of the scope s, or (2) just before anyassignment to v. For present purposes, “this” is treated as a variabledeclared at method scope, as if it were an explicit parameter.////If the object containing the field f is allocated in the samefull-expression as the access a, then Reference.reachabilityFence(p),where p is a reference to the object containing f, is executed at theend of the full expression.//

//
//The second case covers expressions like//
//
//    nativeDoSomething(new Foo(...).nativeHandle)//
//

//where again the newly allocated object could otherwise be cleanedbefore nativeHandle is used.//

1st case that comes to mind and is a real case in the JDK is theexistence of sun.nio.ch.DirectBuffer internal (not exported) interfaceimplemented by NIO direct buffer(s) with method:


    long address();

Utility method like the following:

    static void erase(ByteBuffer bb) {

unsafe.setMemory(((DirectBuffer)bb).address(), bb.capacity(),(byte)0);

...won't get reachabilityFence(bb) before return, will it? Calling amethod on a bb doesn't count as a dereference of a reachablity-sensitivefield of bb. Unless methods could also be annotated with the sameannotation. This would only work for instance methods that return anative resource kept in the same instance or some instance that isreachable from this instance.

Native resources are typically encapsulated, but there are variouslevels of encapsulation. DirectBuffer is an example of module levelencapsulation by non-exported public interface implemented bypackage-private classes.

2nd case is artificial. But, since we have streams, lambdas,optionals... It's easy to fool above rules even if the annotation is puton the DirectBuffer#address() method ...


static void doSomethingWith1stNonNull(DirectBuffer... dbs) {
    Stream.of(dbs)
        .filter(Objects::nonNull)
        .mapToLong(DirectBuffer::address)
        .findFirst()
        .ifPresent(addr -> {
            ... do something with addr ...
        });
}

Even with imperative programming, one can play with scopes:

static void doSomethingWith1stNonNull(DirectBuffer... dbs) {
    long addr = 0;
    for (DirectBuffer db : dbs) {
        if (db != null) {
            addr = db.address();
            break;
        }
    }
    if (addr != 0) {
        ... do something with addr ...
    }
}

Or, hope that this would work (but the above rules don't cover it):

static void doSomethingWithEither(DirectBuffer db1, DirectBuffer db2) {
    long addr = ((some condition) ? db1 : db2).address();
    ... do something with addr ...
}

But I can imagine that teaching users to not do such foolish thingswould be easy. The rules are simple:- always dereference reachablity-sensitive resources directly throughlocal reference variables.- make sure that the local reference variable that you extractedreachablity-sensitive resource from, is in scope while you performoperations on the resource.


Regards, Peter


On 12/07/2017 01:38 AM, Hans Boehm wrote:

We're still trying to deal with a fair amount of code that implicitly
assumes that finalization or similar clean-up will not occur while a
pointer to the affected object is in scope. Which is of course not true.

As a reminder, the canonical offending usage (anti-)pattern with
(deprecated, but easier to write) finalizers is

class Foo {
     private long mPtrToNativeFoo;

     private static native void nativeDeallocate(long nativePtr);
     private static native void nativeDoSomething(long nativePtr, long
anotherNativePtr);

     protected void finalize() { ... nativeDeallocate(mPtrToNativeFoo); ... }

     public void doSomething(Foo another) { ...
nativeDoSomething(mPtrToNativeFoo, another.mPtrToNativeFoo) ... }
     ...
}

This is subtly incorrect in that, while executing the final call to
doSomething() on a particular object, just after retrieving mPtrToNativeFoo
and another.mPtrToNativeFoo, but before invoking nativeDoSomething(), the
garbage collector may run, and "this" and "another" may be finalized,
deallocating the native objects their mPtrToNativeFoos refer to.
When nativeDoSomething() finally does run, it may see dangling pointers.

Examples using java.lang.ref or Cleaners (or even WeakHashMap, if you must)
instead of finalize() are as easy to construct, but a bit longer. (The
finalizer version is also arguably incorrect in other ways that are not
relevant here. Pretend this were written in terms of PhantomReferences.)

It is easily possible to construct 100% Java code with the same problem.
Instead of mPtrToNativeFoo, each object stores an integer handle that is
used to access additional Java state logically associated with the object.
But the native pointer case seems to dominate in practice.

Various solutions to this have been proposed, but none seem quite
attractive enough that I actually feel comfortable asking people to update
their code to use them. Noteworthy proposals include:

0) Explicitly call close() on such objects. Great idea when it works. In
general it doesn't, since the code needs to know when the enclosing Java
object is no longer needed. If we always knew that we wouldn't need a GC.
1) Various hacks to keep variables live, e.g. the one based on synchronized
blocks I advocated in my 2004 JavaOne talk. These are slow and ugly, as
we've always admitted. Nobody used them. Incorrect won over slow, ugly, and
complicated ~100% of the time.
2) Java 9's reachabilityFence(). This is better. It can be implemented so
that it's no longer slow. But in many common cases, it's still quite ugly.
For example, the call to nativeDoSomething() above must be followed by two
reachabilityFences, one on this and one on another. And to do this really
correctly, the whole thing would often need to be in a try...finally block.
And in reality code following this pattern usually doesn't have just a
single doSomething method that needs this treatment, but may easily have
dozens. And the rules for placing reachabilityFences can become quite
subtle if there are e.g. locally allocated objects involved. My assessment
is that this isn't good enough. People may no longer write incorrect code
100% of the time, but I'd bet on 70%+.
3) JNI functions can be rewritten, so that the containing Java object is
passed in addition to the native pointers. Somewhat accidentally, this
happens to be roughly free for single argument functions. (Delete
"static".) It adds overhead in other cases, like the one above, and the
rewriting can get somewhat convoluted. In some cases, it doesn't work at
all. AFAIK, it's never actually guaranteed to be correct; it works because
standard implementations don't optimize across the language boundary.
That's not too likely to change. Maybe.
4) We could change the language spec to always prohibit premature
finalization/cleaning in cases like the above. I could personally live with
that solution, and proposed it internally here in the past. But it doesn't
seem to go over well among implementers. And AFAICT, doing it well requires
significant tooling changes, in that we do want to reliably treat local
variables as dead once they go out of scope, a piece of information that
doesn't seem to be reliably preserved in class files. One could argue that
the current class file design implicitly assumes that we can do dead
variable elimination.

After going back and forth on this, my conclusion is that we need a
linguistic mechanism for identifying the case in which the garbage
collector is being used to managed external resources associated with a
field. A (still slowly evolving) proposal to add an annotation to do so is
at

https://docs.google.com/document/d/1yMC4VzZobMQTrVAraMy7xBdWPCJ0p7G5r_43yaqsj-s/edit?usp=sharing

In many ways, this is inherently a compromise. But in the vast majority of
cases, it greatly reduces the required source code changes over all but (4)
above. And I think I could usually explain to an author of currently broken
code in under 5 minutes exactly what they need to do to fix it. And I
wouldn't have to lie much to do so. I don't think (0)-(3) above share that
property.

This has already benefited from comments provided by a few people. (Thanks
to Doug, Jeremy, and Martin, among others.) But I would really like more
feedback, including suggestions about how to proceed.

Hans

Re: Finalization and dead references: another proposal

Reply via email to