Re: RFR: 8266310: deadlock while loading the JNI code [v2]

Peter Levart Fri, 21 May 2021 01:30:15 -0700


On 21/05/2021 01:11, David Holmes wrote:

Hi Peter,

On 21/05/2021 12:42 am, Peter Levart wrote:
Hi Aleksei,
Are you trying to solve this in principle, or do you have a concrete problem at hand that triggers this deadlock? If it is the latter, then some rearrangement of code might do the trick... For example, native libraries are typically loaded by the class initializer of some class that is guaranteed to be initialized before the first invocation of a native method from that library. But if that class can also be loaded and initialized by some other trigger, a deadlock can occur. The best remedy in such a situation is to move all native methods to a dedicated class that serves only for interfacing with native code and also contains an initializer that loads the native library and nothing else. Such an arrangement would ensure that locks are always taken in the same order: classLoadingLock -> nativeLibraryLock ...
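A minimal sketch of the holder-class discipline suggested above, assuming a hypothetical library "foo" with a single native method. So that it runs without any native code, the System.loadLibrary call and the native method are replaced by Java stand-ins (loadLibraryStandIn, compute); HolderIdiomDemo and FooNative are illustrative names, not JDK classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the "one holder class per native library" discipline.
// FooNative, loadLibraryStandIn and compute are hypothetical names.
class HolderIdiomDemo {

    static final class FooNative {
        // Counts simulated library loads, to show the initializer runs once.
        static final AtomicInteger LOADS = new AtomicInteger();

        static {
            // Real code would call System.loadLibrary("foo") here and nothing else.
            loadLibraryStandIn("foo");
        }

        private FooNative() {} // never instantiated; only fronts the native methods

        private static void loadLibraryStandIn(String name) {
            LOADS.incrementAndGet();
        }

        // Real code: static native int compute(int x);
        static int compute(int x) {
            return 2 * x; // Java stand-in for the native implementation
        }
    }

    public static void main(String[] args) {
        // The first call triggers FooNative.<clinit>, so the library is loaded
        // before any of its methods can run; later calls do not reload it.
        System.out.println(FooNative.compute(21)); // 42
        System.out.println(FooNative.compute(1));  // 2
        System.out.println(FooNative.LOADS.get()); // 1
    }
}
```

Because nothing outside FooNative.<clinit> ever loads the library, the class-initialization lock is always acquired before the library lock, which is the single lock ordering the suggestion relies on.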
There were specific examples of this problem, but Aleksei is trying to define a general solution - which unfortunately doesn't exist.
The basic deadlock scenario is a special variant of the general class initialization deadlock:
Thread 1:
    - call loadLibrary
    - acquire loadLibrary global lock
        - call JNI_OnLoad
        - use class Foo (which needs to be loaded and initialized)
        - block acquiring <clinit> lock for Foo

Thread 2:
    - initialize class Foo
    - acquire <clinit> lock for Foo
        - Foo.<clinit>
        - call loadLibrary(X) // for any X
        - block acquiring loadLibrary global lock

We can reduce the chance of deadlock by using a per-native-library lock instead of the global loadLibrary lock - which is what Aleksei's initial version did. But we cannot remove every possibility of deadlock, because we must ensure that only one thread can be executing JNI_OnLoad for any given native library.

Right, I was just trying to suggest that by exercising some discipline in how code that loads native libraries is arranged, deadlocks can be avoided if Aleksei's initial version of the patch is also used (a per-native-library lock instead of the global loadLibrary lock). The discipline would be to load a particular native library only from the <clinit> of a unique class associated with that native library. Even if everybody followed this discipline (which fortunately is the typical use of loadLibrary), deadlock would still be possible without Aleksei's patch. Imagine two native libraries A and B, each loaded from the <clinit> of the corresponding class ClassA or ClassB. Library A also has a JNI_OnLoad which uses ClassB. So we have:


Thread 1:
    - initialize ClassA
    - acquire <clinit> lock for ClassA
        - ClassA.<clinit>
        - call loadLibrary(A)
        - acquire loadLibrary global lock
            - call JNI_OnLoad for A
            - use class ClassB (which needs to be loaded and initialized)
            - block acquiring <clinit> lock for ClassB

Thread 2:
    - initialize ClassB
    - acquire <clinit> lock for ClassB
        - ClassB.<clinit>
        - call loadLibrary(B)
        - block acquiring loadLibrary global lock
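The ClassA/ClassB scenario above can be modeled with explicit locks to compare a global loadLibrary lock against per-library locks. This is a hedged simulation, not JVM code: ReentrantLocks stand in for the <clinit> locks and the library locks, a CyclicBarrier makes both threads hold their first locks before going further, and a one-second tryLock timeout replaces the indefinite blocking a real JVM would exhibit. LoadLibraryLockSim and its helper methods are hypothetical names.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Model of the ClassA/ClassB scenario with explicit locks (not JVM internals).
// clinitA/clinitB stand in for class-initialization locks; libA/libB are either
// one shared global loadLibrary lock or one lock per library.
class LoadLibraryLockSim {

    /** Returns true if a thread gave up waiting, i.e. the lock cycle formed. */
    static boolean deadlocks(boolean globalLibraryLock) {
        ReentrantLock clinitA = new ReentrantLock();
        ReentrantLock clinitB = new ReentrantLock();
        ReentrantLock global = new ReentrantLock();
        ReentrantLock libA = globalLibraryLock ? global : new ReentrantLock();
        ReentrantLock libB = globalLibraryLock ? global : new ReentrantLock();
        CyclicBarrier bothStarted = new CyclicBarrier(2);
        AtomicBoolean gaveUp = new AtomicBoolean();

        Thread t1 = new Thread(() -> {
            clinitA.lock();                       // initialize ClassA
            try {
                libA.lock();                      // ClassA.<clinit> calls loadLibrary(A)
                try {
                    await(bothStarted);
                    waitFor(clinitB, gaveUp);     // JNI_OnLoad for A uses ClassB
                } finally {
                    libA.unlock();
                }
            } finally {
                clinitA.unlock();
            }
        });
        Thread t2 = new Thread(() -> {
            clinitB.lock();                       // initialize ClassB
            try {
                await(bothStarted);
                waitFor(libB, gaveUp);            // ClassB.<clinit> calls loadLibrary(B)
            } finally {
                clinitB.unlock();
            }
        });
        t1.start();
        t2.start();
        join(t1);
        join(t2);
        return gaveUp.get();
    }

    // A real JVM would block forever; the timeout only lets the model terminate.
    private static void waitFor(ReentrantLock lock, AtomicBoolean gaveUp) {
        try {
            if (lock.tryLock(1, TimeUnit.SECONDS)) {
                lock.unlock();
            } else {
                gaveUp.set(true);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            gaveUp.set(true);
        }
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("global lock deadlocks: " + deadlocks(true));       // true
        System.out.println("per-library locks deadlock: " + deadlocks(false)); // false
    }
}
```

With a single global lock the two threads wait on each other until the timeout expires. With per-library locks, Thread 2 takes the uncontended lock for B, finishes ClassB.<clinit>, and releases the ClassB lock that Thread 1 was waiting for.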

With Aleksei's initial patch, such a scenario would not result in a deadlock. And since such a scenario can arise from typical use of loadLibrary, I think this patch is a good thing: it reduces the number of scenarios where deadlock is a possible outcome. Here's another scenario where deadlock would still occur even with Aleksei's initial patch. It is a modified variant of the above scenario, where JNI_OnLoad of library A uses ClassB and JNI_OnLoad of library B uses ClassA:


Thread 1:
    - initialize ClassA
    - acquire <clinit> lock for ClassA
        - ClassA.<clinit>
        - call loadLibrary(A)
        - acquire loadLibrary(A) lock
            - call JNI_OnLoad for A
            - use class ClassB (which needs to be loaded and initialized)
            - block acquiring <clinit> lock for ClassB

Thread 2:
    - initialize ClassB
    - acquire <clinit> lock for ClassB
        - ClassB.<clinit>
        - call loadLibrary(B)
        - acquire loadLibrary(B) lock
            - call JNI_OnLoad for B
            - use class ClassA (which needs to be loaded and initialized)
            - block acquiring <clinit> lock for ClassA

But in this scenario, deadlock is provoked without blocking on any loadLibrary lock. This deadlock arises from circular dependencies among classes in their <clinit> methods. It just happens that these dependencies are evaluated via a chain of X.<clinit> -> loadLibrary -> JNI_OnLoad -> use class Y calls. Such a deadlock is possible without any native library loading involved, so I would not classify it as the problem that Aleksei's patch is trying to solve.
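To illustrate that such a cycle needs no native code at all, here is a sketch with two hypothetical classes, CycleA and CycleB, whose static initializers read each other's fields. A CountDownLatch (added only for the demo) makes both threads enter their own initializer before touching the other class, so the deadlock is deterministic rather than timing-dependent; the threads are daemons so the JVM can still exit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: a class-initialization deadlock with no native libraries involved.
// CycleA and CycleB are hypothetical classes whose initializers depend on each other.
class ClinitCycleDemo {
    // Both initializers wait here so each thread holds its own <clinit> lock
    // before touching the other class; this makes the deadlock deterministic.
    static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_INSIDE.countDown();
        try {
            BOTH_INSIDE.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class CycleA {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = CycleB.VALUE + 1; // needs CycleB to be initialized
        }
    }

    static class CycleB {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = CycleA.VALUE + 1; // needs CycleA to be initialized
        }
    }

    /** Returns true if both initializing threads are still blocked after the timeout. */
    static boolean deadlocks() {
        Thread t1 = new Thread(() -> System.out.println(CycleA.VALUE)); // never prints
        Thread t2 = new Thread(() -> System.out.println(CycleB.VALUE)); // never prints
        t1.setDaemon(true); // blocked threads must not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(1000);
            t2.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks());
    }
}
```

Each thread holds one class's initialization lock and waits for the other's, exactly the X.<clinit> -> use class Y cycle, only without the loadLibrary -> JNI_OnLoad hop in between.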

I still haven't found a scenario of a possible deadlock, when Aleksei's initial patch is combined with the above-mentioned discipline, in which a loadLibrary lock would be involved.


Regards, Peter

Cheers,
David
-----
Regards, Peter

On 5/20/21 12:31 AM, David Holmes wrote:
On 20/05/2021 2:29 am, Aleksei Voitylov wrote:
On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov <avoity...@openjdk.org> wrote:
Please review this PR, which fixes the deadlock in ClassLoader between two lock objects: a lock object associated with the class being loaded, and the ClassLoader.loadedLibraryNames hash map, which is locked during the native library load operation.
Problem being fixed:
The initial reproducer demonstrated a deadlock between the JarFile/ZipFile and the hash map. That deadlock exists even when the ZipFile/JarFile lock is removed, because there's another lock object in the class loader, associated with the name of the class being loaded. Such objects are stored in ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() loads exactly the same class whose signature is being verified in another thread.
Proposed fix:
The proposed patch suggests getting rid of locking the loadedLibraryNames hash map and synchronizing on each entry name instead, as is done for class names in the ClassLoader.getClassLoadingLock(name) method.
The patch introduces nativeLibraryLockMap, which holds the lock objects for each library name, and the private getNativeLibraryLock() method, which lazily initializes the corresponding lock object. nativeLibraryContext was changed to a ThreadLocal, so that in any concurrent thread the top of the stack is the NativeLibrary object currently being loaded/unloaded in that thread. nativeLibraryLockMap accumulates the names of all native libraries loaded - in line with the class loading code, it is not explicitly cleared.
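A simplified sketch of the structure described above, under stated assumptions: it borrows the names nativeLibraryLockMap, getNativeLibraryLock() and nativeLibraryContext from the description, but everything else (the NativeLibrary holder class, the load signature, the Runnable standing in for dlopen + JNI_OnLoad) is hypothetical and not taken from the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplification of the per-library-lock design, not the PR code.
class NativeLibraryLockSketch {
    static final class NativeLibrary {
        final String name;
        NativeLibrary(String name) { this.name = name; }
    }

    // Lazily created lock object per library name; like class-loading locks,
    // entries are never removed.
    private final ConcurrentHashMap<String, Object> nativeLibraryLockMap =
            new ConcurrentHashMap<>();

    // Per-thread stack: the top is the library this thread is loading right now.
    private static final ThreadLocal<Deque<NativeLibrary>> nativeLibraryContext =
            ThreadLocal.withInitial(ArrayDeque::new);

    private Object getNativeLibraryLock(String name) {
        return nativeLibraryLockMap.computeIfAbsent(name, n -> new Object());
    }

    void load(String name, Runnable jniOnLoad) {
        synchronized (getNativeLibraryLock(name)) {
            Deque<NativeLibrary> context = nativeLibraryContext.get();
            context.push(new NativeLibrary(name));
            try {
                jniOnLoad.run(); // stand-in for dlopen + JNI_OnLoad
            } finally {
                context.pop();
            }
        }
    }

    /** Library being loaded by the calling thread, or null if none. */
    static NativeLibrary currentlyLoading() {
        return nativeLibraryContext.get().peek();
    }

    int lockCount() {
        return nativeLibraryLockMap.size();
    }

    public static void main(String[] args) {
        NativeLibraryLockSketch loader = new NativeLibraryLockSketch();
        loader.load("A", () -> {
            System.out.println(currentlyLoading().name); // A
            loader.load("B", () -> System.out.println(currentlyLoading().name)); // B
            System.out.println(currentlyLoading().name); // A
        });
        System.out.println(currentlyLoading()); // null
    }
}
```

If JNI_OnLoad of A causes library B to be loaded on the same thread, currentlyLoading() reports B inside the nested load and A again once it returns, while loads of different libraries synchronize on different lock objects.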
Testing: jtreg and jck testing with no regressions. A new regression test was developed.
Aleksei Voitylov has updated the pull request incrementally with one additional commit since the last revision:
   address review comments, add tests
Dear colleagues,
The updated PR addresses the review comment regarding ThreadLocal as well as David's concern about the lock being held during JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are deallocated. Multiple threads are allowed to enter NativeLibrary.load(), so that no thread blocks while another thread is loading a library. Before the update, a class loading lock could be held by a parallel-capable class loader, which could deadlock with the library loading lock. As proposed by David Holmes, the library loading lock was removed, because dlopen/LoadLibrary are thread-safe and maintain internal reference counters on libraries. There's still a lock being held while a pair of containers is read/updated. It's not going to deadlock, as no lock/wait operation is performed while that lock is held. Multiple threads may create their own copies of the NativeLibrary object and register them for auto unloading.
Tests for auto unloading were added along with the PR update. There are now 3 jtreg tests:
- one checks for deadlock, similar to the one proposed by Chris Hegarty
- two other tests are for library unload.
The major side effect of allowing multiple threads to enter is that JNI_OnLoad/JNI_OnUnload may each be called multiple times (but the same number of times) from concurrent threads. In particular, the number of calls to JNI_OnLoad must equal the number of calls to JNI_OnUnload after the relevant class loader is garbage collected. This may affect behaviour that relies on the specific order or number of JNI_OnLoad/JNI_OnUnload calls. The current JNI specification does not mandate how many times JNI_OnLoad/JNI_OnUnload are called. Also, we could not locate tests in jck/jtreg/vmTestbase that rely on the specific order or number of calls to JNI_OnLoad/JNI_OnUnload.
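The side effect described above can be shown with a small hypothetical model (OnLoadCountSketch is not JDK code): several threads "load" the same library at once, and a counter stands in for JNI_OnLoad. Without a lock, every thread that found the library not yet loaded runs the hook (the barrier only forces that interleaving so the result is deterministic); with a lock held across the check and the hook, it runs exactly once. The model does not track JNI_OnUnload.

```java
import java.util.Set;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: counts how often a stand-in for JNI_OnLoad runs when
// several threads load the same library at once.
class OnLoadCountSketch {
    private final Set<String> loaded = ConcurrentHashMap.newKeySet();
    private final AtomicInteger onLoadCalls = new AtomicInteger();
    private final Object libraryLock = new Object();

    // No lock: check-then-act, so every thread that saw "not loaded" runs onLoad.
    private void loadUnlocked(String name, CyclicBarrier afterCheck) {
        boolean alreadyLoaded = loaded.contains(name);
        await(afterCheck); // forces all threads to finish the check first
        if (!alreadyLoaded) {
            onLoadCalls.incrementAndGet(); // stand-in for JNI_OnLoad
            loaded.add(name);
        }
    }

    // Lock held across the check and onLoad: at most one call per library.
    private void loadLocked(String name) {
        synchronized (libraryLock) {
            if (loaded.add(name)) {
                onLoadCalls.incrementAndGet(); // stand-in for JNI_OnLoad
            }
        }
    }

    /** Runs `threads` concurrent loads of one library; returns the onLoad count. */
    static int run(boolean locked, int threads) {
        OnLoadCountSketch sketch = new OnLoadCountSketch();
        CyclicBarrier barrier = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (locked) {
                    sketch.loadLocked("foo");
                } else {
                    sketch.loadUnlocked("foo", barrier);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sketch.onLoadCalls.get();
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without lock: " + run(false, 4)); // 4
        System.out.println("with lock:    " + run(true, 4));  // 1
    }
}
```

The locked variant keeps one JNI_OnLoad call per library at the cost of holding a lock while the hook runs; the unlocked variant never holds that lock but gives up the single-call property.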
But you can't make such a change! That was my point. To fix the deadlock we must not hold a lock. But we must ensure that only a single call to JNI_OnLoad is possible. It is an unsolvable problem under those constraints. You can't just change the behaviour of JNI_OnLoad like that.
David
-----
If this is really a problem that several people are facing, then perhaps a change in the API could solve it. I'm thinking
Thank you Alan Bateman, David Holmes and Chris Hegarty for your valuable input.
-------------

PR: https://git.openjdk.java.net/jdk/pull/3976
