On 21/05/2021 01:11, David Holmes wrote:
Hi Peter,
On 21/05/2021 12:42 am, Peter Levart wrote:
Hi Aleksei,
Are you trying to solve this in principle or do you have a concrete
problem at hand which triggers this deadlock? If it is the later,
then some rearrangement of code might do the trick... For example,
native libraries are typically loaded by a class initializer of some
class that is guaranteed to be initialized before the 1st invocation
of a native method from such library. But if such class can also be
loaded and initialized by some other trigger, deadlock can occur.
Best remedy for such situation is to move all native methods to a
special class that serves just for interfacing with native code and
also contains an initializer that loads the native library and
nothing else. Such arrangement would ensure that the order of taking
locks is always the same: classLoadingLock -> nativeLibraryLock ...
There were specific examples for this problem, but Aleksei is trying
to define a general solution - which unfortunately doesn't exist.
The basic deadlock scenario is a special variant of the general class
initialization deadlock:
Thread 1:
- loadLibrary
- acquire loadLibrary global lock
- call JNI_OnLoad
- use class Foo (which needs to be loaded and initialized)
- block acquiring <clinit> lock for Foo
Thread 2:
- Initialize class Foo
- Acquire <clinit> lock for Foo
- <clinit>
- call loadLibrary(x) // for any X
- block acquiring loadLibrary global lock
We can reduce the chance of deadlock by using a per-native-library
lock instead of the global loadLibrary lock - which is what Aleksei's
initial version did. But we cannot remove all deadlock possibility
because we must ensure only one thread can be executing JNI_OnLoad for
any given native library.
Right, I was just trying to suggest that by exercising some discipline
about how to arrange code that loads native libraries, deadlocks can be
avoided if also Aleksei's initial version of the patch is used (using a
per-native-library lock instead of the global loadLibrary lock). The
discipline would be to load a particular native library only from
<clinit> of a unique class associated with that native library. If
everybody followed this discipline (which fortunately is typical use of
loadLibrary), then without Aleksei's patch, deadlock is still possible.
Imagine two native libraries A and B, each loaded from <clinit> of
corresponding classes ClassA and ClassB. Library A also has JNI_OnLoad
which uses ClassB. So we have:
Thread 1:
- initialize ClassA
- acquire <clinit> lock for ClassA
- ClassA.<clinit>
- call loadLibrary(A)
- acquire loadLibrary global lock
- call JNI_OnLoad for A
- use class ClassB (which needs to be loaded and initialized)
- block acquiring <clinit> lock for ClassB
Thread 2:
- initialize ClassB
- acquire <clinit> lock for ClassB
- ClassB.<clinit>
- call loadLibrary(B)
- block acquiring loadLibrary global lock
With Aleksei's initial patch, such scenario would not result in a
deadlock. And since such scenario can arise from typical use of
loadLibrary, I think this patch is a good thing. It reduces number of
scenarios where deadlock is a possible outcome. Here's another scenario
where deadlock would still occur even with Sergei's initial patch. It is
a modified variant of above scenario where JNI_OnLoad of library A uses
ClassB and JNI_OnLoad of library B uses ClassA:
Thread 1:
- initialize ClassA
- acquire <clinit> lock for ClassA
- ClassA.<clinit>
- call loadLibrary(A)
- acquire loadLibrary(A) lock
- call JNI_OnLoad for A
- use class ClassB (which needs to be loaded and initialized)
- block acquiring <clinit> lock for ClassB
Thread 2:
- initialize ClassB
- acquire <clinit> lock for ClassB
- ClassB.<clinit>
- call loadLibrary(B)
- acquire loadLibrary(B) lock
- call JNI_OnLoad for B
- use class ClassA (which needs to be loaded and initialized)
- block acquiring <clinit> lock for ClassA
But in this scenario, deadlock is provoked without blocking on any
loadLibrary lock. This deadlock arises from circular dependencies among
classes in their <clinit> methods. It just happens that these
dependencies are evaluated via a chain of: X.<clinit> -> loadLibrary ->
JNI_OnLoad -> use class Y calls. Such deadlock is possible without
involvement of native libraries loading too, so I would not classify it
as the problem that Sergei's patch is trying to solve.
I still haven't found a scenario of a possible deadlock when Sergei's
initial patch is combined with the above mentioned discipline in which a
loadLibrary lock would be involved.
Regards, Peter
Cheers,
David
-----
Regards, Peter
On 5/20/21 12:31 AM, David Holmes wrote:
On 20/05/2021 2:29 am, Aleksei Voitylov wrote:
On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov
<[email protected]> wrote:
Please review this PR which fixes the deadlock in ClassLoader
between the two lock objects - a lock object associated with the
class being loaded, and the ClassLoader.loadedLibraryNames hash
map, locked during the native library load operation.
Problem being fixed:
The initial reproducer demonstrated a deadlock between the
JarFile/ZipFile and the hash map. That deadlock exists even when
the ZipFile/JarFile lock is removed because there's another lock
object in the class loader, associated with the name of the class
being loaded. Such objects are stored in
ClassLoader.parallelLockMap. The deadlock occurs when
JNI_OnLoad() loads exactly the same class, whose signature is
being verified in another thread.
Proposed fix:
The proposed patch suggests to get rid of locking
loadedLibraryNames hash map and synchronize on each entry name,
as it's done with class names in see
ClassLoader.getClassLoadingLock(name) method.
The patch introduces nativeLibraryLockMap which holds the lock
objects for each library name, and the getNativeLibraryLock()
private method is used to lazily initialize the corresponding
lock object. nativeLibraryContext was changed to ThreadLocal, so
that in any concurrent thread it would have a NativeLibrary
object on top of the stack, that's being currently
loaded/unloaded in that thread. nativeLibraryLockMap accumulates
the names of all native libraries loaded - in line with class
loading code, it is not explicitly cleared.
Testing: jtreg and jck testing with no regressions. A new
regression test was developed.
Aleksei Voitylov has updated the pull request incrementally with
one additional commit since the last revision:
address review comments, add tests
Dear colleagues,
The updated PR addresses review comment regarding ThreadLocal as
well as David' concern around the lock being held during
JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are
deallocated. Multiple threads are allowed to enter
NativeLibrary.load() to prevent any thread from locking while
another thread loads a library. Before the update, there could be a
class loading lock held by a parallel capable class loader, which
can deadlock with the library loading lock. As proposed by David
Holmes, the library loading lock was removed because
dlopen/LoadLibrary are thread safe and they maintain internal
reference counters on libraries. There's still a lock being held
while a pair of containers are read/updated. It's not going to
deadlock as there's no lock/wait operation performed while that
lock is held. Multiple threads may create their own copies of
NativeLibrary object and register it for auto unloading.
Tests for auto unloading were added along with the PR update. There
are now 3 jtreg tests:
- one checks for deadlock, similar to the one proposed by Chris
Hegarty
- two other tests are for library unload.
The major side effect of that multiple threads are allowed to enter
is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same)
number of times from concurrent threads. In particular, the number
of calls to JNI_OnLoad must be equal to the number of calls to
JNI_OnUnload after the relevant class loader is garbage collected.
This may affect the behaviour that relies on specific order or the
number of JNI_OnLoad/JNI_OnUnload calls. The current JNI
specification does not mandate how many times
JNI_OnLoad/JNI_OnUnload are called. Also, we could not locate tests
in jck/jtreg/vmTestbase that would rely on the specific order or
number of calls to JNI_OnLoad/JNI_OnUnload.
But you can't make such a change! That was my point. To fix the
deadlock we must not hold a lock. But we must ensure only a single
call to JNI_OnLoad is possible. It is an unsolvable problem with
those constraints. You can't just change the behaviour of JNI_OnLoad
like that.
David
-----
If this is really a problem that several people are facing, then
perhaps a change in the API could solve it. I'm thinking
Thank you Alan Bateman, David Holmes and Chris Hegarty for your
valuable input.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3976