Roger,
I did think about confining the changes to ObjectInputStream. Maybe I
did not think hard enough about it.
Keeping a cache of the result of a first stack walk might work for the
recursive calls. One might still do a useless stack walk for each
top-level readObject call. No concurrent class loading can invalidate
the cache for any particular stack walk. But it seems fraught to
reuse a cached value in the face of concurrent class loading, or in
the presence of overloaded ObjectInputStream methods that might create
new ClassLoaders on this stack. I'm sure I don't want to debug that.
My experiments have shown that the average stack walk is about 20
frames. I do not have the data on how many of those frames are
recursive calls within ObjectInputStream methods and how many are in
other code.
Dragging in the StackWalker API would further complicate things, in my
opinion. It might appear to keep the code out of platform native
code, but we know that walking thread stacks involves native code.
I settled on a system-wide one-bit cache: There have been no
user-defined ClassLoaders constructed, so I can not find one on any
particular stack walk. It may not be the best possible solution, but
it addresses the issue of useless stack walks in the common case where
there are no user-defined ClassLoaders.
I agree that both ClassLoader and ObjectInputStream are complex. In
fact, the proposed patch makes no changes to ObjectInputStream. There
is one line added to each of the system-defined ClassLoaders to
identify them as not user-defined ClassLoaders, one line in the base
ClassLoader constructor, and the bulk of the change is around the
guard in jdk.internal.misc.VM.latestUserDefinedLoader() to avoid the
stack walk if possible.
... peter
-----Original Message-----
From: Roger Riggs <[email protected]>
Organization: Java Products Group, Oracle
Date: Friday, June 12, 2020 at 11:50 AM
To: "Peter Kessler (Open Source)" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [PATCH] 8246633: Improve the performance of
ObjectInputStream.resolveClass(ObjectStreamClass)
Hi Peter,
The hazard to avoid is the cross package coupling that makes both
ClassLoader and
ObjectInputStream more complex; both are more than sufficiently complex
already.
Optimizing the implementations make sense if it benefits enough uses
cases and
does not make the code harder to maintain and evolve.
Have you explored any alternatives that can be confined to OIS?
For example, taking advantage the many calls to readObject within
a single OIS that are nested. Possibly the first (depth = 0) find user
class loader
can be re-used or at least to short circuit the search on subsequent
nested calls?
Or caching the classloader at each depth and only need to probe for it
if it has
not already been recorded. The StackWalker API may be useful to limit
the depth
of searches.
Thanks, Roger
On 6/11/20 2:52 PM, Peter Kessler OS wrote:
> Hi Roger,
>
> Thanks for your comments.
>
> I am not knowingly changing the required behavior. All I am proposing is
> guarding the thread stack walk if it will be a waste of effort. If an
> application or framework has its own ClassLoaders, then the guard will
fail
> and the thread stack will be walked.
>
> The stack walk is important, if there are user-defined ClassLoaders. A
> method that uses ObjectInputStream to read objects can not know what
> ClassLoader to use to resolve class references in the stream, without the
> stack walk. But if there never have been any user-defined ClassLoaders
> constructed, then the stack walk can never return a user-defined
> ClassLoader.
>
> If the stream reader is part of some simple application, then it may spend
> a significant amount of its time uselessly walking thread stacks. (If the
> stream reader is part of some simple application its author may not know
to
> override ObjectInputStream.resolveClass, or how to determine what
> ClassLoader should load a class from the stream.) If someone were using
> such an application to choose platforms, the Java platform would be at a
> disadvantage.
>
> Consider a somewhat more complex case, the SPECjbb2015 benchmark, which
> claims to model a well-constructed business application on the Java
> platform. SPECjbb2015 runs multiple JVMs, exchanging information via
> ObjectInputStreams. Profiling shows that it spends more than 5% of its
time
> walking thread stacks. Yet, in spite of being a complex application,
> written by experts in the Java platform, there are no user-defined
> ClassLoaders. So all the time spent walking stacks is wasted time, and is
> one of the reasons the servers fail their service-level agreements. We can
> discuss whether SPECjbb2015 is representative of Java applications. But it
> is easily available and relatively simple to run. People use benchmarks
> like SPECjbb2015 to decide whether to use the Java platform for particular
> tasks, and it would be unfortunate if we did not make the Java platform as
> good as it could be.
>
> ... peter
>
> -----Original Message-----
> From: core-libs-dev <[email protected]> on behalf of
Roger Riggs <[email protected]>
> Organization: Java Products Group, Oracle
> Date: Tuesday, June 9, 2020 at 6:43 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: [PATCH] 8246633: Improve the performance of
ObjectInputStream.resolveClass(ObjectStreamClass)
>
> Hi Peter,
>
> I'd like to understand the scope and impact of the problem before
> jumping to a solution.
> For specific applications, overriding ObjectInputStream.resolveClass
to
> handle the class
> lookup can handle the case as expeditiously as is necessary.
>
> What application uses cases are impacted and how prevelant are they?
> Many, many frameworks and applications create classloaders for their
own
> purposes and
> take advantage of the required behavior.
>
> Thanks, Roger
>
>
> On 6/4/20 7:08 PM, Peter Kessler OS wrote:
> > ObjectInputStream.resolveClass(ObjectStreamClass) is specified to
find "the first class loader on the current thread's stack (starting from the
currently executing method) that is neither the platform class loader nor its
ancestor." Finding that class loader is done with a virtual frame walk in
JVM_LatestUserDefinedLoader(JNIEnv *env). The virtual frame walk is expensive.
If an application does not define any ClassLoaders, then the virtual frame
walk will always find the system class loader.
> >
> > If an application does a significant amount of reading from an
ObjectInputStream, the cost of
ObjectInputStream.resolveClass(ObjectInputStreamClass) can be considerable. In
a sample application, that method (and the methods it calls) have been measured
to consume 5+% of the CPU cycles.
> >
> > The proposal is to note the construction of any user-defined
ClassLoader, and if none has been constructed, avoid the virtual frame walk
returning ClassLoader.getSystemClassLoader(). But if any user-defined
ClassLoader has been constructed, use the virtual frame walk to find the first
user-defined ClassLoader on the thread stack.
> >
> > User-defined ClassLoaders could be distinguished from
non-user-defined ClassLoaders in several ways. The ClassLoader class hierarchy
could be changed to make non-user-defined ClassLoaders inherit from a marker
class. There is already a jdk.internal.loader.BuiltinClassLoader but it is
used for module (and resource) loading. BuiltinClassLoader is not a
super-class of jdk.internal.reflect.DelegatingClassLoader, which is used to
load reflection classes, and reflection methods are also ignored by the virtual
frame walk. ClassLoaders could have a boolean that distinguished whether they
are user-defined or not. That would require a change to ClassLoader API.
There are other ways to distinguish user-defined ClassLoaders. The proposal is
to keep the decoration of the ClassLoaders out of the instances, so that it is
only visible to the code that needs it.
> >
> > The proposal is that each non-user-defined ClassLoader have a
static block that registers the class as a non-user-defined ClassLoader. Then
in the base ClassLoader constructor check if the ClassLoader being constructed
is a user-defined ClassLoader and if so set a boolean that acts as the guard on
the virtual frame walk. If additional non-user-defined ClassLoaders are
declared in the future beyond the ones identified in this patch, they can be
added to the registry.
> >
> > There are currently 4 non-user-defined ClassLoader classes, so the
registry is not difficult to maintain. Nor is the registry difficult to search
when ClassLoaders are constructed. The guard boolean that records whether a
user-defined ClassLoader has been constructed transitions from false to true
and never becomes false again. (An extension to transition the boolean back to
false when a user-defined ClassLoader becomes unreachable is beyond the scope
of this proposal.)
> >
> > This proposal slows loading of non-user-defined ClassLoaders by
the time it takes to register them. It slows ClassLoader construction by the
time it takes to consult the registry and conditionally set the guard boolean.
At each invocation of ObjectInputStream.resolveClass(ObjectInputStreamClass),
the guard boolean is used to possibly avoid the virtual frame walk.
> >
> > Tested with `make run-test-tier1` on Linux (CentOS 7) on both
aarch64 and x86_64. I had one failure on each of the runs
> >
> > ACTION: main -- Failed. Execution failed: `main' threw
exception: java.lang.Error: TESTBUG: the library has not been found in
${Repo}/build/linux-aarch64-server-release/images/test/hotspot/jtreg/native
> > REASON: User specified action: run main/native GTestWrapper
> >
> > on both plaforms. But I get the same failure when I run the tier1
tests on clones of OpenJDK-15+25 on both aarch64 and x86_64.
> >
> > Running a sample (representative?) application with the Oracle
Developer Studio analyzer shows a performance comparison of
> >
> > Stack Fragment sorted by metric: Attributed Total CPU Time
> > baseline.er experiment.er
> > Attributed Attributed Name
> > Total CPU Time Total CPU Time
> > sec. sec.
> > ================================== Callers
> > 3459.210 29.931
java.io.ObjectInputStream.readOrdinaryObject(boolean)
> > 1139.727 3.532
java.io.ObjectInputStream.readArray(boolean)
> > 875.262 9.116
java.io.ObjectInputStream.readNonProxyDesc(boolean)
> >
> > ================================== Stack Fragment
> >
java.io.ObjectInputStream.readClassDesc(boolean)
> >
java.io.ObjectInputStream.readNonProxyDesc(boolean)
> >
java.io.ObjectInputStream.resolveClass(java.io.ObjectStreamClass)
> >
java.io.ObjectInputStream.latestUserDefinedLoader()
> > 4.173 3.953
jdk.internal.misc.VM.latestUserDefinedLoader()
> >
> > ================================== Callees
> > 5470.026 0.
jdk.internal.misc.VM.latestUserDefinedLoader0()
> > 0. 38.627
java.lang.ClassLoader.getSystemClassLoader()
> >
> > The `hg export -g` below is with respect to OpenJDK-15+25.
> >
> > Thank you for your review comments. I will need a sponsor to get
this change into the repository.
> >
> > ... peter
> >
> > # HG changeset patch
> > # User Peter Kessler <[email protected]>
> > # Date 1591310467 25200
> > # Thu Jun 04 15:41:07 2020 -0700
> > # Node ID 9a39488182c1dfc5bc7bb41d562d7ef16ee657f6
> > # Parent 90b266a84c06f1b3dc0ed8767856793e8c1c357e
> > Improve the performance of ObjectInputStream.resolveClass
> >
> > diff --git
a/src/java.base/share/classes/java/lang/ClassLoader.java
b/src/java.base/share/classes/java/lang/ClassLoader.java
> > --- a/src/java.base/share/classes/java/lang/ClassLoader.java
> > +++ b/src/java.base/share/classes/java/lang/ClassLoader.java
> > @@ -385,6 +385,8 @@
> > }
> > this.package2certs = new ConcurrentHashMap<>();
> > this.nameAndId = nameAndId(this);
> > + /* Note the construction of a ClassLoader. */
> > + VM.noteClassLoaderConstruction(this.getClass());
> > }
> >
> > /**
> > diff --git
a/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java
b/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java
> > ---
a/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java
> > +++
b/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java
> > @@ -117,6 +117,11 @@
> > protected Class<?> loadClassOrNull(String cn, boolean
resolve) {
> > return JLA.findBootstrapClassOrNull(this, cn);
> > }
> > +
> > + static {
> > + /* Register this ClassLoader as not a user-defined
ClassLoader. */
> > +
VM.registerNotUserDefinedClassLoader(BootClassLoader.class);
> > + }
> > };
> >
> > /**
> > @@ -127,6 +132,8 @@
> > static {
> > if (!ClassLoader.registerAsParallelCapable())
> > throw new InternalError();
> > + /* Register this ClassLoader as not a user-defined
ClassLoader. */
> > +
VM.registerNotUserDefinedClassLoader(PlatformClassLoader.class);
> > }
> >
> > PlatformClassLoader(BootClassLoader parent) {
> > @@ -142,6 +149,8 @@
> > static {
> > if (!ClassLoader.registerAsParallelCapable())
> > throw new InternalError();
> > + /* Register this ClassLoader as not a user-defined
ClassLoader. */
> > +
VM.registerNotUserDefinedClassLoader(AppClassLoader.class);
> > }
> >
> > final URLClassPath ucp;
> > diff --git a/src/java.base/share/classes/jdk/internal/misc/VM.java
b/src/java.base/share/classes/jdk/internal/misc/VM.java
> > --- a/src/java.base/share/classes/jdk/internal/misc/VM.java
> > +++ b/src/java.base/share/classes/jdk/internal/misc/VM.java
> > @@ -304,12 +304,44 @@
> > private static final int
JVMTI_THREAD_STATE_WAITING_INDEFINITELY = 0x0010;
> > private static final int
JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT = 0x0020;
> >
> > + /** A registry of not user-defined ClassLoaders. */
> > + private static final List<Class<? extends ClassLoader>>
notUserDefinedClassLoaderRegistry =
> > +
Collections.synchronizedList(new ArrayList<>());
> > +
> > + /** Register a ClassLoader class as being not a user-defined
ClassLoader. */
> > + public static void registerNotUserDefinedClassLoader(Class<?
extends ClassLoader> classLoaderClass) {
> > + notUserDefinedClassLoaderRegistry.add(classLoaderClass);
> > + }
> > +
> > + /** A flag for whether a user-defined ClassLoaders has been
constructed. */
> > + private static volatile boolean
constructedUserDefinedClassLoaderFlag = false;
> > +
> > + /**
> > + * Note a ClassLoader construction, and record if a
> > + * user-defined ClassLoader has been constructed.
> > + */
> > + public static void noteClassLoaderConstruction(Class<?
extends ClassLoader> classLoaderClass) {
> > + /* Check if the ClassLoader class not a user-defined
ClassLoader. */
> > + if
(notUserDefinedClassLoaderRegistry.contains(classLoaderClass)) {
> > + return;
> > + }
> > + constructedUserDefinedClassLoaderFlag = true;
> > + return;
> > + }
> > +
> > /*
> > * Returns the first user-defined class loader up the
execution stack,
> > * or the platform class loader if only code from the
platform or
> > * bootstrap class loader is on the stack.
> > */
> > public static ClassLoader latestUserDefinedLoader() {
> > + if (!constructedUserDefinedClassLoaderFlag) {
> > + /*
> > + * If no user-defined ClassLoader has been
constructed,
> > + * then I will not find a user-defined ClassLoader in
the stack.
> > + */
> > + return ClassLoader.getSystemClassLoader();
> > + }
> > ClassLoader loader = latestUserDefinedLoader0();
> > return loader != null ? loader :
ClassLoader.getPlatformClassLoader();
> > }
> > diff --git
a/src/java.base/share/classes/jdk/internal/reflect/ClassDefiner.java
b/src/java.base/share/classes/jdk/internal/reflect/ClassDefiner.java
> > ---
a/src/java.base/share/classes/jdk/internal/reflect/ClassDefiner.java
> > +++
b/src/java.base/share/classes/jdk/internal/reflect/ClassDefiner.java
> > @@ -30,6 +30,7 @@
> >
> > import jdk.internal.access.JavaLangAccess;
> > import jdk.internal.access.SharedSecrets;
> > +import jdk.internal.misc.VM;
> >
> > /** Utility class which assists in calling defineClass() by
> > creating a new class loader which delegates to the one
needed in
> > @@ -73,4 +74,9 @@
> > DelegatingClassLoader(ClassLoader parent) {
> > super(parent);
> > }
> > +
> > + static {
> > + /* Register this ClassLoader as not a user-defined
ClassLoader. */
> > +
VM.registerNotUserDefinedClassLoader(DelegatingClassLoader.class);
> > + }
> > }
> >
>
>