Steve,
I would assume the difference of the webrev.04 "old" iterator and the
webrev.06 "new" iterator
that might trigger a performance is how you create the JarFileEntry. The
one parameter constructor
invokes isMultiRelease(), which might be relatively expensive, when the
"mr" is enabled. There
are couple "if" checks involved.
For the "new" iterator is slower than the current jdk9 one. It might be
desired to have two JarEntryIterator
classes defined, one for "notVersioned", one for the "versioned". Of
course keep the two parameter
constructor for the "notVersioned". This might bring performance back
for the normal "notVersioned"
usage, which I would assume is the majority.
There is also a "bug" in the new iterator. The iterator/Enumeration
returned from ZipFile.entries()
checks "ensureOpen() in both hasNext() and next(). So close() the
ZipFile after itr.hasNext() will
fails the next next() invocation in the old implementation. The latest
itr will returns the cached
"ze". Not a big issue for sure, but a kind of "regression". Defining two
iterator classes as suggested
above will also workaround this issue, as the "notVersioned" one will
work just as expected, no
regression.
-Sherman
On 2/9/16 5:04 PM, Steve Drach wrote:
Hi,
Yet another webrev,
http://cr.openjdk.java.net/~sdrach/8132734/webrev.06/
<http://cr.openjdk.java.net/%7Esdrach/8132734/webrev.06/>, with a
change to JarEntryIterator to fix a problem discovered by performance
tests — calling hasNext() twice as often as needed. I also removed
the @since 9 tags from the methods entries() and stream(), and added
an additional sentence to the spec for each of those methods that
clarifies what a base entry is (actually is not).
I think having stream and entries do this is right although I would
like to see some performance data if possible.
See
http://cr.openjdk.java.net/~sdrach/8132734/JarFile%20Performance.pdf
<http://cr.openjdk.java.net/%7Esdrach/8132734/JarFile%20Performance.pdf>
I used JMH to run the benchmark. See
http://cr.openjdk.java.net/~sdrach/8132734/MyBenchmark.java
<http://cr.openjdk.java.net/%7Esdrach/8132734/MyBenchmark.java>. I
used two jar files, the rt.jar file from JDK 8 that has 20653 entries
and the multi-release.jar found in the test directory with 14 entries.
Obviously rt.jar is not a multi-release jar file.
The first two tables (1 and 2) are comparable and the second two
tables are somewhat comparable (3 and 4).
Tables 1 and 2 have 4 sections that show the results of tests on the
two jar files in 4 configurations of JarFile. The tests were done
with a JarFile object constructed without the Release object argument,
essentially the legacy constructor. The section labeled "JDK 8
JarFile” was done with JDK 8u66. The section labeled “JDK 9 JarFile”
was done with the latest build of openjdk/jdk9/dev without any changes
in my 8132734 changeset. I chose this section as the reference, so
the last column shows the values normalized to 1 micro/milli second
per operation (rt.jar times are in milliseconds and multi-release.jar
times are in microseconds). It should be obvious that JDK 9 is much
faster than JDK 8, somewhere on the order of 5-6 times faster. I
think that is because ZipFile doesn’t use JNI in JDK 9.
Of the two remaining sections in Tables 1 and 2, the section labeled
“MultiRelease JarFile” differs from the section labeled “MultiRelease
JarFile, new iterator” only in the JarEntryIterator class. The first
one uses the original iterator in JarFile.java that can be seen
starting with line 551 of webrev.04
<http://cr.openjdk.java.net/%7Esdrach/8132734/webrev.04/>, and the new
iterator starts with line 553 of webrev.06
<http://cr.openjdk.java.net/%7Esdrach/8132734/webrev.06/>. The
results are strange. The new, more complex, iterator appears to be
faster than the old, simpler, iterator. I double, and triple, checked
it, but it was always faster. I used jitwatch to look at the hotspot
logs generated during compilation and neither method was compiled. I
suppose I could dig into it further but decided not to. Consider it
good news. The results do show that the multi-release enhancement
slows JarFile entries/stream down by 2-18% depending on the size of
the jar file. But they are still much better than the JDK 8 values.
Also I would expect that if a JAR file is not mult-release but the
library opens it with Release.RUNTIME to perform the same as opening
the JAR file with the Release-less constructors.
The results in Table 3 attempts to answer this question since rt.jar
is not a multi-release jar file. This tells me that if one opens the
JarFile with Release.RUNTIME, that there is a performance penalty of
2-6% on this very large jar file.
Finally, the results in Table 4 tell me that processing a true
multi-release jar file takes about 80% more time per entries() or
stream() operation. I’ve looked at this in a profiler and there is no
particular area that stands out to me, it’s just more complicated to
process a multi-release jar file, as would be expected.
I think the javadoc will need to also need to make it clear whether
entries with names starting with META-INF/versions/ are returned.
Fixed.
I see you've added @since 9 to the existing methods, I assume you
didn't mean to do this.
Fixed.
At some point then we need to discuss how RUNTIME_VERSION is
computed. Iris (via Mandy) has pushed jdk.Version to jdk9/dev but
having it exported by java.base conflicts with the design principles
in JEP 200. Moving it to another module means that code in java.base
cannot use it and thus the JAR file can't use it.
Left as is.