Agree that declaring which versions are available in a manifest
attribute would be the best solution from a performance point of view,
since there'd be no scanning at all. Seems like an optional attribute
that should be relatively easy to add and cause no adverse effects on
unaware JDK versions.
/Claes
On 2020-04-12 00:03, Eirik Bjørsnøs wrote:
If Multi-Release specified a list of versions instead if just
true|false, then scanning would not be needed at all.
Would require a spec update and updates to ecosystem build tools etc,
but perhaps a better choice in the long run since any kind of scanning
will be linear to to number of entries.
Eirik.
On Sat, Apr 11, 2020 at 11:54 PM Eirik Bjørsnøs <eir...@gmail.com
<mailto:eir...@gmail.com>> wrote:
Claes,
That trick did occur to me before I reported the up front costs.
Without this optimization, my tests shows more like ~1050
microseconds instead of ~850.
Perhaps further improvements is possible by extracting versions from
the binary representation instead of going via UTF for all names.
But I'm not sure it would be worth it. I don't feel I have a clear
understanding about how "bad" this up front cost really is compared
to the later wins in the real world.
Performance is hard to reason about :-)
Eirik.
On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad
<claes.redes...@oracle.com <mailto:claes.redes...@oracle.com>> wrote:
Hi Eirik,
interesting idea.
I think you could tune away a significant part of that up front
cost by
using JUZFA.entryNameStream(this) instead of
this.stream().map(ZipEntry::getName). This will avoid expanding each
entry into a JarEntry internally. Perhaps this gets the up-front
overhead down to more acceptable levels..?
/Claes
On 2020-04-11 22:06, Eirik Bjørsnøs wrote:
> There's an added up-front cost to scanning versions which I
have also tried
> to test.
>
> Since checkForSpecialAttributes is lazy, this can be tested
by measuring
> the time taken to read the first entry of an open JarFile.
For the h2 jar
> file, this seems to take ~350 microseconds without my patch,
which
> increases to ~850 microseconds with the patch.
>
> This cost applies to the first getEntry call and is then
amortized over all
> following calls.
>
> So this patch is probably not a win for use cases where very
few entries
> are read.
>
> Eirik.
>
> On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs
<eir...@gmail.com <mailto:eir...@gmail.com>> wrote:
>
>>
>> Lance,
>>
>> I made a small performance test. Pretty sloppy, so please
don't tell
>> Aleksey S :-)
>>
>> Results indicate there may be some performance wins to be had.
>>
>> The test uses the Maven artifact
com.h2database:h2:1.4.200:jar. This jar
>> which has 950 entries, of which the following three are
versioned:
>>
>> META-INF/versions/10/org/h2/util/NetUtils2.class
>> META-INF/versions/9/org/h2/util/Bits.class
>> META-INF/versions/9/org/h2/util/CurrentTimestamp.class
>>
>> The performance test calls JarFile.getEntry for each of the
base names
>> found in the jar. It does so 2000 times for 50 iterations
and calculates
>> the average run time.
>>
>> This is done once on a JarFile opened with runtime version
15, once on a
>> JarFile opened with runtime version 8 (which effectively
disables versioned
>> lookup so works as a baseline). Warmup runs are run first to
get stable
>> results.
>>
>> The test is run with OpenJDK 15 built from master.
>>
>> Results:
>>
>> Average time to get 950 entries 2000 times:
>>
>> Runtime version 15: 2903 ms
>> Runtime version 8: 336 ms:
>>
>> This is shows the difference between testing seven versions
(9, 10, 11,
>> 12, 13, 14, 15) and not testing versions.
>>
>> I then made a change to JarFile which scans the versions up
front and
>> stores them in an int[] which is then looped over in
getVersionedEntry.
>>
>> Results:
>>
>> Runtime version 15: 1048 ms
>> Runtime version 8: 315 ms:
>>
>> My benchmark is of course synthetic and does not represent
reality. I have
>> not done any analysis on the shape of typical
multi-versioned jars nor
>> their access patterns.
>>
>> However, an improvement of 2.5 - 3x is maybe worth taking a
closer look?
>>
>> Here's the patch for my change in JarFile.java:
>>
>> Index: src/java.base/share/classes/java/util/jar/JarFile.java
>> IDEA additional info:
>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> <+>UTF-8
>>
===================================================================
>> --- src/java.base/share/classes/java/util/jar/JarFile.java
(revision
>> 86722cb038d3030c51f3268799a2c3dc0c508638)
>> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date
>> 1586631626679)
>> @@ -161,7 +161,7 @@
>> private final Runtime.Version version; // current version
>> private final int versionFeature; //
version.feature()
>> private boolean isMultiRelease; // is jar
multi-release?
>> -
>> + private int[] versions; // which
versions does the
>> jar contain
>> // indicates if Class-Path attribute present
>> private boolean hasClassPathAttribute;
>> // true if manifest checked for special attributes
>> @@ -599,12 +599,13 @@
>> }
>>
>> private JarEntry getVersionedEntry(String name,
JarEntry je) {
>> - if (BASE_VERSION_FEATURE < versionFeature) {
>> + int[] versions = this.versions;
>> + if (BASE_VERSION_FEATURE < versionFeature &&
versions != null &&
>> versions.length > 0) {
>> if (!name.startsWith(META_INF)) {
>> // search for versioned entry
>> - int v = versionFeature;
>> - while (v > BASE_VERSION_FEATURE) {
>> - JarFileEntry vje =
getEntry0(META_INF_VERSIONS + v +
>> "/" + name);
>> + int v = versions.length - 1;
>> + while (v >= 0) {
>> + JarFileEntry vje =
getEntry0(META_INF_VERSIONS +
>> versions[v] + "/" + name);
>> if (vje != null) {
>> return vje.withBasename(name);
>> }
>> @@ -1016,9 +1017,20 @@
>> byte[] lbuf = new byte[512];
>> Attributes attr = new Attributes();
>> attr.read(new
Manifest.FastInputStream(
>> - new ByteArrayInputStream(b)),
lbuf);
>> - isMultiRelease = Boolean.parseBoolean(
>> -
attr.getValue(Attributes.Name.MULTI_RELEASE));
>> + new
ByteArrayInputStream(b)), lbuf);
>> + if(Boolean.parseBoolean(
>> +
>> attr.getValue(Attributes.Name.MULTI_RELEASE))) {
>> + isMultiRelease = true;
>> + versions = this.stream()
>> + .map(ZipEntry::getName)
>> +
.mapToInt(this::parseVersion)
>> + .filter(v -> v != -1 &&
v >=
>> BASE_VERSION_FEATURE && v <= versionFeature)
>> + .distinct()
>> + .sorted()
>> + .toArray();
>> +
>> + }
>> +
>> }
>> }
>> }
>> @@ -1026,6 +1038,27 @@
>> }
>> }
>>
>> + /**
>> + * If {@code entryName} is a a versioned entry, parse
and return the
>> version as an integer, otherwise return -1
>> + */
>> + private int parseVersion(String entryName) {
>> + if(!entryName.startsWith(META_INF_VERSIONS)) {
>> + return -1;
>> + }
>> +
>> + int separator = entryName.indexOf("/",
>> META_INF_VERSIONS.length());
>> +
>> + if(separator == -1) {
>> + return -1;
>> + }
>> +
>> + try {
>> + return Integer.parseInt(entryName,
>> META_INF_VERSIONS.length(), separator, 10);
>> + } catch (NumberFormatException e) {
>> + return -1;
>> + }
>> + }
>> +
>> synchronized void ensureInitialization() {
>> try {
>> maybeInstantiateVerifier();
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen
<lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>>
>> wrote:
>>
>>> Hi Eric
>>>
>>> Feel free to enter a feature request and better yet propose
a fix :-)
>>>
>>> Have a good weekend!
>>>
>>> Best
>>> Lance
>>>
>>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs
<eir...@gmail.com <mailto:eir...@gmail.com>> wrote:
>>>
>>> I recently needed to re-implement multi-release lookup
logic for a
>>> ModuleReader capable of reading modules from unpacked
(exploded) jar files
>>> [1]
>>>
>>> It occurred to me that JarFile.getVersionedEntry checks
_every_ version
>>> between 8 and the runtime version when looking up paths.
>>>
>>> Since META-INF/versions will probably be sparsely
populated, I'm wondering
>>> if something could be done to avoid checking 20 different
paths in OpenJDK
>>> 28.
>>>
>>> Perhaps scanning META-INF/versions once when opening the
file could work,
>>> then only check existing versions in getVersionedEntry?
>>>
>>> Maybe a premature optimization today, but with the new
release cadence,
>>> this problem is going to surface at some point in the
future, right?
>>>
>>> [1]
>>>
https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html
>>>
>>> Eirik.
>>>
>>>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>>
<http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance
Andersen|
>>> Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>
>>>
>>>
>>>
>>>