Claes, That trick did occur to me before I reported the up front costs.
Without this optimization, my tests shows more like ~1050 microseconds instead of ~850. Perhaps further improvements is possible by extracting versions from the binary representation instead of going via UTF for all names. But I'm not sure it would be worth it. I don't feel I have a clear understanding about how "bad" this up front cost really is compared to the later wins in the real world. Performance is hard to reason about :-) Eirik. On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad <claes.redes...@oracle.com> wrote: > Hi Eirik, > > interesting idea. > > I think you could tune away a significant part of that up front cost by > using JUZFA.entryNameStream(this) instead of > this.stream().map(ZipEntry::getName). This will avoid expanding each > entry into a JarEntry internally. Perhaps this gets the up-front > overhead down to more acceptable levels..? > > /Claes > > On 2020-04-11 22:06, Eirik Bjørsnøs wrote: > > There's an added up-front cost to scanning versions which I have also > tried > > to test. > > > > Since checkForSpecialAttributes is lazy, this can be tested by measuring > > the time taken to read the first entry of an open JarFile. For the h2 jar > > file, this seems to take ~350 microseconds without my patch, which > > increases to ~850 microseconds with the patch. > > > > This cost applies to the first getEntry call and is then amortized over > all > > following calls. > > > > So this patch is probably not a win for use cases where very few entries > > are read. > > > > Eirik. > > > > On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eir...@gmail.com> wrote: > > > >> > >> Lance, > >> > >> I made a small performance test. Pretty sloppy, so please don't tell > >> Aleksey S :-) > >> > >> Results indicate there may be some performance wins to be had. > >> > >> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This jar > >> which has 950 entries, of which the following three are versioned: > >> > >> META-INF/versions/10/org/h2/util/NetUtils2.class > >> META-INF/versions/9/org/h2/util/Bits.class > >> META-INF/versions/9/org/h2/util/CurrentTimestamp.class > >> > >> The performance test calls JarFile.getEntry for each of the base names > >> found in the jar. It does so 2000 times for 50 iterations and calculates > >> the average run time. > >> > >> This is done once on a JarFile opened with runtime version 15, once on a > >> JarFile opened with runtime version 8 (which effectively disables > versioned > >> lookup so works as a baseline). Warmup runs are run first to get stable > >> results. > >> > >> The test is run with OpenJDK 15 built from master. > >> > >> Results: > >> > >> Average time to get 950 entries 2000 times: > >> > >> Runtime version 15: 2903 ms > >> Runtime version 8: 336 ms: > >> > >> This is shows the difference between testing seven versions (9, 10, 11, > >> 12, 13, 14, 15) and not testing versions. > >> > >> I then made a change to JarFile which scans the versions up front and > >> stores them in an int[] which is then looped over in getVersionedEntry. > >> > >> Results: > >> > >> Runtime version 15: 1048 ms > >> Runtime version 8: 315 ms: > >> > >> My benchmark is of course synthetic and does not represent reality. I > have > >> not done any analysis on the shape of typical multi-versioned jars nor > >> their access patterns. > >> > >> However, an improvement of 2.5 - 3x is maybe worth taking a closer look? > >> > >> Here's the patch for my change in JarFile.java: > >> > >> Index: src/java.base/share/classes/java/util/jar/JarFile.java > >> IDEA additional info: > >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP > >> <+>UTF-8 > >> =================================================================== > >> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision > >> 86722cb038d3030c51f3268799a2c3dc0c508638) > >> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date > >> 1586631626679) > >> @@ -161,7 +161,7 @@ > >> private final Runtime.Version version; // current version > >> private final int versionFeature; // version.feature() > >> private boolean isMultiRelease; // is jar multi-release? > >> - > >> + private int[] versions; // which versions does the > >> jar contain > >> // indicates if Class-Path attribute present > >> private boolean hasClassPathAttribute; > >> // true if manifest checked for special attributes > >> @@ -599,12 +599,13 @@ > >> } > >> > >> private JarEntry getVersionedEntry(String name, JarEntry je) { > >> - if (BASE_VERSION_FEATURE < versionFeature) { > >> + int[] versions = this.versions; > >> + if (BASE_VERSION_FEATURE < versionFeature && versions != null > && > >> versions.length > 0) { > >> if (!name.startsWith(META_INF)) { > >> // search for versioned entry > >> - int v = versionFeature; > >> - while (v > BASE_VERSION_FEATURE) { > >> - JarFileEntry vje = getEntry0(META_INF_VERSIONS + v > + > >> "/" + name); > >> + int v = versions.length - 1; > >> + while (v >= 0) { > >> + JarFileEntry vje = getEntry0(META_INF_VERSIONS + > >> versions[v] + "/" + name); > >> if (vje != null) { > >> return vje.withBasename(name); > >> } > >> @@ -1016,9 +1017,20 @@ > >> byte[] lbuf = new byte[512]; > >> Attributes attr = new Attributes(); > >> attr.read(new Manifest.FastInputStream( > >> - new ByteArrayInputStream(b)), lbuf); > >> - isMultiRelease = Boolean.parseBoolean( > >> - > attr.getValue(Attributes.Name.MULTI_RELEASE)); > >> + new ByteArrayInputStream(b)), lbuf); > >> + if(Boolean.parseBoolean( > >> + > >> attr.getValue(Attributes.Name.MULTI_RELEASE))) { > >> + isMultiRelease = true; > >> + versions = this.stream() > >> + .map(ZipEntry::getName) > >> + .mapToInt(this::parseVersion) > >> + .filter(v -> v != -1 && v >= > >> BASE_VERSION_FEATURE && v <= versionFeature) > >> + .distinct() > >> + .sorted() > >> + .toArray(); > >> + > >> + } > >> + > >> } > >> } > >> } > >> @@ -1026,6 +1038,27 @@ > >> } > >> } > >> > >> + /** > >> + * If {@code entryName} is a a versioned entry, parse and return > the > >> version as an integer, otherwise return -1 > >> + */ > >> + private int parseVersion(String entryName) { > >> + if(!entryName.startsWith(META_INF_VERSIONS)) { > >> + return -1; > >> + } > >> + > >> + int separator = entryName.indexOf("/", > >> META_INF_VERSIONS.length()); > >> + > >> + if(separator == -1) { > >> + return -1; > >> + } > >> + > >> + try { > >> + return Integer.parseInt(entryName, > >> META_INF_VERSIONS.length(), separator, 10); > >> + } catch (NumberFormatException e) { > >> + return -1; > >> + } > >> + } > >> + > >> synchronized void ensureInitialization() { > >> try { > >> maybeInstantiateVerifier(); > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen < > lance.ander...@oracle.com> > >> wrote: > >> > >>> Hi Eric > >>> > >>> Feel free to enter a feature request and better yet propose a fix :-) > >>> > >>> Have a good weekend! > >>> > >>> Best > >>> Lance > >>> > >>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eir...@gmail.com> wrote: > >>> > >>> I recently needed to re-implement multi-release lookup logic for a > >>> ModuleReader capable of reading modules from unpacked (exploded) jar > files > >>> [1] > >>> > >>> It occurred to me that JarFile.getVersionedEntry checks _every_ version > >>> between 8 and the runtime version when looking up paths. > >>> > >>> Since META-INF/versions will probably be sparsely populated, I'm > wondering > >>> if something could be done to avoid checking 20 different paths in > OpenJDK > >>> 28. > >>> > >>> Perhaps scanning META-INF/versions once when opening the file could > work, > >>> then only check existing versions in getVersionedEntry? > >>> > >>> Maybe a premature optimization today, but with the new release cadence, > >>> this problem is going to surface at some point in the future, right? > >>> > >>> [1] > >>> > https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html > >>> > >>> Eirik. > >>> > >>> > >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> > >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> > >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> > >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance > Andersen| > >>> Principal Member of Technical Staff | +1.781.442.2037 > >>> Oracle Java Engineering > >>> 1 Network Drive > >>> Burlington, MA 01803 > >>> lance.ander...@oracle.com > >>> > >>> > >>> > >>> >