If Multi-Release specified a list of versions instead if just true|false, then scanning would not be needed at all.
Would require a spec update and updates to ecosystem build tools etc, but perhaps a better choice in the long run since any kind of scanning will be linear to to number of entries. Eirik. On Sat, Apr 11, 2020 at 11:54 PM Eirik Bjørsnøs <eir...@gmail.com> wrote: > Claes, > > That trick did occur to me before I reported the up front costs. > > Without this optimization, my tests shows more like ~1050 microseconds > instead of ~850. > > Perhaps further improvements is possible by extracting versions from the > binary representation instead of going via UTF for all names. > > But I'm not sure it would be worth it. I don't feel I have a clear > understanding about how "bad" this up front cost really is compared to the > later wins in the real world. > > Performance is hard to reason about :-) > > Eirik. > > On Sat, Apr 11, 2020 at 11:32 PM Claes Redestad <claes.redes...@oracle.com> > wrote: > >> Hi Eirik, >> >> interesting idea. >> >> I think you could tune away a significant part of that up front cost by >> using JUZFA.entryNameStream(this) instead of >> this.stream().map(ZipEntry::getName). This will avoid expanding each >> entry into a JarEntry internally. Perhaps this gets the up-front >> overhead down to more acceptable levels..? >> >> /Claes >> >> On 2020-04-11 22:06, Eirik Bjørsnøs wrote: >> > There's an added up-front cost to scanning versions which I have also >> tried >> > to test. >> > >> > Since checkForSpecialAttributes is lazy, this can be tested by measuring >> > the time taken to read the first entry of an open JarFile. For the h2 >> jar >> > file, this seems to take ~350 microseconds without my patch, which >> > increases to ~850 microseconds with the patch. >> > >> > This cost applies to the first getEntry call and is then amortized over >> all >> > following calls. >> > >> > So this patch is probably not a win for use cases where very few entries >> > are read. >> > >> > Eirik. >> > >> > On Sat, Apr 11, 2020 at 9:10 PM Eirik Bjørsnøs <eir...@gmail.com> >> wrote: >> > >> >> >> >> Lance, >> >> >> >> I made a small performance test. Pretty sloppy, so please don't tell >> >> Aleksey S :-) >> >> >> >> Results indicate there may be some performance wins to be had. >> >> >> >> The test uses the Maven artifact com.h2database:h2:1.4.200:jar. This >> jar >> >> which has 950 entries, of which the following three are versioned: >> >> >> >> META-INF/versions/10/org/h2/util/NetUtils2.class >> >> META-INF/versions/9/org/h2/util/Bits.class >> >> META-INF/versions/9/org/h2/util/CurrentTimestamp.class >> >> >> >> The performance test calls JarFile.getEntry for each of the base names >> >> found in the jar. It does so 2000 times for 50 iterations and >> calculates >> >> the average run time. >> >> >> >> This is done once on a JarFile opened with runtime version 15, once on >> a >> >> JarFile opened with runtime version 8 (which effectively disables >> versioned >> >> lookup so works as a baseline). Warmup runs are run first to get stable >> >> results. >> >> >> >> The test is run with OpenJDK 15 built from master. >> >> >> >> Results: >> >> >> >> Average time to get 950 entries 2000 times: >> >> >> >> Runtime version 15: 2903 ms >> >> Runtime version 8: 336 ms: >> >> >> >> This is shows the difference between testing seven versions (9, 10, 11, >> >> 12, 13, 14, 15) and not testing versions. >> >> >> >> I then made a change to JarFile which scans the versions up front and >> >> stores them in an int[] which is then looped over in getVersionedEntry. >> >> >> >> Results: >> >> >> >> Runtime version 15: 1048 ms >> >> Runtime version 8: 315 ms: >> >> >> >> My benchmark is of course synthetic and does not represent reality. I >> have >> >> not done any analysis on the shape of typical multi-versioned jars nor >> >> their access patterns. >> >> >> >> However, an improvement of 2.5 - 3x is maybe worth taking a closer >> look? >> >> >> >> Here's the patch for my change in JarFile.java: >> >> >> >> Index: src/java.base/share/classes/java/util/jar/JarFile.java >> >> IDEA additional info: >> >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP >> >> <+>UTF-8 >> >> =================================================================== >> >> --- src/java.base/share/classes/java/util/jar/JarFile.java (revision >> >> 86722cb038d3030c51f3268799a2c3dc0c508638) >> >> +++ src/java.base/share/classes/java/util/jar/JarFile.java (date >> >> 1586631626679) >> >> @@ -161,7 +161,7 @@ >> >> private final Runtime.Version version; // current version >> >> private final int versionFeature; // version.feature() >> >> private boolean isMultiRelease; // is jar multi-release? >> >> - >> >> + private int[] versions; // which versions does the >> >> jar contain >> >> // indicates if Class-Path attribute present >> >> private boolean hasClassPathAttribute; >> >> // true if manifest checked for special attributes >> >> @@ -599,12 +599,13 @@ >> >> } >> >> >> >> private JarEntry getVersionedEntry(String name, JarEntry je) { >> >> - if (BASE_VERSION_FEATURE < versionFeature) { >> >> + int[] versions = this.versions; >> >> + if (BASE_VERSION_FEATURE < versionFeature && versions != null >> && >> >> versions.length > 0) { >> >> if (!name.startsWith(META_INF)) { >> >> // search for versioned entry >> >> - int v = versionFeature; >> >> - while (v > BASE_VERSION_FEATURE) { >> >> - JarFileEntry vje = getEntry0(META_INF_VERSIONS + >> v + >> >> "/" + name); >> >> + int v = versions.length - 1; >> >> + while (v >= 0) { >> >> + JarFileEntry vje = getEntry0(META_INF_VERSIONS + >> >> versions[v] + "/" + name); >> >> if (vje != null) { >> >> return vje.withBasename(name); >> >> } >> >> @@ -1016,9 +1017,20 @@ >> >> byte[] lbuf = new byte[512]; >> >> Attributes attr = new Attributes(); >> >> attr.read(new Manifest.FastInputStream( >> >> - new ByteArrayInputStream(b)), lbuf); >> >> - isMultiRelease = Boolean.parseBoolean( >> >> - >> attr.getValue(Attributes.Name.MULTI_RELEASE)); >> >> + new ByteArrayInputStream(b)), lbuf); >> >> + if(Boolean.parseBoolean( >> >> + >> >> attr.getValue(Attributes.Name.MULTI_RELEASE))) { >> >> + isMultiRelease = true; >> >> + versions = this.stream() >> >> + .map(ZipEntry::getName) >> >> + .mapToInt(this::parseVersion) >> >> + .filter(v -> v != -1 && v >= >> >> BASE_VERSION_FEATURE && v <= versionFeature) >> >> + .distinct() >> >> + .sorted() >> >> + .toArray(); >> >> + >> >> + } >> >> + >> >> } >> >> } >> >> } >> >> @@ -1026,6 +1038,27 @@ >> >> } >> >> } >> >> >> >> + /** >> >> + * If {@code entryName} is a a versioned entry, parse and return >> the >> >> version as an integer, otherwise return -1 >> >> + */ >> >> + private int parseVersion(String entryName) { >> >> + if(!entryName.startsWith(META_INF_VERSIONS)) { >> >> + return -1; >> >> + } >> >> + >> >> + int separator = entryName.indexOf("/", >> >> META_INF_VERSIONS.length()); >> >> + >> >> + if(separator == -1) { >> >> + return -1; >> >> + } >> >> + >> >> + try { >> >> + return Integer.parseInt(entryName, >> >> META_INF_VERSIONS.length(), separator, 10); >> >> + } catch (NumberFormatException e) { >> >> + return -1; >> >> + } >> >> + } >> >> + >> >> synchronized void ensureInitialization() { >> >> try { >> >> maybeInstantiateVerifier(); >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Fri, Apr 10, 2020 at 10:58 PM Lance Andersen < >> lance.ander...@oracle.com> >> >> wrote: >> >> >> >>> Hi Eric >> >>> >> >>> Feel free to enter a feature request and better yet propose a fix :-) >> >>> >> >>> Have a good weekend! >> >>> >> >>> Best >> >>> Lance >> >>> >> >>> On Apr 10, 2020, at 2:59 PM, Eirik Bjørsnøs <eir...@gmail.com> wrote: >> >>> >> >>> I recently needed to re-implement multi-release lookup logic for a >> >>> ModuleReader capable of reading modules from unpacked (exploded) jar >> files >> >>> [1] >> >>> >> >>> It occurred to me that JarFile.getVersionedEntry checks _every_ >> version >> >>> between 8 and the runtime version when looking up paths. >> >>> >> >>> Since META-INF/versions will probably be sparsely populated, I'm >> wondering >> >>> if something could be done to avoid checking 20 different paths in >> OpenJDK >> >>> 28. >> >>> >> >>> Perhaps scanning META-INF/versions once when opening the file could >> work, >> >>> then only check existing versions in getVersionedEntry? >> >>> >> >>> Maybe a premature optimization today, but with the new release >> cadence, >> >>> this problem is going to surface at some point in the future, right? >> >>> >> >>> [1] >> >>> >> https://mail.openjdk.java.net/pipermail/jigsaw-dev/2020-April/014414.html >> >>> >> >>> Eirik. >> >>> >> >>> >> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> >> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> >> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> >> >>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance >> Andersen| >> >>> Principal Member of Technical Staff | +1.781.442.2037 >> >>> Oracle Java Engineering >> >>> 1 Network Drive >> >>> Burlington, MA 01803 >> >>> lance.ander...@oracle.com >> >>> >> >>> >> >>> >> >>> >> >