This is still on my list to review will get to it in the next day or so

On Feb 7, 2023, at 8:23 AM, Eirik Bjorsnos 
<d...@openjdk.org<mailto:d...@openjdk.org>> wrote:

After finding a hash match, getEntryPos needs to compare the lookup name up to 
the encoded entry name in the CEN. This comparison is done by decoding the 
entry name into a String. The names can then be compared using the String API. 
This decoding step adds a significat cost to this method.

This PR suggest to update the string comparison such that in the common case 
where  both the lookup name and the entry name are encoded in ASCII-compatible 
UTF-8,  decoding can be avoided and the byte arrays can instead be compared 
direcly.

ZipCoder is updated with a new method to compare a string with an encoded byte 
array range.  The default implementation decodes to string (like the current 
code), while the UTF-8 implementation uses JavaLangAccess.getBytesNoRepl to get 
the  bytes. Both methods thes uses Arrays.mismatch for comparison with or 
without matching trailing slashes.

Additionally, this PR suggest to make the following updates to getEntryPos:

- The try/catch for IAE is redundant and can be safely removed. (initCEN 
already checks this and will throws IAE for invalid UTF-8). This seems to give 
a 3-4% speedup on micros)
- A new test InvalidBytesInEntryNameOrComment is a added to verify that initCEN 
does in fact reject invalid UTF-8 in CEN file names and comments. (I found no 
existing test coverage for this)
- The recursion when looking for "name/" matches is replaced with iteration. We 
keep track of any "name/" match and return it at the end of the search. (I feel 
this is easier to follow and it also gives a ~30% speedup for addSlash lookups 
with no regression on regular lookups)

(My though is that including these additional updates in this PR might reduce 
reviewer overhead given that it touches the exact same code. I might be wrong 
on this, please advise :)

I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified to 
use xalan.jar):

Baseline:

Benchmark                             (size)  Mode  Cnt    Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15   74.941 ± 1.004  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15   84.943 ± 1.320  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  120.371 ± 2.386  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  126.128 ± 1.075  ns/op
ZipFileGetEntry.getEntryMiss             512  avgt   15   23.818 ± 0.838  ns/op
ZipFileGetEntry.getEntryMiss            1024  avgt   15   29.762 ± 5.998  ns/op
ZipFileGetEntry.getEntryMissUncached     512  avgt   15   59.405 ± 0.545  ns/op
ZipFileGetEntry.getEntryMissUncached    1024  avgt   15   71.840 ± 2.455  ns/op
ZipFileGetEntry.getEntrySlash            512  avgt   15  135.621 ± 4.341  ns/op
ZipFileGetEntry.getEntrySlash           1024  avgt   15  134.190 ± 2.141  ns/op



PR:


Benchmark                             (size)  Mode  Cnt    Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15   62.267 ± 1.329  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15   72.916 ± 2.428  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  101.630 ± 1.154  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  113.161 ± 0.502  ns/op
ZipFileGetEntry.getEntryMiss             512  avgt   15   23.003 ± 1.191  ns/op
ZipFileGetEntry.getEntryMiss            1024  avgt   15   23.236 ± 1.114  ns/op
ZipFileGetEntry.getEntryMissUncached     512  avgt   15   56.781 ± 1.505  ns/op
ZipFileGetEntry.getEntryMissUncached    1024  avgt   15   67.767 ± 1.963  ns/op
ZipFileGetEntry.getEntrySlash            512  avgt   15   73.745 ± 2.717  ns/op
ZipFileGetEntry.getEntrySlash           1024  avgt   15   75.784 ± 1.051  ns/op


To assess the impact on startup/warmup, I made a main method class which 
measures the total time of calling ZipFile.getEntry for all entries in the 109 
JAR file dependenies of spring-petclinic. The shows a nice improvement (time in 
micros):


Percentile Baseline Patch
50 %          23155 21149
75 %          23598 21454
90 %          23989 21691
95 %          24238 21973
99 %          25270 22446
STDEV           792   549
Count           500   500

Eirik Bjorsnos has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 20 additional commits 
since the last revision:

- Merge branch 'master' into getentrypos-prefixmatch
- Fix incorrect offset for the CEN "length of extra field". Fixed spelling of 
"invalid".
- Spelling fix in test data for non-ascii latin1 string
- Replace new shared secret with using getBytesNoRepl in UTF8ZipCoder.compare. 
Add a test case for UTF-8 encoded entry name which is latin1, but not ASCII
- Rename test to InvalidBytesInEntryNameOrComment
- Adjust whitespace
- Some minor improvements to code comments
- Micros seem to indicate that the range checks in Arrays.mismatch might have 
as much as 5% regression
- Move String/byte[] comparison into ZipCoder. Change the default 
implementation to decode to string for comparison instead of encoding to bytes, 
this seems safer. Revert some changes from previous commits to parameters in 
the hasTrailingSlash method.
- ByteBuffers for reading ZIP files must be little-endian
- ... and 10 more: https://git.openjdk.org/jdk/compare/23f44474...06047377

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12290/files
 - new: https://git.openjdk.org/jdk/pull/12290/files/2c5e7c2c..06047377

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=02-03

 Stats: 22651 lines in 906 files changed: 9116 ins; 4377 del; 9158 mod
 Patch: https://git.openjdk.org/jdk/pull/12290.diff
 Fetch: git fetch https://git.openjdk.org/jdk pull/12290/head:pull/12290

PR: https://git.openjdk.org/jdk/pull/12290

[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC@home]






Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com<mailto:lance.ander...@oracle.com>



Reply via email to