Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 11:46:42 GMT, Eirik Bjorsnos wrote: > > In addition to - or instead of - an `equals` shortcut then if coders are > > the same we could use `ArraysSupport.mismatch` which should get similar > > speed and help more generally. > > ..and if String had (an optimized) mismatch method, then I bet all or most of > the comparison methods (equals, compareTo, endsWith, startsWith, > regionMatches) could delegate to that :-) A private mismatch method might make sense. A public method would require making a stronger case, I think, e.g. showing use cases a mismatcher would solve (elegantly, performantly) that can't be expressed with existing methods. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 11:39:18 GMT, Claes Redestad wrote: > Yes, I'll file a PR to see if we can make `startsWith` a bit sharper Thanks! I pushed the change to ZipCoder.compare. > In addition to - or instead of - an `equals` shortcut then if coders are the > same we could use `ArraysSupport.mismatch` which should get similar speed and > help more generally. ..and if String had (an optimized) mismatch method, then I bet all or most of the comparison methods (equals, compareTo, endsWith, startsWith, regionMatches) could delegate to that :-) - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 11:17:54 GMT, Eirik Bjorsnos wrote: > It helps... a lot! I guess an update to String.startsWith should be handled separately from this PR. If startsWith will be optimized, then I'm happy to update this PR to use startsWith as you suggested. Then ZipCoder will speed up for free with any later startsWith optimization. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 11:07:17 GMT, Claes Redestad wrote: > Could be interesting to see if special-casing `startsWith` to call `equals` > when possible would help: It helps... a lot! Benchmark (size) Mode CntScore Error Units ZipFileGetEntry.getEntryHit 512 avgt 15 72.800 ± 3.426 ns/op ZipFileGetEntry.getEntryHit 1024 avgt 15 73.434 ± 0.671 ns/op ZipFileGetEntry.getEntryHitUncached 512 avgt 15 107.538 ± 1.309 ns/op ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 110.775 ± 2.252 ns/op ZipFileGetEntry.getEntryMiss 512 avgt 15 22.022 ± 0.170 ns/op ZipFileGetEntry.getEntryMiss1024 avgt 15 23.242 ± 1.569 ns/op ZipFileGetEntry.getEntryMissUncached 512 avgt 15 50.946 ± 0.458 ns/op ZipFileGetEntry.getEntryMissUncached1024 avgt 15 57.856 ± 3.021 ns/op - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets Oh -- fun! Perhaps `startsWith` doesn't take advantage of the optimized intrinsic for `equals`. Could be interesting to see if special-casing `startsWith` to call `equals` when possible would help: diff --git
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 10:05:06 GMT, Claes Redestad wrote: > Should be a win and avoids the need for a new mismatch method here. Interestingly, this is not a win: PR: Benchmark (size) Mode CntScore Error Units ZipFileGetEntry.getEntryHit 512 avgt 15 73.633 ± 4.349 ns/op ZipFileGetEntry.getEntryHit 1024 avgt 15 74.477 ± 1.667 ns/op ZipFileGetEntry.getEntryHitUncached 512 avgt 15 108.352 ± 1.598 ns/op ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 110.425 ± 1.867 ns/op ZipFileGetEntry.getEntryMiss 512 avgt 15 21.921 ± 0.251 ns/op ZipFileGetEntry.getEntryMiss1024 avgt 15 22.836 ± 0.279 ns/op ZipFileGetEntry.getEntryMissUncached 512 avgt 15 51.890 ± 2.289 ns/op ZipFileGetEntry.getEntryMissUncached1024 avgt 15 56.722 ± 0.701 ns/op Claes: BBenchmark (size) Mode CntScore Error Units ZipFileGetEntry.getEntryHit 512 avgt 15 84.518 ± 4.440 ns/op ZipFileGetEntry.getEntryHit 1024 avgt 15 85.359 ± 1.484 ns/op ZipFileGetEntry.getEntryHitUncached 512 avgt 15 117.003 ± 1.481 ns/op ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 119.026 ± 1.073 ns/op ZipFileGetEntry.getEntryMiss 512 avgt 15 22.027 ± 0.188 ns/op ZipFileGetEntry.getEntryMiss1024 avgt 15 22.929 ± 0.282 ns/op ZipFileGetEntry.getEntryMissUncached 512 avgt 15 52.513 ± 1.859 ns/op ZipFileGetEntry.getEntryMissUncached1024 avgt 15 56.425 ± 0.699 ns/op (I forced ZipCoder.compare here by simply removing UTFZipCoder.compare) - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets You could skip the `equals`: if (decoded.startsWith(str)) { if (decoded.length() == str.length()) { return Comparison.EXACT_MATCH; } else if (addSlash &&
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets A somewhat unrelated observation: If String had a mismatch method, this last regression would probably not have been introduced, since the logic would align better with UTF8ZipCoder which used Arrays.mismatch.
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 17:11:19 GMT, Claes Redestad wrote: > Great! Extending coverage to provoke the issue on most charsets is good, and > it should guard UTF-8 from regressing too - no? It also guards UTFZipCoder from this particular regression, yes. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets Marked as reviewed by redestad (Reviewer). Great! Extending coverage to provoke the issue on most charsets is good, and it should guard UTF-8 from regressing too - no? - PR:
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
> After finding a hash match, getEntryPos needs to compare the lookup name up > to the encoded entry name in the CEN. This comparison is done by decoding the > entry name into a String. The names can then be compared using the String > API. This decoding step adds a significat cost to this method. > > This PR suggest to update the string comparison such that in the common case > where both the lookup name and the entry name are encoded in > ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can > instead be compared direcly. > > ZipCoder is updated with a new method to compare a string with an encoded > byte array range. The default implementation decodes to string (like the > current code), while the UTF-8 implementation uses > JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses > Arrays.mismatch for comparison with or without matching trailing slashes. > > Additionally, this PR suggest to make the following updates to getEntryPos: > > - The try/catch for IAE is redundant and can be safely removed. (initCEN > already checks this and will throws IAE for invalid UTF-8). This seems to > give a 3-4% speedup on micros) > - A new test InvalidBytesInEntryNameOrComment is a added to verify that > initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I > found no existing test coverage for this) > - The recursion when looking for "name/" matches is replaced with iteration. > We keep track of any "name/" match and return it at the end of the search. (I > feel this is easier to follow and it also gives a ~30% speedup for addSlash > lookups with no regression on regular lookups) > > (My though is that including these additional updates in this PR might reduce > reviewer overhead given that it touches the exact same code. I might be wrong > on this, please advise :) > > I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified > to use xalan.jar): > > Baseline: > > Benchmark (size) Mode CntScore Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 > ns/op > ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 > ns/op > ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 > ns/op > ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 > ns/op > > > > PR: > > > Benchmark (size) Mode CntScore Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 > ns/op > ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 > ns/op > ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 > ns/op > ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 > ns/op > > > To assess the impact on startup/warmup, I made a main method class which > measures the total time of calling ZipFile.getEntry for all entries in the > 109 JAR file dependenies of spring-petclinic. The shows a nice improvement > (time in micros): > > > Percentile Baseline Patch > 50 % 23155 21149 > 75 % 23598 21454 > 90 % 23989 21691 > 95 % 24238 21973 > 99 % 25270 22446 > STDEV 792 549 > Count 500 500 Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision: Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, allowing the test to run over all charsets - Changes: - all: https://git.openjdk.org/jdk/pull/12290/files - new: https://git.openjdk.org/jdk/pull/12290/files/ac7c4c7a..3cb77dba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12290=06 - incr: https://webrevs.openjdk.org/?repo=jdk=12290=05-06 Stats: 13 lines in 1 file changed: 8 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12290.diff