Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v10]
On Mon, 27 Mar 2023 18:37:12 GMT, Jim Laskey wrote: >> Add the ability to repeatedly append char and CharSequence data to >> StringBuilder/StringBuffer. > > Jim Laskey has updated the pull request incrementally with one additional > commit since the last revision: > > Use Arrays.fill Looks ok. I don't have a strong opinion on insert-repeat. I think tests that check for illegal code points should use a legal repeat value. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12728#pullrequestreview-1360764390
Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v10]
On Mon, 27 Mar 2023 18:37:12 GMT, Jim Laskey wrote: >> Add the ability to repeatedly append char and CharSequence data to >> StringBuilder/StringBuffer. > > Jim Laskey has updated the pull request incrementally with one additional > commit since the last revision: > > Use Arrays.fill test/jdk/java/lang/StringBuilder/StringBuilderRepeat.java line 180: > 178: > 179: try { > 180: sb.repeat(0x10 + 1, -1); Should these tests that test invalid codepoints use a valid repeat count as to verify the IAE is correctly thrown for the out-of-bound codepoint? Tests where both params are illegal are fine, too, of course. - PR Review Comment: https://git.openjdk.org/jdk/pull/12728#discussion_r1150396671
Re: RFR: 8302818: Optimize wrapper sets and immutable sets of Enums [v5]
On Tue, 21 Feb 2023 03:39:46 GMT, Tingjun Yuan wrote: >> Currently, the two subclasses of `java.util.EnumSet` optimize bulk >> operations when the argument is also a `EnumSet`, but there is no such >> optimization for wrapper sets (returned by `Collections.unmodifiableSet`, >> `Collections.synchronizedSet`, etc.) and immutable sets (returned by >> `Set.of` methods) of `Enum`s. >> >> This PR introduces optimization classes for these situations. No public >> APIs are changed. > > Tingjun Yuan has updated the pull request incrementally with one additional > commit since the last revision: > > Set.copyOf: need defensive copy If this level of complexity is indeed needed to get whatever improvement you're after then I don't see how this can be worth its weight. Microbenchmarking might help support your case here, but assessing the potential performance costs from gradually increasing the number of classes floating around at various call sites in arbitrary applications is hard. Thus it is something we need to be very careful not to do without solid evidence. - PR Comment: https://git.openjdk.org/jdk/pull/12498#issuecomment-1478731354
Re: RFR: 8302818: Optimize wrapper sets and immutable sets of Enums [v5]
On Tue, 21 Feb 2023 03:39:46 GMT, Tingjun Yuan wrote: >> Currently, the two subclasses of `java.util.EnumSet` optimize bulk >> operations when the argument is also a `EnumSet`, but there is no such >> optimization for wrapper sets (returned by `Collections.unmodifiableSet`, >> `Collections.synchronizedSet`, etc.) and immutable sets (returned by >> `Set.of` methods) of `Enum`s. >> >> This PR introduces optimization classes for these situations. No public >> APIs are changed. > > Tingjun Yuan has updated the pull request incrementally with one additional > commit since the last revision: > > Set.copyOf: need defensive copy I'm wary of the impact of adding new wrapper classes. It may make the factory methods slightly slower due additional checks, but also risks increasing the number of classes at various call-sites which might upset call site inlining. An alternative design which would avoid adding more classes could be to add package-private accessors to the existing `Unmodifiable/Synchronized` wrapper classes so that `EnumSet/-Map` can retrieve the underlying set of an unmodifiable or synchronized `Set` or `Map` and then use it directly for these bulk operations. Then you'd contain the additional overhead to `EnumSet/-Map`. - PR Comment: https://git.openjdk.org/jdk/pull/12498#issuecomment-1478035549
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 12:07:12 GMT, Sergey Tsypanov wrote: >> src/java.base/share/classes/java/lang/CharacterData.java line 72: >> >>> 70: >>> 71: static final CharacterData of(int ch) { >>> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path >> >> Maybereducing to a single branch with a mask op helps further? Or maybe the >> JIT already effectively does that: >> >> `if (ch && 0xFF00 == 0) {` > > Btw, I think we can do the same for `StringLatin1.canEncode()` It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could even call into `StringLatin1.canEncode`, unless that's cause for some performance anomaly) - PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos wrote: > By avoiding a bit shift operation for the latin1 fast-path test, we can speed > up the `java.lang.CharacterData.of` method by ~25% for latin1 code points. > > The latin1 test is currently implemented as `ch >>> 8 == 0`. We can replace > this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain > (especially for Latin1 code points): > > This method is called frequently by various property-determining methods in > `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect > improvements for all these methods. > > Performance is tested using the `Characters.isDigit` benchmark using the > digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, > in CharacterData00): > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.870 ± 0.011 ns/op > Characters.isDigit 1632 avgt 15 2.168 ± 0.017 ns/op > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigit 48 avgt 15 0.654 ± 0.007 ns/op > Characters.isDigit 1632 avgt 15 2.032 ± 0.019 ns/op Nice one! src/java.base/share/classes/java/lang/CharacterData.java line 72: > 70: > 71: static final CharacterData of(int ch) { > 72: if (ch >= 0 && ch <= 0xFF) { // fast-path Maybereducing to a single branch with a mask op helps further? Or maybe the JIT already effectively does that: `if (ch && 0xFF00 == 0) {` - Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.org/jdk/pull/13040
Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v8]
On Fri, 3 Mar 2023 19:04:22 GMT, Jim Laskey wrote: >> Add the ability to repeatedly append char and CharSequence data to >> StringBuilder/StringBuffer. > > Jim Laskey has updated the pull request incrementally with one additional > commit since the last revision: > > Expand test for StringBuffer and illegal code points It seems reasonable to consider analogous `insert` methods. Perhaps a more consistent naming scheme would be `appendRepeated` and `insertRepeated`? (The precedent for `repeat` in `String::repeat` is there, but is kind of weak since it's clear in that case that we're not mutating internal state) src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 1832: > 1830: if (isLatin1 && StringLatin1.canEncode(c)) { > 1831: byte b = (byte)c; > 1832: for (int index = this.count; index < limit; index++) { This loop could even be replaced with `Arrays.fill(value, this.count, limit, b)` - a plausible candidate for intrinsification? There's an added range check in that method, however, but that shouldn't be too hard for the JIT to eliminate. - PR: https://git.openjdk.org/jdk/pull/12728
Re: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments
On Fri, 3 Mar 2023 09:38:13 GMT, Pavel Rappo wrote: >> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java >> line 257: >> >>> 255: >>> 256: /** >>> 257: * @return true iff the BSM method type exactly matches >> >> I assume “iff” should “if”? > > Here and elsewhere in this file "iff" might mean [if and only > if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. > (FWIW, there are a few hundred occurrences of the word "iff" in src.) > > @cl4es (Claes Redestad), as the author of those lines would you like to chime > in? > > Since Claes might read this, I note that when I changed unsupported `{@see}` > to `{@link}` thoughtout this file, my IDE could not resolve one of the links: > `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)` > > While there's a similarly-name method with slightly different parameters, I > refrained from using it: > `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`. Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics. As @pavelrappo points out it's a relatively common occurrence in the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit pretentious, but mostly a terse way to say "return true if the BSM method type exactly matches X, otherwise false". The broken link stems from the fact that the method I was targeting (a way to use condy for lambda proxy singletons rather than a `MethodHandle.constant`) was never integrated. We'll look at either getting that done (@briangoetz suggested the time might be ready for it) or remove this currently pointless static bootstrap specialization test. - PR: https://git.openjdk.org/jdk/pull/12826
Re: RFR: 8301119: Support for GB18030-2022 [v2]
On Thu, 23 Feb 2023 08:32:29 GMT, Claes Redestad wrote: >> `Charset` class is initialized *before* system properties are set up, in >> order to check the JNU encoding (used for file path name) is a supported >> charset or not. In some OS environments, GB18030 is the native encoding so >> we need to avoid checking the system property in such a case. > > `@Stable` semantics are still fuzzy to me but the rule I've adhered to is > that back to back stores to the field - if unavoidable - needs to be > idempotent since the JIT (or AOT) may record any non-null value as a compile > time constant at any time. > > I'd write this to not update the static field if initLevel() < 1. Such calls > should be rare and only happen once on a system that has GB18030 as their > native encoding. Scratch that: as it seems to be important that we don't switch after startup then what this code is really reaching for is `static final` field semantics. Since `StandardCharsets` might be loaded very early a holder class pattern might be necessary: isGB18030_2000() { return GB18030Properties.GB18030_2000; } private static class GB18030Properties { private static final GB18030_2000 = init(); private static boolean init() { if (VM.initLevel() < 1) { // Cannot get the system property yet. Assumes non-2000 return false; } return "2000".equals(GetPropertyAction.privilegedGetProperty("jdk.charset.GB18030")); } } - PR: https://git.openjdk.org/jdk/pull/12518
Re: RFR: 8301119: Support for GB18030-2022 [v2]
On Wed, 22 Feb 2023 17:52:01 GMT, Naoto Sato wrote: >>> curious - what scenario triggers this call at initLevel < 1 ? >> >> It's not supported, but it is possible that someone might run with >> -Dfile.encoding=GB18030, in which case the default charset is used before >> the system properties are initialized in initPhase1. Checking the init level >> breaks the circularity, the only downside is that can't switch to >> GB18030-2000 at the same time. > > `Charset` class is initialized *before* system properties are set up, in > order to check the JNU encoding (used for file path name) is a supported > charset or not. In some OS environments, GB18030 is the native encoding so we > need to avoid checking the system property in such a case. `@Stable` semantics are still fuzzy to me but the rule I've adhered to is that back to back stores to the field - if unavoidable - needs to be idempotent since the JIT (or AOT) may record any non-null value as a compile time constant at any time. I'd write this to not update the static field if initLevel() < 1. Such calls should be rare and only happen once on a system that has GB18030 as their native encoding. - PR: https://git.openjdk.org/jdk/pull/12518
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]
On Wed, 22 Feb 2023 07:11:16 GMT, Eirik Bjorsnos wrote: >> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying >> 'the oldest ASCII trick in the book'. >> >> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two >> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is >> updated to use `equalsIgnoreCase` >> >> To verify the correctness of `equalsIgnoreCase`, a new test is added to >> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 >> code point pairs have an `equalsIgnoreCase` consistent with >> Character.toUpperCase, Character.toLowerCase. >> >> Performance is tested for matching and mismatching cases of code point pairs >> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results >> in the first comment below. > > Eirik Bjorsnos has updated the pull request incrementally with two additional > commits since the last revision: > > - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase > with "by removing (setting) a single bit" > - Align local variable naming in toLowerCase, toUpperCase with > equalsIgnoreCase by using 'lower' and 'upper' Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.org/jdk/pull/12632
Integrated: 8302863: Speed up String::encodeASCII using countPositives
On Sat, 18 Feb 2023 23:26:08 GMT, Claes Redestad wrote: > When encoding Strings to US-ASCII we can speed up the happy path > significantly by using `StringCoding.countPositives` as a speculative check > for whether there are any chars that needs to be replaced by `'?'`. Once a > non-ASCII char is encountered we fall back to the slow loop and replace as > needed. > > An alternative could be unrolling or using a byte array VarHandle, as > show-cased by Brett Okken here: > https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html > Having to replace chars with `?` is essentially an encoding error so it might > be safe to assume this case is exceptional in practice. This pull request has now been integrated. Changeset: 92dfa117 Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/92dfa1175e4898fc491115e004380780b6862473 Stats: 24 lines in 1 file changed: 11 ins; 2 del; 11 mod 8302863: Speed up String::encodeASCII using countPositives Reviewed-by: alanb - PR: https://git.openjdk.org/jdk/pull/12640
Re: RFR: 8302863: Speed up String::encodeASCII using countPositives
On Mon, 20 Feb 2023 21:40:41 GMT, Brett Okken wrote: >> When encoding Strings to US-ASCII we can speed up the happy path >> significantly by using `StringCoding.countPositives` as a speculative check >> for whether there are any chars that needs to be replaced by `'?'`. Once a >> non-ASCII char is encountered we fall back to the slow loop and replace as >> needed. >> >> An alternative could be unrolling or using a byte array VarHandle, as >> show-cased by Brett Okken here: >> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html >> Having to replace chars with `?` is essentially an encoding error so it >> might be safe to assume this case is exceptional in practice. > > For the happy path of no encoding failures, this "trivial" change is a great > improvement. Thanks for reviewing - and thanks @bokken for inspiring this change. - PR: https://git.openjdk.org/jdk/pull/12640
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v7]
On Tue, 21 Feb 2023 11:14:13 GMT, Eirik Bjorsnos wrote: >> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying >> 'the oldest ASCII trick in the book'. >> >> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two >> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is >> updated to use `equalsIgnoreCase` >> >> To verify the correctness of `equalsIgnoreCase`, a new test is added to >> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 >> code point pairs have an `equalsIgnoreCase` consistent with >> Character.toUpperCase, Character.toLowerCase. >> >> Performance is tested for matching and mismatching cases of code point pairs >> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results >> in the first comment below. > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Remove whitespace following '(' Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.org/jdk/pull/12632
Re: RFR: 8302877: Speed up latin1 case conversions [v2]
On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos wrote: >> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase >> for latin1 code points by applying the 'oldest ASCII trick in the book'. >> >> This takes advantage of the fact that latin1 uppercase code points are >> always 0x20 lower than their lowercase (with the exception of two code >> points which uppercase out of latin1). >> >> To verify the correctness of the new implementation, the test >> `Latin1CaseConversion` is added with an exhaustive verification of >> toUpperCase/toLowerCase for all latin1 code points. >> >> The implementation needs to balance the performance of the various ranges in >> latin1. An effort has been made to favour operations on ASCII code points, >> without causing excessive regression for higher code points. >> >> Performance is benchmarked for 7 chosen sample code points, each >> representing a range or a special-case. Results in the first comment. > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Spell fix for 'exhaustive' in comments in sun/text/resources Looks good. Some nits inline src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 142: > 140: } > 141: int l = ch | 0x20; // Lowercase using 'oldest ASCII trick in the > book' > 142: if ( l <= 'z' // In range a-z Suggestion: if (l <= 'z' // In range a-z test/micro/org/openjdk/bench/java/lang/Characters.java line 92: > 90: @Measurement(iterations = 5, time = 1) > 91: @Fork(3) > 92: public static class Latin1CaseConversions { Not sure if qualifying this as "Latin1" is necessary, even though that's what you've focused on for this PR. We could easily add some codePoints outside of the latin1 range (now or later) without changing the test. While having a switch with some readable names is a nice touch I think we should additionally allow integer codePoint as-is to keep it in line with the outer class, e.g. `default -> Integer.parseInt(codePoint);` - Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.org/jdk/pull/12623
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v4]
On Mon, 20 Feb 2023 16:16:45 GMT, Eirik Bjorsnos wrote: >> src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line >> 170: >> >>> 168: * @return true if the two bytes are considered equals ignoring >>> case in latin1 >>> 169: */ >>> 170: static boolean equalsIgnoreCase(byte b1, byte b2) { >> >> Perhaps put this in `CharacterDataLatin1`, keeping it close to >> toLowerCase/toUpperCase that you're changing to use similar logic with >> #12623 >> >> If you apply #12623 first - how much difference does this make on the micro >> you're adding with this PR? > > Is it not already in CharacterDataLatin1? > > Here is a comparison of relying on improvements in > `CharacterDataLatin1.toUpperCase/toLowerCase` only vs. using > `CharacterDataLatin1.equalsIgnoreCase`: > > Character.toUpperCase/toLowerCase only: > > > Benchmark (codePoints) (size) Mode Cnt > ScoreError Units > RegionMatchesIC.Latin1.regionMatchesIC ascii-match1024 avgt 15 > 1310.582 ± 84.777 ns/op > RegionMatchesIC.Latin1.regionMatchesIC ascii-mismatch1024 avgt 15 > 4.547 ± 0.545 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-match1024 avgt 15 > 686.947 ± 11.850 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-mismatch1024 avgt 15 > 3.836 ± 0.634 ns/op > RegionMatchesIC.Latin1.regionMatchesIC lat1-match1024 avgt 15 > 2107.219 ± 17.662 ns/op > RegionMatchesIC.Latin1.regionMatchesIClat1-mismatch1024 avgt 15 > 4.924 ± 0.829 ns/op > > > CharacterDataLatin1.equalsIgnoreCase: > > > Benchmark (codePoints) (size) Mode Cnt > ScoreError Units > RegionMatchesIC.Latin1.regionMatchesIC ascii-match1024 avgt 15 > 742.467 ± 34.490 ns/op > RegionMatchesIC.Latin1.regionMatchesIC ascii-mismatch1024 avgt 15 > 3.960 ± 0.046 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-match1024 avgt 15 > 361.158 ± 37.096 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-mismatch1024 avgt 15 > 4.039 ± 0.521 ns/op > RegionMatchesIC.Latin1.regionMatchesIC lat1-match1024 avgt 15 > 1158.091 ± 41.617 ns/op > RegionMatchesIC.Latin1.regionMatchesIClat1-mismatch1024 avgt 15 > 4.358 ± 0.123 ns/op Oops, I lost context and thought this was in `StringLatin1`. Thanks for running the numbers with #12623. Looks like you're getting big enough of an improvement on top. - PR: https://git.openjdk.org/jdk/pull/12632
Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v4]
On Mon, 20 Feb 2023 14:45:09 GMT, Eirik Bjorsnos wrote: >> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying >> 'the oldest ASCII trick in the book'. >> >> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two >> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is >> updated to use `equalsIgnoreCase` >> >> To verify the correctness of `equalsIgnoreCase`, a new test is added to >> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 >> code point pairs have an `equalsIgnoreCase` consistent with >> Character.toUpperCase, Character.toLowerCase. >> >> Performance is tested for matching and mismatching cases of code point pairs >> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results >> in the first comment below. > > Eirik Bjorsnos has updated the pull request incrementally with two additional > commits since the last revision: > > - Add @bug tag to EqualsIgnoreCase test for correct issue JDK-8302871 > - Add @bug tag to EqualsIgnoreCase test for JDK-8302877 src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 170: > 168: * @return true if the two bytes are considered equals ignoring case > in latin1 > 169: */ > 170: static boolean equalsIgnoreCase(byte b1, byte b2) { Perhaps put this in `CharacterDataLatin1`, keeping it close to toLowerCase/toUpperCase that you're changing to use similar logic with #12623 If you apply #12623 first - how much difference does this make on the micro you're adding with this PR? - PR: https://git.openjdk.org/jdk/pull/12632
Re: Speed up latin1 case folding
RFE filed: https://bugs.openjdk.org/browse/JDK-8302877 /Claes 17 feb. 2023 kl. 18:38 skrev Eirik Bjørsnøs mailto:eir...@gmail.com>>: Hi, The following PR suggests we can speed up Character.toUpperCase and Character.toLowerCase for latin1 code points by applying 'the oldest ASCII trick in the book': https://github.com/openjdk/jdk/pull/12623 Thanks, Eirik.
Re: Speed up StringLatin1.regionMatchesCI
RFE filed: https://bugs.openjdk.org/browse/JDK-8302871 /Claes 18 feb. 2023 kl. 10:28 skrev Eirik Bjørsnøs mailto:eir...@gmail.com>>: Hi, The following PR suggests we can speed up StringLatin1.regionMatchesCI by applying 'the oldest ASCII trick in the book': https://github.com/openjdk/jdk/pull/12632 Thanks, Eirik.
Re: Speed up StringLatin1.regionMatchesCI_UTF16
RFE filed: https://bugs.openjdk.org/browse/JDK-8302872 /Claes 18 feb. 2023 kl. 19:58 skrev Eirik Bjørsnøs mailto:eir...@gmail.com>>: Hi, This PR continues the effort to speed up case-insensitive string comparisons, this time tackling comparison of latin1-coded strings with utf16-coded strings: https://github.com/openjdk/jdk/pull/12637 This builds on top of #12632, it makes sense to review that one first. Thanks, Eirik.
Re: RFR: 8302863: Speed up String::encodeASCII using countPositives
On Sat, 18 Feb 2023 23:26:08 GMT, Claes Redestad wrote: > When encoding Strings to US-ASCII we can speed up the happy path > significantly by using `StringCoding.countPositives` as a speculative check > for whether there are any chars that needs to be replaced by `'?'`. Once a > non-ASCII char is encountered we fall back to the slow loop and replace as > needed. > > An alternative could be unrolling or using a byte array VarHandle, as > show-cased by Brett Okken here: > https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html > Having to replace chars with `?` is essentially an encoding error so it might > be safe to assume this case is exceptional in practice. Baseline: Benchmark (charsetName) Mode Cnt Score Error Units StringEncode.encodeAsciiLongUS-ASCII avgt5 26626,025 ± 448,307 ns/op StringEncode.encodeAsciiShort US-ASCII avgt5 33,336 ± 2,032 ns/op Patch: Benchmark (charsetName) Mode Cnt ScoreError Units StringEncode.encodeAsciiLongUS-ASCII avgt5 5492,985 ± 40,066 ns/op StringEncode.encodeAsciiShort US-ASCII avgt528,545 ± 4,883 ns/op - PR: https://git.openjdk.org/jdk/pull/12640
Re: RFR: 8302863: Speed up String::encodeASCII using countPositives
On Sun, 19 Feb 2023 07:24:30 GMT, David Schlosnagle wrote: >> When encoding Strings to US-ASCII we can speed up the happy path >> significantly by using `StringCoding.countPositives` as a speculative check >> for whether there are any chars that needs to be replaced by `'?'`. Once a >> non-ASCII char is encountered we fall back to the slow loop and replace as >> needed. >> >> An alternative could be unrolling or using a byte array VarHandle, as >> show-cased by Brett Okken here: >> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html >> Having to replace chars with `?` is essentially an encoding error so it >> might be safe to assume this case is exceptional in practice. > > src/java.base/share/classes/java/lang/String.java line 976: > >> 974: private static byte[] encodeASCII(byte coder, byte[] val) { >> 975: if (coder == LATIN1) { >> 976: byte[] dst = Arrays.copyOf(val, val.length); > > Given the tweaks in https://git.openjdk.org/jdk/pull/12613 should this use > `val.clone()` (would skip the length check) > > Suggestion: > > byte[] dst = val.clone(); Yeah, probably makes sense. On that note I found that `val.clone()` underperform `Arrays.copyOf(val, val.length)` in C1 compiled code: https://bugs.openjdk.org/browse/JDK-8302850 - while this shouldn't affect peak performance it might be cause for a warmup regression. > src/java.base/share/classes/java/lang/String.java line 982: > >> 980: if (dst[i] < 0) { >> 981: dst[i] = '?'; >> 982: } > > I'm curious if using countPositives (and vectorization) to scan forward would > be valuable for long (mostly ASCII) strings or if the method call > overhead/non-constant stride is not a win for shorter strings or heavily > non-ascii inputs. > > Suggestion: > > for (int i = positives; i < dst.length; i = > StringCoding.countPositives(dst, i + 1, dst.length - i);) { > if (dst[i] < 0) { > dst[i] = '?'; > } There's some overhead doing countPositives calls, and doing it in a loop might provoke poor worst case performance. Im sure you can find inputs for which this is a win, though. - PR: https://git.openjdk.org/jdk/pull/12640
RFR: 8302863: Speed up String::encodeASCII using countPositives
When encoding Strings to US-ASCII we can speed up the happy path significantly by using `StringCoding.countPositives` as a speculative check for whether there are any chars that needs to be replaced by `'?'`. Once a non-ASCII char is encountered we fall back to the slow loop and replace as needed. An alternative could be unrolling or using a byte array VarHandle, as show-cased by Brett Okken here: https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html Having to replace chars with `?` is essentially an encoding error so it might be safe to assume this case is exceptional in practice. - Commit messages: - Cleanup, clone - Merge branch 'master' into encodeASCII - Improve encodeASCII by leveraging countPositives Changes: https://git.openjdk.org/jdk/pull/12640/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12640=00 Issue: https://bugs.openjdk.org/browse/JDK-8302863 Stats: 24 lines in 1 file changed: 11 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12640.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12640/head:pull/12640 PR: https://git.openjdk.org/jdk/pull/12640
Re: String encodeASCII
Unsafe might be similarly tricky dependency-wise, but perhaps less so. Either solution might work fine if we extract the code for encoding to a utility class that isn’t initialized eagerly with String.class itself. I’ll file an RFE and get the ”trivial” fix to use StringCoding.countPositives in to establish a better baseline. I’m sure there are ways to beat this, for example with a fused loop. An unrolled and/or VH-based solution like you’ve proposed could at the very least be used to speedup the case of Strings for which we need to replace with ’?’. /Claes 19 feb. 2023 kl. 02:54 skrev Brett Okken mailto:brett.okken...@gmail.com>>: Thanks for taking a look at this. That is a pretty good improvement with a much smaller change. I recognize the intricacies of bootstrapping, but have no expertise. Would using Unsafe directly be easier? Particularly for shorter strings, doing the copying and checking together rather than as separate loops seems to speed things up considerably, even for happy-path of no failures. Brett On Sat, Feb 18, 2023 at 5:49 PM Claes Redestad mailto:claes.redes...@oracle.com>> wrote: Hi, general comment: String might be one of the trickier places to add a VarHandle dependency, since String is used very early in the bootstrap and depended upon by everything else, so I’d expect such a solution would have to navigate potential circularity issues carefully. It’d be good to experiment with changes to java.lang.String proper to see that the solution that works nice externally is or can be made feasible within String. Specifically on the performance opportunity then while US-ASCII encoding is probably on the way out we shouldn’t neglect it. One way to go about this without pulling VarHandles into String might be to use what other encode methods in String does and leverage StringCoding.countPositives: https://github.com/openjdk/jdk/pull/12640<https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/12640__;!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Zp4JhLWU$> Testing this on the existing StringEncode microbenchmark, shows a promising speed-up when the input is ASCII-encodable: Baseline Benchmark (charsetName) Mode Cnt Score Error Units StringEncode.encodeAsciiLongUS-ASCII avgt5 26626,025 ± 448,307 ns/op StringEncode.encodeAsciiShort US-ASCII avgt5 33,336 ± 2,032 ns/op Patch: Benchmark (charsetName) Mode Cnt ScoreError Units StringEncode.encodeAsciiLongUS-ASCII avgt5 5492,985 ± 40,066 ns/op StringEncode.encodeAsciiShort US-ASCII avgt528,545 ± 4,883 ns/op (You might see that this will go back into a scalar loop on encoding failures. That loop could still benefit from unrolling or byteArrayViewVarHandle, but I think you have a bigger problem in an app than raw performance if you have a lot of encoding failures...) WDYT? /Claes 18 feb. 2023 kl. 19:36 skrev Brett Okken mailto:brett.okken...@gmail.com>>: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981<https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java*L976-L981__;Iw!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Z_Ook_Ws$> For String.encodeASCII, with the LATIN1 coder is there any interest in exploring the performance impacts of utilizing a byteArrayViewVarHandle to read/write as longs and utilize a bitmask to identify if negative values are present? A simple jmh benchmark covering either 0 or 1 non-ascii (negative) values shows times cut in half (or more) for most scenarios with strings ranging in length from 3 - ~2000. VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10 Windows 10 Intel(R) Core(TM) i7-9850H Hand unrolling the loops shows noted improvement, but does make for less aesthetically pleasing code. Benchmark (nonascii) (size) Mode CntScore Error Units AsciiEncodeBenchmark.jdk none 3 avgt 4 15.531 ± 1.122 ns/op AsciiEncodeBenchmark.jdk none 10 avgt 4 17.350 ± 0.473 ns/op AsciiEncodeBenchmark.jdk none 16 avgt 4 18.277 ± 0.421 ns/op AsciiEncodeBenchmark.jdk none 23 avgt 4 20.139 ± 0.350 ns/op AsciiEncodeBenchmark.jdk none 33 avgt 4 22.008 ± 0.656 ns/op AsciiEncodeBenchmark.jdk none 42 avgt 4 24.393 ± 1.155 ns/op AsciiEncodeBenchmark.jdk none 201 avgt 4 55.884 ± 0.645 ns/op AsciiEncodeBenchmark.jdk none 511 avgt 4 120.817 ± 2.917 ns/op AsciiEncodeBenchmark.jdk
Re: RFR: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
On Sun, 19 Feb 2023 18:41:18 GMT, liach wrote: > 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not > thread safe I think of this pattern of reading a to-be-lazily-initialized value into a local as simple hygiene, `volatile` or not. The code might seem solid without it - but stranger things than eliding a field load has happened. Storing into the local variable removes some doubt about how this code will be executed. - PR: https://git.openjdk.org/jdk/pull/12643
Re: RFR: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
On Sun, 19 Feb 2023 18:41:18 GMT, liach wrote: > 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not > thread safe Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.org/jdk/pull/12643
Re: String encodeASCII
Hi, general comment: String might be one of the trickier places to add a VarHandle dependency, since String is used very early in the bootstrap and depended upon by everything else, so I’d expect such a solution would have to navigate potential circularity issues carefully. It’d be good to experiment with changes to java.lang.String proper to see that the solution that works nice externally is or can be made feasible within String. Specifically on the performance opportunity then while US-ASCII encoding is probably on the way out we shouldn’t neglect it. One way to go about this without pulling VarHandles into String might be to use what other encode methods in String does and leverage StringCoding.countPositives: https://github.com/openjdk/jdk/pull/12640 Testing this on the existing StringEncode microbenchmark, shows a promising speed-up when the input is ASCII-encodable: Baseline Benchmark (charsetName) Mode Cnt Score Error Units StringEncode.encodeAsciiLongUS-ASCII avgt5 26626,025 ± 448,307 ns/op StringEncode.encodeAsciiShort US-ASCII avgt5 33,336 ± 2,032 ns/op Patch: Benchmark (charsetName) Mode Cnt ScoreError Units StringEncode.encodeAsciiLongUS-ASCII avgt5 5492,985 ± 40,066 ns/op StringEncode.encodeAsciiShort US-ASCII avgt528,545 ± 4,883 ns/op (You might see that this will go back into a scalar loop on encoding failures. That loop could still benefit from unrolling or byteArrayViewVarHandle, but I think you have a bigger problem in an app than raw performance if you have a lot of encoding failures...) WDYT? /Claes 18 feb. 2023 kl. 19:36 skrev Brett Okken mailto:brett.okken...@gmail.com>>: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981 For String.encodeASCII, with the LATIN1 coder is there any interest in exploring the performance impacts of utilizing a byteArrayViewVarHandle to read/write as longs and utilize a bitmask to identify if negative values are present? A simple jmh benchmark covering either 0 or 1 non-ascii (negative) values shows times cut in half (or more) for most scenarios with strings ranging in length from 3 - ~2000. VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10 Windows 10 Intel(R) Core(TM) i7-9850H Hand unrolling the loops shows noted improvement, but does make for less aesthetically pleasing code. Benchmark (nonascii) (size) Mode CntScore Error Units AsciiEncodeBenchmark.jdk none 3 avgt 4 15.531 ± 1.122 ns/op AsciiEncodeBenchmark.jdk none 10 avgt 4 17.350 ± 0.473 ns/op AsciiEncodeBenchmark.jdk none 16 avgt 4 18.277 ± 0.421 ns/op AsciiEncodeBenchmark.jdk none 23 avgt 4 20.139 ± 0.350 ns/op AsciiEncodeBenchmark.jdk none 33 avgt 4 22.008 ± 0.656 ns/op AsciiEncodeBenchmark.jdk none 42 avgt 4 24.393 ± 1.155 ns/op AsciiEncodeBenchmark.jdk none 201 avgt 4 55.884 ± 0.645 ns/op AsciiEncodeBenchmark.jdk none 511 avgt 4 120.817 ± 2.917 ns/op AsciiEncodeBenchmark.jdk none2087 avgt 4 471.039 ± 13.329 ns/op AsciiEncodeBenchmark.jdkfirst 3 avgt 4 15.794 ± 1.494 ns/op AsciiEncodeBenchmark.jdkfirst 10 avgt 4 18.446 ± 0.780 ns/op AsciiEncodeBenchmark.jdkfirst 16 avgt 4 20.458 ± 0.394 ns/op AsciiEncodeBenchmark.jdkfirst 23 avgt 4 22.934 ± 0.422 ns/op AsciiEncodeBenchmark.jdkfirst 33 avgt 4 25.367 ± 0.178 ns/op AsciiEncodeBenchmark.jdkfirst 42 avgt 4 28.620 ± 0.678 ns/op AsciiEncodeBenchmark.jdkfirst 201 avgt 4 80.250 ± 4.376 ns/op AsciiEncodeBenchmark.jdkfirst 511 avgt 4 185.518 ± 6.370 ns/op AsciiEncodeBenchmark.jdkfirst2087 avgt 4 713.213 ± 13.488 ns/op AsciiEncodeBenchmark.jdk last 3 avgt 4 14.991 ± 0.190 ns/op AsciiEncodeBenchmark.jdk last 10 avgt 4 18.284 ± 0.317 ns/op AsciiEncodeBenchmark.jdk last 16 avgt 4 20.591 ± 1.002 ns/op AsciiEncodeBenchmark.jdk last 23 avgt 4 22.560 ± 0.963 ns/op AsciiEncodeBenchmark.jdk last 33 avgt 4 25.521 ± 0.554 ns/op AsciiEncodeBenchmark.jdk last 42 avgt 4 28.484 ± 0.446 ns/op AsciiEncodeBenchmark.jdk last 201 avgt 4 79.434 ± 2.256 ns/op
Re: RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad wrote: > During work on #12453 @schlosna reported that `array.clone()` might > underperform `System.arraycopy` in microbenchmarks and I opted to go with > `arraycopy` throughout while investigating. > > Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the > setups. On aarch the only difference I can observe is that arraycopy seem > curiously slow for input size = 0, otherwise no statistically significant > difference. All tests ran on builds from JDK mainline and 21-b9. > > Since the reported difference was small and mostly visible on very large > arrays I conclude that the maintainability win we get from using `clone()` is > preferable. I've added the microbenchmark provided by @schlosna here. Thanks for reviewing - PR: https://git.openjdk.org/jdk/pull/12613
Integrated: 8302315: Examine cost of clone of primitive arrays compared to arraycopy
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad wrote: > During work on #12453 @schlosna reported that `array.clone()` might > underperform `System.arraycopy` in microbenchmarks and I opted to go with > `arraycopy` throughout while investigating. > > Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the > setups. On aarch the only difference I can observe is that arraycopy seem > curiously slow for input size = 0, otherwise no statistically significant > difference. All tests ran on builds from JDK mainline and 21-b9. > > Since the reported difference was small and mostly visible on very large > arrays I conclude that the maintainability win we get from using `clone()` is > preferable. I've added the microbenchmark provided by @schlosna here. This pull request has now been integrated. Changeset: d6716d2e Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/d6716d2e5471ee794df8833430dd3171b565f78e Stats: 163 lines in 2 files changed: 92 ins; 55 del; 16 mod 8302315: Examine cost of clone of primitive arrays compared to arraycopy Reviewed-by: alanb - PR: https://git.openjdk.org/jdk/pull/12613
Re: RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad wrote: > During work on #12453 @schlosna reported that `array.clone()` might > underperform `System.arraycopy` in microbenchmarks and I opted to go with > `arraycopy` throughout while investigating. > > Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the > setups. On aarch the only difference I can observe is that arraycopy seem > curiously slow for input size = 0, otherwise no statistically significant > difference. All tests ran on builds from JDK mainline and 21-b9. > > Since the reported difference was small and mostly visible on very large > arrays I conclude that the maintainability win we get from using `clone()` is > preferable. I've added the microbenchmark provided by @schlosna here. osx-aarch64: Benchmark (size) Mode CntScore Error Units ArrayClone.byteArraycopy 0 avgt 159,517 ± 1,272 ns/op ArrayClone.byteArraycopy 10 avgt 155,933 ± 0,314 ns/op ArrayClone.byteArraycopy 100 avgt 154,802 ± 0,234 ns/op ArrayClone.byteArraycopy1000 avgt 15 24,671 ± 0,437 ns/op ArrayClone.byteClone 0 avgt 152,417 ± 0,016 ns/op ArrayClone.byteClone 10 avgt 152,924 ± 0,027 ns/op ArrayClone.byteClone 100 avgt 154,563 ± 0,050 ns/op ArrayClone.byteClone1000 avgt 15 24,737 ± 0,262 ns/op ArrayClone.intArraycopy0 avgt 158,156 ± 2,148 ns/op ArrayClone.intArraycopy 10 avgt 153,646 ± 0,025 ns/op ArrayClone.intArraycopy 100 avgt 15 11,430 ± 0,087 ns/op ArrayClone.intArraycopy 1000 avgt 15 106,174 ± 0,721 ns/op ArrayClone.intClone0 avgt 152,455 ± 0,159 ns/op ArrayClone.intClone 10 avgt 153,621 ± 0,013 ns/op ArrayClone.intClone 100 avgt 15 11,648 ± 0,454 ns/op ArrayClone.intClone 1000 avgt 15 106,469 ± 1,295 ns/op linux-x64, sandybridge, avx2: Benchmark (size) Mode CntScoreError Units ArrayClone.byteArraycopy 0 avgt 153.321 ± 0.194 ns/op ArrayClone.byteArraycopy 10 avgt 156.953 ± 0.329 ns/op ArrayClone.byteArraycopy 100 avgt 15 13.490 ± 0.595 ns/op ArrayClone.byteArraycopy1000 avgt 15 150.201 ± 3.451 ns/op ArrayClone.byteClone 0 avgt 155.431 ± 0.252 ns/op ArrayClone.byteClone 10 avgt 156.370 ± 0.329 ns/op ArrayClone.byteClone 100 avgt 15 13.561 ± 0.633 ns/op ArrayClone.byteClone1000 avgt 15 150.300 ± 5.318 ns/op ArrayClone.intArraycopy0 avgt 153.297 ± 0.226 ns/op ArrayClone.intArraycopy 10 avgt 157.171 ± 0.354 ns/op ArrayClone.intArraycopy 100 avgt 15 60.863 ± 1.580 ns/op ArrayClone.intArraycopy 1000 avgt 15 557.770 ± 15.107 ns/op ArrayClone.intClone0 avgt 155.373 ± 0.225 ns/op ArrayClone.intClone 10 avgt 156.965 ± 0.293 ns/op ArrayClone.intClone 100 avgt 15 61.696 ± 1.983 ns/op ArrayClone.intClone 1000 avgt 15 552.809 ± 14.358 ns/op - PR: https://git.openjdk.org/jdk/pull/12613
RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy
During work on #12453 @schlosna reported that `array.clone()` might underperform `System.arraycopy` in microbenchmarks and I opted to go with `arraycopy` throughout while investigating. Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the setups. On aarch the only difference I can observe is that arraycopy seem curiously slow for input size = 0, otherwise no statistically significant difference. All tests ran on builds from JDK mainline and 21-b9. Since the reported difference was small and mostly visible on very large arrays I conclude that the maintainability win we get from using `clone()` is preferable. I've added the microbenchmark provided by @schlosna here. - Commit messages: - Examine arraycopy vs clone Changes: https://git.openjdk.org/jdk/pull/12613/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12613=00 Issue: https://bugs.openjdk.org/browse/JDK-8302315 Stats: 163 lines in 2 files changed: 92 ins; 55 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/12613.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12613/head:pull/12613 PR: https://git.openjdk.org/jdk/pull/12613
Integrated: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch
On Mon, 13 Feb 2023 09:59:24 GMT, Claes Redestad wrote: > We can improve various String methods such as `startsWith`, `endsWith` and > `regionMatches` by leveraging the intrinsified mismatch methods in > `ArraysSupport`. This pull request has now been integrated. Changeset: 861e3020 Author: Claes Redestad URL: https://git.openjdk.org/jdk/commit/861e302011bb3aaf0c8431c121b58a57b78481e3 Stats: 171 lines in 5 files changed: 107 ins; 44 del; 20 mod 8302163: Speed up various String comparison methods with ArraysSupport.mismatch Reviewed-by: stsypanov, rriggs, alanb - PR: https://git.openjdk.org/jdk/pull/12528
Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch [v2]
On Mon, 13 Feb 2023 16:10:14 GMT, Claes Redestad wrote: >> We can improve various String methods such as `startsWith`, `endsWith` and >> `regionMatches` by leveraging the intrinsified mismatch methods in >> `ArraysSupport`. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Clarify coder shift in startsWith Thanks for reviewing! - PR: https://git.openjdk.org/jdk/pull/12528
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >> Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 >> ops/ms >> >> >> **New:** >> >> Benchmark (maxNumBytes) Mode Cnt Score >> Error Units >> Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± >> 102.026 ops/ms >> >> >> Decode performance: >> **Old:** >> >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 3961.768 ± 93.409 ops/ms >> >> **New:** >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 14738.051 ± 24.383 ops/ms > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Last of review comments No problem! Testing looks good. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >> Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 >> ops/ms >> >> >> **New:** >> >> Benchmark (maxNumBytes) Mode Cnt Score >> Error Units >> Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± >> 102.026 ops/ms >> >> >> Decode performance: >> **Old:** >> >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 3961.768 ± 93.409 ops/ms >> >> **New:** >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 14738.051 ± 24.383 ops/ms > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Last of review comments I've started tier1-5 testing internally. Will let you know if we find any issues. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]
On Tue, 14 Feb 2023 15:03:50 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2699: >> >>> 2697: __ addptr(dest, 0x18); >>> 2698: __ subl(length, 0x20); >>> 2699: __ jcc(Assembler::lessEqual, L_tailProc); >> >> This could be Assembler::less instead of Assembler::lessEqual. > > Why? There is no performance difference and the intent is clear. Is this > just a "style" thing? I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte chunk in inputs that are divisible by 32 (starting from 64), no? Using `less` avoids that, while not affecting performance of any other inputs. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]
On Fri, 10 Feb 2023 23:18:47 GMT, Claes Redestad wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add URL to microbenchmark > > Marked as reviewed by redestad (Reviewer). > @cl4es Can you please initiate the in-depth testing required for this fix? > Thanks. I've checked out the sources and fired off a sanity run of the first couple of tiers. Holding off on testing higher tiers until @sviswa7's concerns has been resolved. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch [v2]
On Mon, 13 Feb 2023 16:43:12 GMT, Eirik Bjorsnos wrote: > Case-insensitive regionMatches could be improved by using > ArraysSupport.mismatch to skip over long common substrings: I considered this but saw regressions similar to what you're getting for size = 6 and backed off. I think this might be a good future enhancement, with some care, but didn't want to encumber this RFE with a case that regresses small inputs (which tend to be more common in actual applications). In similar constructs we have avoided doing the vectorized call in a loop since this could regress worst case inputs severely (an adversary example might be something like `regionMatches(true, "aa", 0, "aAaAaAaAaAaAaAa", 0, 15)` which will call mismatch 8 times on 15 byte of input). - PR: https://git.openjdk.org/jdk/pull/12528
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v10]
On Mon, 13 Feb 2023 09:59:52 GMT, Claes Redestad wrote: >> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to >> copy arrays more efficiently when exactly the whole input array is to be >> copied. This helps eliminate range checks and has been verified to help >> various String operations. Example: >> >> Baseline >> >> Benchmark(size) Mode Cnt >> Score Error Units >> StringConstructor.newStringFromArray 7 avgt 15 >> 16.817 ± 0.369 ns/op >> StringConstructor.newStringFromArrayWithCharset 7 avgt 15 >> 16.866 ± 0.449 ns/op >> StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 >> 22.198 ± 0.396 ns/op >> >> Patch: >> >> Benchmark(size) Mode Cnt >> Score Error Units >> StringConstructor.newStringFromArray 7 avgt 15 >> 14.666 ± 0.336 ns/op >> StringConstructor.newStringFromArrayWithCharset 7 avgt 15 >> 14.582 ± 0.288 ns/op >> StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 >> 20.339 ± 0.328 ns/op > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Address review comments Thanks for reviewng! I've filed https://bugs.openjdk.org/browse/JDK-8302315 to investigate the clone/arraycopy performance discrepancy. Ideally we should be able to just do `array.clone()` here and get optimal performance with less code. - PR: https://git.openjdk.org/jdk/pull/12453
Integrated: 8301958: Reduce Arrays.copyOf/-Range overheads
On Tue, 7 Feb 2023 13:30:35 GMT, Claes Redestad wrote: > This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to > copy arrays more efficiently when exactly the whole input array is to be > copied. This helps eliminate range checks and has been verified to help > various String operations. Example: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 14.666 ± 0.336 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 14.582 ± 0.288 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 20.339 ± 0.328 ns/op This pull request has now been integrated. Changeset: 1f9c110c Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/1f9c110c1f9ea6f5c3621a25692ce9d7abf245d4 Stats: 209 lines in 2 files changed: 179 ins; 21 del; 9 mod 8301958: Reduce Arrays.copyOf/-Range overheads Reviewed-by: alanb, smarks - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch
On Mon, 13 Feb 2023 09:59:24 GMT, Claes Redestad wrote: > We can improve various String methods such as `startsWith`, `endsWith` and > `regionMatches` by leveraging the intrinsified mismatch methods in > `ArraysSupport`. Microbenchmarking shows decent improvements on small data, scaling up to some impressive gains on large inputs when vectorization kicks in (~41x on regionMatches for size = 1024 on my x64 sandybridge setup, ~34x on m1 in the same config) macosx-aarch64, m1, 21-b9: Benchmark (size) (utf16) Mode Cnt Score Error Units StringComparisons.endsWith 6 true avgt 15 5,949 ± 0,051 ns/op StringComparisons.endsWith 6false avgt 15 4,063 ± 0,038 ns/op StringComparisons.endsWith15 true avgt 1510,758 ± 0,132 ns/op StringComparisons.endsWith15false avgt 15 6,487 ± 0,052 ns/op StringComparisons.endsWith 1024 true avgt 15 653,750 ± 2,835 ns/op StringComparisons.endsWith 1024false avgt 15 314,219 ± 1,264 ns/op StringComparisons.regionMatches6 true avgt 1512,460 ± 0,043 ns/op StringComparisons.regionMatches6false avgt 15 6,647 ± 0,026 ns/op StringComparisons.regionMatches 15 true avgt 1530,502 ± 0,193 ns/op StringComparisons.regionMatches 15false avgt 1515,073 ± 0,030 ns/op StringComparisons.regionMatches 1024 true avgt 15 2147,480 ± 4,622 ns/op StringComparisons.regionMatches 1024false avgt 15 1068,787 ± 14,098 ns/op StringComparisons.regionMatchesCI 6 true avgt 1511,680 ± 0,106 ns/op StringComparisons.regionMatchesCI 6false avgt 15 4,577 ± 0,101 ns/op StringComparisons.regionMatchesCI 15 true avgt 1514,422 ± 0,132 ns/op StringComparisons.regionMatchesCI 15false avgt 15 6,904 ± 0,124 ns/op StringComparisons.regionMatchesCI 1024 true avgt 15 273,810 ± 3,446 ns/op StringComparisons.regionMatchesCI 1024false avgt 15 226,040 ± 2,886 ns/op StringComparisons.regionMatchesRange 6 true avgt 1511,896 ± 0,110 ns/op StringComparisons.regionMatchesRange 6false avgt 15 6,044 ± 0,034 ns/op StringComparisons.regionMatchesRange 15 true avgt 1529,508 ± 0,093 ns/op StringComparisons.regionMatchesRange 15false avgt 1514,336 ± 0,020 ns/op StringComparisons.regionMatchesRange1024 true avgt 15 2187,600 ± 80,285 ns/op StringComparisons.regionMatchesRange1024false avgt 15 1105,813 ± 28,260 ns/op StringComparisons.startsWith 6 true avgt 15 5,315 ± 0,087 ns/op StringComparisons.startsWith 6false avgt 15 3,588 ± 0,020 ns/op StringComparisons.startsWith 15 true avgt 15 9,823 ± 0,144 ns/op StringComparisons.startsWith 15false avgt 15 5,963 ± 0,253 ns/op StringComparisons.startsWith1024 true avgt 15 441,854 ± 9,295 ns/op StringComparisons.startsWith1024false avgt 15 224,386 ± 6,876 ns/op macosx-aarch64. m1, patched: Benchmark (size) (utf16) Mode CntScore Error Units StringComparisons.endsWith 6 true avgt 153,185 ± 0,063 ns/op StringComparisons.endsWith 6false avgt 154,507 ± 0,447 ns/op StringComparisons.endsWith15 true avgt 156,097 ± 0,142 ns/op StringComparisons.endsWith15false avgt 155,736 ± 0,025 ns/op StringComparisons.endsWith 1024 true avgt 15 60,283 ± 4,109 ns/op StringComparisons.endsWith 1024false avgt 15 31,011 ± 0,080 ns/op StringComparisons.regionMatches6 true avgt 153,993 ± 0,063 ns/op StringComparisons.regionMatches6false avgt 154,836 ± 0,474 ns/op StringComparisons.regionMatches 15 true avgt 153,641 ± 0,012 ns/op StringComparisons.regionMatches 15false avgt 152,849 ± 0,065 ns/op StringComparisons.regionMatches 1024 true avgt 15 57,739 ± 0,748 ns/op StringComparisons.regionMatches 1024false avgt 15 30,943 ± 0,423 ns/op StringComparisons.regionMatchesCI 6 true avgt 15 11,729 ± 0,142 ns/op StringComparisons.regionMatchesCI 6false avgt 154,562 ± 0,125 ns/op StringComparisons.regionMatchesCI 15 true avgt 15 14,611 ± 0,227 ns/op StringComparisons.regionMatchesCI 15false avgt 156,970 ± 0,083 ns/op StringComparisons.regionMatchesCI 1024 true
RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch
We can improve various String methods such as `startsWith`, `endsWith` and `regionMatches` by leveraging the intrinsified mismatch methods in `ArraysSupport`. - Commit messages: - Remove overlapping micros, extend testing to endsWith, regionCI and some minor improvements to String::regionMatches - Expand micro coverage - Add micro from @eirbjo - Revert UTF16.compareValues - Add a few micros, apply optimization to StringUTF16.compareValues - Speed up various String comparison methods with ArraysSupport.mismatch Changes: https://git.openjdk.org/jdk/pull/12528/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12528=00 Issue: https://bugs.openjdk.org/browse/JDK-8302163 Stats: 170 lines in 5 files changed: 105 ins; 44 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/12528.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12528/head:pull/12528 PR: https://git.openjdk.org/jdk/pull/12528
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v10]
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to > copy arrays more efficiently when exactly the whole input array is to be > copied. This helps eliminate range checks and has been verified to help > various String operations. Example: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 14.666 ± 0.336 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 14.582 ± 0.288 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 20.339 ± 0.328 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Address review comments - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/350612d4..1031dd85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=09 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=08-09 Stats: 31 lines in 1 file changed: 7 ins; 8 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >> Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 >> ops/ms >> >> >> **New:** >> >> Benchmark (maxNumBytes) Mode Cnt Score >> Error Units >> Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± >> 102.026 ops/ms >> >> >> Decode performance: >> **Old:** >> >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 3961.768 ± 93.409 ops/ms >> >> **New:** >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 14738.051 ± 24.383 ops/ms > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Add URL to microbenchmark Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Thu, 9 Feb 2023 11:46:42 GMT, Eirik Bjorsnos wrote: > > In addition to - or instead of - an `equals` shortcut then if coders are > > the same we could use `ArraysSupport.mismatch` which should get similar > > speed and help more generally. > > ..and if String had (an optimized) mismatch method, then I bet all or most of > the comparison methods (equals, compareTo, endsWith, startsWith, > regionMatches) could delegate to that :-) A private mismatch method might make sense. A public method would require making a stronger case, I think, e.g. showing use cases a mismatcher would solve (elegantly, performantly) that can't be expressed with existing methods. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets Oh -- fun! Perhaps `startsWith` doesn't take advantage of the optimized intrinsic for `equals`. Could be interesting to see if special-casing `startsWith` to call `equals` when possible would help: diff --git
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets You could skip the `equals`: if (decoded.startsWith(str)) { if (decoded.length() == str.length()) { return Comparison.EXACT_MATCH; } else if (addSlash &&
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, > allowing the test to run over all charsets Marked as reviewed by redestad (Reviewer). Great! Extending coverage to provoke the issue on most charsets is good, and it should guard UTF-8 from regressing too - no? - PR:
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v6]
On Wed, 8 Feb 2023 15:31:15 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Align entry name whitespace Good! Would it be too much to ask for you to try and construct such a test for UTF-8 zip files, too? - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]
On Wed, 8 Feb 2023 11:57:16 GMT, Claes Redestad wrote: > > > Seems there's a possible real test failure lurking here, might be > > > intermittent since it only showed on one platform: > > > > > > Did you get this from GHA somehow? Do you happen to know the platform, > > timezone and encoding used? > > Yes, clicked through the failing macosx-x64 test suite. Here's a link to the > summary which has the failing test logs at the bottom: > https://github.com/eirbjo/jdk/actions/runs/4114300244 Seems the failing assert is checking timestamps: assertEquals((x.getTime() / 2000), y.getTime() / 2000); ... so maybe this is a rare intermittent issue. It's just very suspect though that it happens in the test you're changing. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]
On Wed, 8 Feb 2023 11:32:02 GMT, Eirik Bjorsnos wrote: > > Seems there's a possible real test failure lurking here, might be > > intermittent since it only showed on one platform: > > Did you get this from GHA somehow? Do you happen to know the platform, > timezone and encoding used? Yes, clicked through the failing macosx-x64 test suite. Here's a link to the summary which has the failing test logs at the bottom: https://github.com/eirbjo/jdk/actions/runs/4114300244 - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]
On Wed, 8 Feb 2023 08:16:05 GMT, Francesco Nigro wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Minimize, force inline, generalize > > test/micro/org/openjdk/bench/java/lang/StringConstructor.java line 40: > >> 38: >> 39: @Param({"0", "7", "64"}) >> 40: public int size; > > I suggest to add the param `offset` for future experiment: together with > `perfasm` it helps to check how different stubs are used and emulate the > different branches of the optimized code Done - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]
On Wed, 8 Feb 2023 03:38:24 GMT, David Schlosnagle wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Minimize, force inline, generalize > > src/java.base/share/classes/java/util/Arrays.java line 4142: > >> 4140: private static double[] copyOfRangeGeneric(double[] original, int >> from, int to) { >> 4141: checkLength(from, to); >> 4142: int newLength = to - from; > > Suggestion: > > int newLength = checkLength(from, to); > I'm also curious if returning the new length from `checkLength` would be > worthwhile I had this arrangement earlier but saw no win from it. These helper methods will be aggressively inlined and the JIT sorts it out optimally, so I prefer to keep the `check` method zoomed in on its purpose - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v9]
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to > clone arrays when `newLength` or range inputs span the input array. This > helps eliminate range checks and has been verified to help various String > operations. Example: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 14.666 ± 0.336 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 14.582 ± 0.288 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 20.339 ± 0.328 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Add offset param to micro, reduce number of configurations and variants - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/85b50169..350612d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=08 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=07-08 Stats: 13 lines in 1 file changed: 6 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]
On Wed, 8 Feb 2023 01:10:59 GMT, David Schlosnagle wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Minimize, force inline, generalize > > src/java.base/share/classes/java/util/Arrays.java line 3594: > >> 3592: public static int[] copyOf(int[] original, int newLength) { >> 3593: if (newLength == original.length) { >> 3594: return original.clone(); > > I am curious about the use of `clone` for some primitive array types > (`short[]`, `int[]`, `long[]`, `char[]`, `float[]`) and `copyOf` using > `System.arraycopy` in other types (`byte[]`, `double[]`). Do these types > optimize differently or hit different intrinsics depending on primitive type? > Is there difference in array zeroing? > > From a quick [JMH > benchmark](https://gist.github.com/schlosna/975e26965ec822ad42034b3ea2b08676) > `System.arraycopy` seems slightly better. I went back and forth on this and also saw a small win using `arraycopy`, but the PR ended up in an inconsistent state with some using one and some using the other. While this discrepancy seem like something we should treat as a bug, I've arranged to use `copyOf` helper consistently for now. - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v8]
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to > clone arrays when `newLength` or range inputs span the input array. This > helps eliminate range checks and has been verified to help various String > operations. Example: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 14.666 ± 0.336 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 14.582 ± 0.288 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 20.339 ± 0.328 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Err on the side of copyOf - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/c906c730..85b50169 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=07 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=06-07 Stats: 90 lines in 1 file changed: 56 ins; 23 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to > clone arrays when `newLength` or range inputs span the input array. This > helps eliminate range checks and has been verified to help various String > operations. Example: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 14.666 ± 0.336 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 14.582 ± 0.288 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 20.339 ± 0.328 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Minimize, force inline, generalize - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/72138930..c906c730 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=06 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=05-06 Stats: 154 lines in 1 file changed: 120 ins; 23 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]
On Tue, 7 Feb 2023 13:23:26 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 20 additional > commits since the last revision: > > - Merge branch 'master' into getentrypos-prefixmatch > - Fix incorrect offset for the CEN "length of extra field". Fixed spelling > of "invalid". > - Spelling fix in test data for non-ascii
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
On Tue, 7 Feb 2023 19:10:08 GMT, Francesco Nigro wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> copyrights > > Thanks @cl4es to look into this! @franz1981 idea seems to apply nicely here, and going back and applying it to `Arrays.copyOfRange` end up on top for the common case where `copyOfRange` copies the entire range: Benchmark(size) Mode Cnt Score Error Units StringConstructor.newStringFromArray 7 avgt 15 14.666 ± 0.336 ns/op StringConstructor.newStringFromArrayWithCharset 7 avgt 15 14.582 ± 0.288 ns/op StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 20.339 ± 0.328 ns/op We might still benefit for some cases to specialize a `copyBytes` method, but this solution might help more cases. If others agree I might take a step back and apply this optimization to all the `copyOfRange` methods and add some microbenchmarking to verify this more widely. - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v6]
> This adds a local, specialized `copyBytes` method to `String` that avoids > certain redundant range checks and clamping that JIT has issues removing > fully. > > This has a small but statistically significant effect on `String` > microbenchmarks, eg.: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 15.477 ± 0.342 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 15.557 ± 0.352 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 21.272 ± 0.398 ns/op > > > Care has to be taken to ensure preconditions have been checked when using > `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a > possible pre-existing issue where the constructor might either throw an > exception or truncate the buffer if the builder byte array and length is not > in agreement (theoretically possible if you clear/remove and call > `trimToSize()` concurrently). Adding an explicit check here seem to be the > right thing to do regardless of this RFE. Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Apply @franz1981's idea but directly to Arrays.copyOfRange, factoring out range checks and helping JIT pick the best arraycopy adapter - Back out all changes except micro - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/cece6f80..72138930 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=05 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=04-05 Stats: 62 lines in 4 files changed: 24 ins; 8 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
On Tue, 7 Feb 2023 19:08:59 GMT, Francesco Nigro wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> copyrights > > src/java.base/share/classes/java/lang/String.java line 698: > >> 696: } >> 697: >> 698: static byte[] copyBytes(byte[] bytes, int offset, int length) { > > Given that the stub generated for array copy seems highly dependent by the > call site constrains, did you tried adding a check for offset == 0 and/or > length == bytes.length? > > If (offset == 0 && bytes.length == length) { > System.arrayCopy(bytes, 0, dst, 0, bytes.length); > // etc etc the other combinations > > This should have different generated stubs with much smaller ASM depending by > the enforced constrains (and shouldn't affect terribly the code size of the > method, given that the stub won't be inlined AFAIK) > > Beware, as noted by others, I'm not suggesting that's the way to fix this, > but it would be interesting to check how much perf we leave on the ground due > to the this supposed "inefficient" stub generation (if that's the issue). I did some quick experiments but saw no clear win from doing anything like this here. Feel free to experiment and see if there's some particular configuration that comes out ahead. FTR I did not intend for this RFE to solve https://bugs.openjdk.org/browse/JDK-8295496 completely, but provide a small, partial win that might possibly clear a path to solving that likely orthogonal issue. - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
On Tue, 7 Feb 2023 15:25:05 GMT, Claes Redestad wrote: >> This adds a local, specialized `copyBytes` method to `String` that avoids >> certain redundant range checks and clamping that JIT has issues removing >> fully. >> >> This has a small but statistically significant effect on `String` >> microbenchmarks, eg.: >> >> Baseline >> >> Benchmark(size) Mode Cnt >> Score Error Units >> StringConstructor.newStringFromArray 7 avgt 15 >> 16.817 ± 0.369 ns/op >> StringConstructor.newStringFromArrayWithCharset 7 avgt 15 >> 16.866 ± 0.449 ns/op >> StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 >> 22.198 ± 0.396 ns/op >> >> >> Patch: >> >> Benchmark(size) Mode Cnt >> Score Error Units >> StringConstructor.newStringFromArray 7 avgt 15 >> 15.477 ± 0.342 ns/op >> StringConstructor.newStringFromArrayWithCharset 7 avgt 15 >> 15.557 ± 0.352 ns/op >> StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 >> 21.272 ± 0.398 ns/op >> >> >> Care has to be taken to ensure preconditions have been checked when using >> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a >> possible pre-existing issue where the constructor might either throw an >> exception or truncate the buffer if the builder byte array and length is not >> in agreement (theoretically possible if you clear/remove and call >> `trimToSize()` concurrently). Adding an explicit check here seem to be the >> right thing to do regardless of this RFE. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > copyrights > Rather than splitting right down the middle isn't it more effective to factor out code that would typically not be executed, such as the exception creation + formatting? That additionally allows the JIT to outline such code altogether, allowing more aggressive inlining of the non-exceptional path(s). - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
On Tue, 7 Feb 2023 19:24:11 GMT, Stuart Marks wrote: > > It might be that the redundant checks in Arrays.copyOfRange would be > > eliminated properly with more inlining, and that the issue here is that the > > affected String constructor is particularly gnarly. This gnarliness is due > > 1) the need to construct the value and coder in one go and 2) the lack of > > multiple return values. Give me a value-based record type so we can split > > apart the constructor into charset-specific methods with zero overhead and > > we might be able to split up the logic into better-sized chunks. > > I'm wondering if another contributing factor to the complexity of this code > is the continued support of the non-compact-String codepaths. This means > there are actually three code paths through every string computation. Do we > need to continue to support the non-compact-string code paths? I'm concerned > about maintainability too. I think most apps have sufficient ASCII/latin1-encodable data to make compact strings a net win. Especially with recent improvements to key intrinsics that has narrowed the gap. I still think turning off compact strings might be beneficial in locales where most strings are UTF-16, but as you say there might be wins to maintainability and code complexity by ripping out `-XX:-CompactStrings` (which could mean a net performance win for all). - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
On Tue, 7 Feb 2023 18:35:32 GMT, John R Rose wrote: > Our source code is a reference implementation, and people will look at this > change as evidence that `Arrays::copyOfRange` should be hand-inlined by savvy > coders. Surely we could also fix this small performance pothole by improving > C2’s treatment of `Arrays.copyOfRange`. That would benefit all users as well, > not just `String`. That is our preferred way to handle things. > > On the other hand, `String` is an important class and worthy of every tiny > tweak we give it. Do we need this fix now? If so, I suggest putting in a > comment in the code which says something like “normally one would use > Arrays.copyOfRange here, but we get slightly better code in this particular > case”. Also, regarding the bug against the JIT, I suggest that we back out > this change to `String` when that JIT bug is fixed. Perhaps the comment in > `String` should reference that bug. It might be that the redundant checks in `Arrays.copyOfRange` would be eliminated properly with more inlining, and that the issue here is that the affected `String` constructor is particularly gnarly. This gnarliness is due 1) the need to construct the `value` and `coder` in one go and 2) the lack of multiple return values. Give me a value-based record type so we can split apart the constructor into charset-specific methods with zero overhead and we might be able to split up the logic into better-sized chunks. But yes, I will add some commentary to the effect that this should ideally be handled by our JIT, along with comments that the method deliberately avoids safety checks. - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]
> This adds a local, specialized `copyBytes` method to `String` that avoids > certain redundant range checks and clamping that JIT has issues removing > fully. > > This has a small but statistically significant effect on `String` > microbenchmarks, eg.: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 15.477 ± 0.342 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 15.557 ± 0.352 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 21.272 ± 0.398 ns/op > > > Care has to be taken to ensure preconditions have been checked when using > `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a > possible pre-existing issue where the constructor might either throw an > exception or truncate the buffer if the builder byte array and length is not > in agreement (theoretically possible if you clear/remove and call > `trimToSize()` concurrently). Adding an explicit check here seem to be the > right thing to do regardless of this RFE. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: copyrights - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/2e33c27f..cece6f80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=04 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v3]
On Tue, 7 Feb 2023 14:57:52 GMT, Alan Bateman wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Update StringLatin1.trim for consistency > > src/java.base/share/classes/java/lang/String.java line 4546: > >> 4544: // To avoid surprises due to data races (which would either >> truncate or throw an exception) >> 4545: // we should check that length <= val.length up front >> 4546: checkOffset(length, val.length); > > I agree a check is needed here but I assume using checkOffset means that > SB::toString could fail with SIOOBE. I wonder if Math.min(length, val.length) > would be better here. Ok. That keeps behavior consistent for most cases and removes a path where we can fail with SIOOBE in the existing code (down `StringUTF16::compress`). - PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v4]
> This adds a local, specialized `copyBytes` method to `String` that avoids > certain redundant range checks and clamping that JIT has issues removing > fully. > > This has a small but statistically significant effect on `String` > microbenchmarks, eg.: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 15.477 ± 0.342 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 15.557 ± 0.352 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 21.272 ± 0.398 ns/op > > > Care has to be taken to ensure preconditions have been checked when using > `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a > possible pre-existing issue where the constructor might either throw an > exception or truncate the buffer if the builder byte array and length is not > in agreement (theoretically possible if you clear/remove and call > `trimToSize()` concurrently). Adding an explicit check here seem to be the > right thing to do regardless of this RFE. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: truncate rather than throw in String(AbstractStringBuilder), keeping current behavior - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/a5e494bc..2e33c27f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=03 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=02-03 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v2]
> This adds a local, specialized `copyBytes` method to `String` that avoids > certain redundant range checks and clamping that JIT has issues removing > fully. > > This has a small but statistically significant effect on `String` > microbenchmarks, eg.: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 15.477 ± 0.342 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 15.557 ± 0.352 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 21.272 ± 0.398 ns/op > > > Care has to be taken to ensure preconditions have been checked when using > `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a > possible pre-existing issue where the constructor might either throw an > exception or truncate the buffer if the builder byte array and length is not > in agreement (theoretically possible if you clear/remove and call > `trimToSize()` concurrently). Adding an explicit check here seem to be the > right thing to do regardless of this RFE. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: fix StringUTF16.trim bug, rename locals to reduce confusion - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/2d1f8e37..5b637e09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=01 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v3]
> This adds a local, specialized `copyBytes` method to `String` that avoids > certain redundant range checks and clamping that JIT has issues removing > fully. > > This has a small but statistically significant effect on `String` > microbenchmarks, eg.: > > Baseline > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 16.817 ± 0.369 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 16.866 ± 0.449 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 22.198 ± 0.396 ns/op > > > Patch: > > Benchmark(size) Mode Cnt > Score Error Units > StringConstructor.newStringFromArray 7 avgt 15 > 15.477 ± 0.342 ns/op > StringConstructor.newStringFromArrayWithCharset 7 avgt 15 > 15.557 ± 0.352 ns/op > StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 > 21.272 ± 0.398 ns/op > > > Care has to be taken to ensure preconditions have been checked when using > `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a > possible pre-existing issue where the constructor might either throw an > exception or truncate the buffer if the builder byte array and length is not > in agreement (theoretically possible if you clear/remove and call > `trimToSize()` concurrently). Adding an explicit check here seem to be the > right thing to do regardless of this RFE. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Update StringLatin1.trim for consistency - Changes: - all: https://git.openjdk.org/jdk/pull/12453/files - new: https://git.openjdk.org/jdk/pull/12453/files/5b637e09..a5e494bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12453=02 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=01-02 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String
This adds a local, specialized `copyBytes` method to `String` that avoids certain redundant range checks and clamping that JIT has issues removing fully. This has a small but statistically significant effect on `String` microbenchmarks, eg.: Baseline Benchmark(size) Mode Cnt Score Error Units StringConstructor.newStringFromArray 7 avgt 15 16.817 ± 0.369 ns/op StringConstructor.newStringFromArrayWithCharset 7 avgt 15 16.866 ± 0.449 ns/op StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 22.198 ± 0.396 ns/op Patch: Benchmark(size) Mode Cnt Score Error Units StringConstructor.newStringFromArray 7 avgt 15 15.477 ± 0.342 ns/op StringConstructor.newStringFromArrayWithCharset 7 avgt 15 15.557 ± 0.352 ns/op StringConstructor.newStringFromArrayWithCharsetName 7 avgt 15 21.272 ± 0.398 ns/op Care has to be taken to ensure preconditions have been checked when using `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a possible pre-existing issue where the constructor might either throw an exception or truncate the buffer if the builder byte array and length is not in agreement (theoretically possible if you clear/remove and call `trimToSize()` concurrently). Adding an explicit check here seem to be the right thing to do regardless of this RFE. - Commit messages: - Include some callsites in StringLatin1/UTF16, rename to copyBytes to disambiguate from generic utility methods, tune microbenchmark - 8301958: Avoid overhead of Arrays.copyOfRange in String Changes: https://git.openjdk.org/jdk/pull/12453/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12453=00 Issue: https://bugs.openjdk.org/browse/JDK-8301958 Stats: 36 lines in 4 files changed: 14 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/12453.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453 PR: https://git.openjdk.org/jdk/pull/12453
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v3]
On Mon, 6 Feb 2023 16:14:14 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Fix incorrect offset for the CEN "length of extra field". Fixed spelling of > "invalid". Some GHA testing seem to have failed due to infrastructure issues - try merging in master to trigger testing again. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]
On Mon, 6 Feb 2023 15:21:10 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Spelling fix in test data for non-ascii latin1 string I think this looks good. Glad we can get this done without adding even more ways to peek into `String` internals. test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 158: > 156: int coff = off + CEN_HDR + nlen + elen; > 157: > 158: // Write
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]
On Mon, 6 Feb 2023 15:21:10 GMT, Eirik Bjorsnos wrote: >> After finding a hash match, getEntryPos needs to compare the lookup name up >> to the encoded entry name in the CEN. This comparison is done by decoding >> the entry name into a String. The names can then be compared using the >> String API. This decoding step adds a significat cost to this method. >> >> This PR suggest to update the string comparison such that in the common case >> where both the lookup name and the entry name are encoded in >> ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can >> instead be compared direcly. >> >> ZipCoder is updated with a new method to compare a string with an encoded >> byte array range. The default implementation decodes to string (like the >> current code), while the UTF-8 implementation uses >> JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses >> Arrays.mismatch for comparison with or without matching trailing slashes. >> >> Additionally, this PR suggest to make the following updates to getEntryPos: >> >> - The try/catch for IAE is redundant and can be safely removed. (initCEN >> already checks this and will throws IAE for invalid UTF-8). This seems to >> give a 3-4% speedup on micros) >> - A new test InvalidBytesInEntryNameOrComment is a added to verify that >> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I >> found no existing test coverage for this) >> - The recursion when looking for "name/" matches is replaced with iteration. >> We keep track of any "name/" match and return it at the end of the search. >> (I feel this is easier to follow and it also gives a ~30% speedup for >> addSlash lookups with no regression on regular lookups) >> >> (My though is that including these additional updates in this PR might >> reduce reviewer overhead given that it touches the exact same code. I might >> be wrong on this, please advise :) >> >> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified >> to use xalan.jar): >> >> Baseline: >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 >> ns/op >> >> >> >> PR: >> >> >> Benchmark (size) Mode CntScore Error >> Units >> ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 >> ns/op >> ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 >> ns/op >> ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 >> ns/op >> ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 >> ns/op >> ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 >> ns/op >> ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 >> ns/op >> ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 >> ns/op >> ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 >> ns/op >> ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 >> ns/op >> >> >> To assess the impact on startup/warmup, I made a main method class which >> measures the total time of calling ZipFile.getEntry for all entries in the >> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement >> (time in micros): >> >> >> Percentile Baseline Patch >> 50 % 23155 21149 >> 75 % 23598 21454 >> 90 % 23989 21691 >> 95 % 24238 21973 >> 99 % 25270 22446 >> STDEV 792 549 >> Count 500 500 > > Eirik Bjorsnos has updated the pull request incrementally with one additional > commit since the last revision: > > Spelling fix in test data for non-ascii latin1 string test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 153: > 151: // Need to skip past the length of the name and extra fields > 152: int nlen = buffer.getShort(off + NLEN); > 153: int elen = buffer.getShort(off + NLEN); Is this supposed to
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]
On Mon, 6 Feb 2023 15:14:37 GMT, Eirik Bjorsnos wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Spelling fix in test data for non-ascii latin1 string > > test/jdk/java/util/zip/ZipFile/TestZipFileEncodings.java line 126: > >> 124: // latin1, but not ASCII >> 125: String entryName = "smörgårdsbord"; >> 126: > > @cl4es Are we ok with non-ASCII in source files? I'd hate to escape this ;-) As long as the file is UTF-8 encoded then I think it should be fine. Thanks for fixing the spelling before I could remark on it! :D - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos
On Mon, 6 Feb 2023 12:01:19 GMT, Eirik Bjorsnos wrote: >> Nice, I have updated the PR such that the new shared secret is replaced with >> using getBytesNoRepl instead. If there is a performance difference, it seems >> to hide in the noise. >> >> I had expected such a regression to be caught by existing tests, which seems >> not to be the case. I added TestZipFileEncodings.latin1NotAscii to adress >> this. > > getBytesNoRepl throws CharacterCodingException "for malformed input or > unmappable characters". > > This should never happen since initCEN should already reject it. If it should > happen anyway, I return NO_MATCH which will ignore the match just like the > catch in getEntryPos currently does. Yes, this should be fine. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos
On Mon, 30 Jan 2023 10:32:38 GMT, Eirik Bjorsnos wrote: > After finding a hash match, getEntryPos needs to compare the lookup name up > to the encoded entry name in the CEN. This comparison is done by decoding the > entry name into a String. The names can then be compared using the String > API. This decoding step adds a significat cost to this method. > > This PR suggest to update the string comparison such that in the common case > where both the lookup name and the entry name are encoded in > ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can > instead be compared direcly. > > ZipCoder is updated with a new method to compare a string with an encoded > byte array range. The default implementation decodes to string (like the > current code), while the UTF-8 implementation uses > JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses > Arrays.mismatch for comparison with or without matching trailing slashes. > > Additionally, this PR suggest to make the following updates to getEntryPos: > > - The try/catch for IAE is redundant and can be safely removed. (initCEN > already checks this and will throws IAE for invalid UTF-8). This seems to > give a 3-4% speedup on micros) > - A new test InvalidBytesInEntryNameOrComment is a added to verify that > initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I > found no existing test coverage for this) > - The recursion when looking for "name/" matches is replaced with iteration. > We keep track of any "name/" match and return it at the end of the search. (I > feel this is easier to follow and it also gives a ~30% speedup for addSlash > lookups with no regression on regular lookups) > > (My though is that including these additional updates in this PR might reduce > reviewer overhead given that it touches the exact same code. I might be wrong > on this, please advise :) > > I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified > to use xalan.jar): > > Baseline: > > Benchmark (size) Mode CntScore Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 > ns/op > ZipFileGetEntry.getEntryMiss1024 avgt 15 29.762 ± 5.998 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 > ns/op > ZipFileGetEntry.getEntryMissUncached1024 avgt 15 71.840 ± 2.455 > ns/op > ZipFileGetEntry.getEntrySlash512 avgt 15 135.621 ± 4.341 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 > ns/op > > > > PR: > > > Benchmark (size) Mode CntScore Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 > ns/op > ZipFileGetEntry.getEntryMiss1024 avgt 15 23.236 ± 1.114 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 > ns/op > ZipFileGetEntry.getEntryMissUncached1024 avgt 15 67.767 ± 1.963 > ns/op > ZipFileGetEntry.getEntrySlash512 avgt 15 73.745 ± 2.717 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 > ns/op > > > To assess the impact on startup/warmup, I made a main method class which > measures the total time of calling ZipFile.getEntry for all entries in the > 109 JAR file dependenies of spring-petclinic. The shows a nice improvement > (time in micros): > > > Percentile Baseline Patch > 50 % 23155 21149 > 75 % 23598 21454 > 90 % 23989 21691 > 95 % 24238 21973 > 99 % 25270 22446 > STDEV 792 549 > Count 500 500 Filed JDK-8301873 for this, update PR title when you're ready. src/java.base/share/classes/java/lang/System.java line 2668: > 2666: @Override > 2667: public int mismatchUTF8(String str, byte[] b, int > fromIndex, int toIndex) { > 2668: byte[] encoded = str.isLatin1() ? str.value() : > str.getBytes(UTF_8.INSTANCE); I think this is incorrect: latin-1 characters above codepoint 127 (non-ascii) would be represented by 2 bytes in UTF-8. What you want here is probably `str.isAscii() ? ...`. The ASCII check will have to
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos
On Mon, 6 Feb 2023 14:27:36 GMT, Claes Redestad wrote: >> Interesting. Would be nice to solve this in the JIT! >> >> This disabled code got deleted in my last commit, but it seems like you have >> a good analysis so we can let it go now. > > Right. I might have fumbled this experiment a bit, and perhaps your setup > would inline and then eliminate some of the known-in-range checks already. > > Though we have intrinsified some of the `Preconditions.check*` methods in the > past to help improve range checks, but the `checkFromToIndex` method that > would be applicable here has not been intrinsified. It might be a reasonable > path forward to replace `Arrays.rangeCheck` with > `Preconditions.checkFromToIndex` and then look at intrinsifying that method > to help eliminating or optimizing some of the checks. Nevermind, I had a flaw in my experiment and seems the first range check in a call like `Arrays.mismatch(encoded, 0, encoded.length, b, off, off+len);` should be eliminated. So perhaps you're seeing the cost of the second range check, which might be unavoidable to be safe (zip meta data could otherwise be doctored to try and perform out of bounds reads via intrinsified code) - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos
On Mon, 6 Feb 2023 11:56:22 GMT, Eirik Bjorsnos wrote: >> src/java.base/share/classes/java/lang/System.java line 2671: >> >>> 2669: if (false) { >>> 2670: // Arrays.mismatch without the range checks (~5% >>> faster micro getEntryHit) >>> 2671: int aLength = encoded.length; >> >> Part of the difference you're seeing is due to knowing that you'll be >> matching the entire length of the first array (`encoded, 0, encoded.length`). >> >> As an experiment I added `Arrays.mismatch(byte[], byte[], int, int)` to >> mismatch the entire range of the first array argument vs the range of the >> second and can spot an improvement in affected micros: >> >> Benchmark (size) Mode Cnt Score >> Error Units >> ArraysMismatch.Char.differentSubrangeMatches 90 avgt 10 12.165 ± >> 0.074 ns/op # mismatch(a, aFrom, aTo, b, bFrom, bTo) >> ArraysMismatch.Char.subrangeMatches 90 avgt 10 10.748 ± >> 0.006 ns/op # mismatch(a, b, bFrom, bTo) >> >> This might be something we can solve in the JITs without having to add new >> methods to `java.util.Arrays` to deal as efficiently as possible with the >> case when we're matching against the entirety of one of the arrays. > > Interesting. Would be nice to solve this in the JIT! > > This disabled code got deleted in my last commit, but it seems like you have > a good analysis so we can let it go now. Right. I might have fumbled this experiment a bit, and perhaps your setup would inline and then eliminate some of the known-in-range checks already. Though we have intrinsified some of the `Preconditions.check*` methods in the past to help improve range checks, but the `checkFromToIndex` method that would be applicable here has not been intrinsified. It might be a reasonable path forward to replace `Arrays.rangeCheck` with `Preconditions.checkFromToIndex` and then look at intrinsifying that method to help eliminating or optimizing some of the checks. - PR: https://git.openjdk.org/jdk/pull/12290
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]
On Thu, 2 Feb 2023 15:33:29 GMT, Scott Gibbons wrote: >> Names are important, but always hard to get right. At the very least they >> need to be correct. Maybe call it something like >> `..parameterized_decode_tables..` and the other `..shared_decode_tables..`? > > I prefer leaving them the way they are. I don't think the names, along with > the associated comments within the tables, causes any undue confusion as to > their function. However I will implement the name change if that's all it > takes to procure a review approval. Please provide the specific names you'd > like me to use and I'll change them. Or just approve as-is :-). I meant no disrespect here. By your own words `base64_AVX2_decode_URL_tables_addr` is "essentially incorrect", so I suggested some alternatives that I thought would make the code slightly more approachable. Regardless of whether you fix this detail I am not ready to approve this PR since I would need time to digest the latest changes. I'll ask a more senior engineer to review and give final approval (these changes need 2 reviewers approval anyhow). - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]
On Wed, 1 Feb 2023 20:59:24 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2202: >> >>> 2200: } >>> 2201: >>> 2202: address StubGenerator::base64_AVX2_decode_URL_tables_addr() { >> >> Shouldn't this be `decode_lut_tables`? As it's used for URL and non-URL >> decoding alike. > > These tables are used for both URL and non-URL based on the parameter, and > they are only two of the three lut tables used (the other is in > `base64_AVX2_decode_tables_addr` ). Both names are essentially incorrect. > Does the name really matter that much? It's the same as > `base64_AVX2_decode_tables_addr` with the addition of URL tables. Names are important, but always hard to get right. At the very least they need to be correct. Maybe call it something like `..parameterized_decode_tables..` and the other `..shared_decode_tables..`? - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]
On Wed, 1 Feb 2023 18:28:25 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >> Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 >> ops/ms >> >> >> **New:** >> >> Benchmark (maxNumBytes) Mode Cnt Score >> Error Units >> Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± >> 102.026 ops/ms >> >> >> Decode performance: >> **Old:** >> >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 3961.768 ± 93.409 ops/ms >> >> **New:** >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 14738.051 ± 24.383 ops/ms > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Handle AVX2 URL; address review comments src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2202: > 2200: } > 2201: > 2202: address StubGenerator::base64_AVX2_decode_URL_tables_addr() { Shouldn't this be `decode_lut_tables`? As it's used for URL and non-URL decoding alike. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v6]
On Fri, 27 Jan 2023 21:36:29 GMT, Claes Redestad wrote: >> Scott Gibbons has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains 13 additional >> commits since the last revision: >> >> - Merge branch 'openjdk:master' into Base64-AVX2 >> - Merge branch 'openjdk:master' into Base64-AVX2 >> - Merge branch 'openjdk:master' into Base64-AVX2 >> - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into >> Base64-AVX2 >> - Merge branch 'openjdk:master' into Base64-AVX2 >> - Address review comment >> - Remove whitespace >> - Fix wrong register usage >> - Working version of Base64 decode with AVX2 (4x perf improvement). No URL >> support >> - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into >> Base64-AVX2 >> - ... and 3 more: https://git.openjdk.org/jdk/compare/50dc1196...3e66f7be > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2643: > >> 2641: // Handle isURL / MIME?!?! r12 is used for length calculation >> (from out) >> 2642: // >> 2643: // rbx is out, r12 is saved out, rdx is size, rsi is src > > It seems that on windows `r12` is in use, see line 2323. GHA seem to be > having some trouble finishing Windows testing on time - could there be some > issue here? Nevermind, you're not using r12 and GHA Windows testing is green now. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v6]
On Fri, 27 Jan 2023 18:31:50 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units >> Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 >> ops/ms >> >> >> **New:** >> >> Benchmark (maxNumBytes) Mode Cnt Score >> Error Units >> Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± >> 102.026 ops/ms >> >> >> Decode performance: >> **Old:** >> >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 3961.768 ± 93.409 ops/ms >> >> **New:** >> Benchmark (errorIndex) (lineSize) (maxNumBytes) >> Mode Cnt ScoreError Units >> Base64Decode.testBase64Decode 144 4 1024 >> thrpt3 14738.051 ± 24.383 ops/ms > > Scott Gibbons has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 13 additional > commits since the last revision: > > - Merge branch 'openjdk:master' into Base64-AVX2 > - Merge branch 'openjdk:master' into Base64-AVX2 > - Merge branch 'openjdk:master' into Base64-AVX2 > - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into > Base64-AVX2 > - Merge branch 'openjdk:master' into Base64-AVX2 > - Address review comment > - Remove whitespace > - Fix wrong register usage > - Working version of Base64 decode with AVX2 (4x perf improvement). No URL > support > - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into > Base64-AVX2 > - ... and 3 more: https://git.openjdk.org/jdk/compare/50dc1196...3e66f7be src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2159: > 2157: } > 2158: > 2159: address StubGenerator::base64_AVX2_tables_addr() { A casual reader might wonder why there's 3 other avx2-tables and then this, so for readability it'd be nice to add "decode" to the name of this new table. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2643: > 2641: // Handle isURL / MIME?!?! r12 is used for length calculation > (from out) > 2642: // > 2643: // rbx is out, r12 is saved out, rdx is size, rsi is src It seems that on windows `r12` is in use, see line 2323. GHA seem to be having some trouble finishing Windows testing on time - could there be some issue here? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2647: > 2645: > 2646: > 2647: // *** NEEDS TO BE FIXED While it's fine to leave implementation of `getMimeDecoder` and `getUrlDecoder` for a follow-up, I think these comments needs to be cleaned up. E.g. a follow-up RFE filed and referenced. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2651: > 2649: __ jcc(Assembler::notZero, L_tailProc); > 2650: > 2651: __ cmpl(length, 44); Perform `length` checks first to avoid unnecessary branches on small inputs? Ideal might be to move this length check up just before the `_cmpl(length, 128)` in the AVX-512 block, so that if `AVX=3` short inputs branch directly to the scalar tail procedure without jumping around. This might also apply to the encode stub, though that's pre-existing. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2673: > 2671: __ vmovdqu(xmm13, Address(r13, 0xc8)); > 2672: __ vmovdqu(xmm12, Address(r13, 0x08)); > 2673: __ jmp(L_enterLoop); This got me curious (sorry!) and maybe there's something going on here that I'm not getting... But why have you split the loop apart and jump into the middle of it? It'd seem not doing this would be more straightforward, with no difference semantically and one less jump? (align32 should just translate to a narrow nop instruction, if anything) - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2
On Mon, 23 Jan 2023 18:14:16 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661: >> >>> 2659: __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit); >>> 2660: __ vmovdqu(xmm11, Address(r13, 0x28)); >>> 2661: __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit); >> >> Sorry in advance since I'm probably reading this wrong: the data that `r13` >> is pointing to appears to be a repeated byte pattern (`0x2f2f2f...`), does >> this mean this `vpbroadcastb` and the `vpbroadcastq` above end up filling up >> their respective registers with the exact same bits? If so, and since >> neither of them is mutated in the code below, then perhaps this can be >> simplified a bit. > > You're reading it correctly - this is redundant and could be handled > differently, as the same value is being loaded into ymm4 and ymm10. I don't > think there will be any significant performance gain either way. This was > done in this manner to allow easier transition to URL acceleration when it is > implemented, as URLs require handling '-' and '_' instead of '+' and '/' ('/' > = 0x2f). I was mainly curious if there was some obscure detail or difference that eluded me. It wouldn't be the first time! As it's outside of the loop you're probably right about it not being very important to overall performance, though we should be mindful of setup overheads of transitioning into AVX code. Especially since inputs likely are skewed towards the smallest applicable lengths. I think it would be prudent to run and check the microbenchmark with values near the AVX2 threshold, such as `maxNumBytes = 48`. If you choose to keep the code as-is would you mind documenting the rationale behind the redundancy? (Is there WIP on more generalized URL acceleration that could be referenced?) - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2
On Sat, 21 Jan 2023 00:15:10 GMT, Scott Gibbons wrote: > Added code for Base64 acceleration (encode and decode) which will accelerate > ~4x for AVX2 platforms. > > Encode performance: > **Old:** > > Benchmark (maxNumBytes) Mode Cnt Score Error > Units > Base64Encode.testBase64Encode 1024 thrpt3 4309.439 ± 2.632 > ops/ms > > > **New:** > > Benchmark (maxNumBytes) Mode Cnt Score Error > Units > Base64Encode.testBase64Encode 1024 thrpt3 24211.397 ± 102.026 > ops/ms > > > Decode performance: > **Old:** > > Benchmark (errorIndex) (lineSize) (maxNumBytes) Mode > Cnt ScoreError Units > Base64Decode.testBase64Decode 144 4 1024 thrpt >3 3961.768 ± 93.409 ops/ms > > **New:** > Benchmark (errorIndex) (lineSize) (maxNumBytes) Mode > Cnt ScoreError Units > Base64Decode.testBase64Decode 144 4 1024 thrpt >3 14738.051 ± 24.383 ops/ms src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661: > 2659: __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit); > 2660: __ vmovdqu(xmm11, Address(r13, 0x28)); > 2661: __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit); Sorry in advance since I'm probably reading this wrong: the data that `r13` is pointing to appears to be a repeated byte pattern (`0x2f2f2f...`), does this mean this `vpbroadcastb` and the `vpbroadcastq` above end up filling up their respective registers with the exact same bits? If so, and since neither of them is mutated in the code below, then perhaps this can be simplified a bit. - PR: https://git.openjdk.org/jdk/pull/12126
Re: RFR: 8300818: Reduce complexity of padding with DateTimeFormatter [v2]
On Sun, 22 Jan 2023 13:28:06 GMT, Sergey Tsypanov wrote: >> src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java >> line 2611: >> >>> 2609: "Cannot print as output of " + len + " characters >>> exceeds pad width of " + padWidth); >>> 2610: } >>> 2611: buf.insert(preLen, >>> String.valueOf(padChar).repeat(padWidth - len)); >> >> Have you checked with a microbenchmark that this added allocation can be >> elided by JITs and that there's a significant speed-up with your changes? I >> don't have the necessary domain expertise to assert anything here but I >> suspect padding widths are typically short. Such as 2 or 4 (for day/month >> and year fields) so a micro should examine there's no regression for little >> to no padding. Unlike the original code you call `insert` even if `padWidth >> - len == 0`. This might be optimized away by JITs, but it'd be good to >> verify which is best. > > The modified code is called only when a user explicitly calls `padNext(int, > char)`, i.e. if I modified the example snippet as > > DateTimeFormatter dtf = new DateTimeFormatterBuilder() > .appendLiteral("Date:") > //.padNext(20, ' ') > .append(DateTimeFormatter.ISO_DATE) > .toFormatter(); > String text = dtf.format(LocalDateTime.now()); > > there's no regression. > > Right now I cannot build ad-hoc JDK with my changes and check it with JMH, so > I'd convert this to draft until I can verify it Meant that you should verify that something like this, which just add a little padding, doesn't regress with your changes: DateTimeFormatter dtf = new DateTimeFormatterBuilder() .appendLiteral("Year:") .padNext(5) .appendValue(ChronoField.YEAR) .toFormatter(); ... dtf.format(LocalDateTime.now()); And similar for effectively no padding (`.padNext(4)` in the above example). As this API might often be used to ensure short 2-4 char fields are correctly padded I think it's important that we're not adding overhead to such use cases. - PR: https://git.openjdk.org/jdk/pull/12131
Re: RFR: 8300818: Reduce complexity of padding with DateTimeFormatter [v2]
On Sun, 22 Jan 2023 09:50:21 GMT, Sergey Tsypanov wrote: >> Currently it's O(n) - we do `n` shifts of bytes within `StringBuilder`. This >> can be reduced to O(1) improving the code like: >> >> DateTimeFormatter dtf = new DateTimeFormatterBuilder() >> .appendLiteral("Date:") >> .padNext(20, ' ') >> .append(DateTimeFormatter.ISO_DATE) >> .toFormatter(); >> String text = dtf.format(LocalDateTime.now()); > > Sergey Tsypanov has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains two additional > commits since the last revision: > > - Merge branch 'master' into dtfb > - Improve padding of DateTimeFormatter Changes requested by redestad (Reviewer). src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java line 2603: > 2601: public boolean format(DateTimePrintContext context, > StringBuilder buf) { > 2602: int preLen = buf.length(); > 2603: if (!printerParser.format(context, buf)) { Non-standard as it may be, this style of using `expr == false` instead of `!expr` is a style choice by the original author. I would advice against changing these piecemeal without discussion and agreement that we should consistently enforce the `!expr` style. src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java line 2611: > 2609: "Cannot print as output of " + len + " characters > exceeds pad width of " + padWidth); > 2610: } > 2611: buf.insert(preLen, String.valueOf(padChar).repeat(padWidth > - len)); Have you checked with a microbenchmark that this added allocation can be elided by JITs and that there's a significant speed-up with your changes? I don't have the necessary domain expertise to assert anything here but I suspect padding widths are typically short. Such as 2 or 4 (for day/month and year fields) so a micro should examine there's no regression for little to no padding. Unlike the original code you call `insert` even if `padWidth - len == 0`. This might be optimized away by JITs, but it'd be good to verify which is best. - PR: https://git.openjdk.org/jdk/pull/12131
Integrated: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad wrote: > `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on > a `byte[]` subrange. It can profitably use the recently introduced > `ArraysSupport::vectorizedHashCode` method to see a speed-up, which > translates to a small but significant speed-up on `ZipFile` creation. > > Before: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 83007.325 ± 1446.716 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 154550.631 ± 2166.673 ns/op > > After: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 79512.902 ± 814.449 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 147892.522 ± 2744.017 ns/op This pull request has now been integrated. Changeset: bb42e61a Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/bb42e61a6176a7f4f9485efa47a248b23b09a16d Stats: 120 lines in 4 files changed: 103 ins; 8 del; 9 mod 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder Reviewed-by: alanb, lancea - PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]
On Fri, 20 Jan 2023 16:42:47 GMT, Lance Andersen wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Vary dir and entry name lengths across a wider spread, keeping most >> entries short but making the longest paths longer > > Thank you Claes Thanks @LanceAndersen and @AlanBateman for reviewing! - PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad wrote: > `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on > a `byte[]` subrange. It can profitably use the recently introduced > `ArraysSupport::vectorizedHashCode` method to see a speed-up, which > translates to a small but significant speed-up on `ZipFile` creation. > > Before: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 83007.325 ± 1446.716 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 154550.631 ± 2166.673 ns/op > > After: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 79512.902 ± 814.449 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 147892.522 ± 2744.017 ns/op Updated micro to vary entry sizes more to ensure we exercise the different code paths through the `hashCode` intrinsic. The new setup generates both longer and shorter entries than before, weighting up the average length a bit by increasing the spread. The longer entries see a proportionately larger speed-up, as expected since they benefit from vectorization. Removed some pointless randomness. Baseline: Benchmark (size) Mode Cnt Score Error Units ZipFileOpen.openCloseZipFile 512 avgt 15 98832.801 ± 2155.928 ns/op ZipFileOpen.openCloseZipFile1024 avgt 15 187373.545 ± 2296.779 ns/op Patched: Benchmark (size) Mode Cnt Score Error Units ZipFileOpen.openCloseZipFile 512 avgt 15 85574.648 ± 920.477 ns/op ZipFileOpen.openCloseZipFile1024 avgt 15 160493.277 ± 3450.928 ns/op - PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]
> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on > a `byte[]` subrange. It can profitably use the recently introduced > `ArraysSupport::vectorizedHashCode` method to see a speed-up, which > translates to a small but significant speed-up on `ZipFile` creation. > > Before: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 83007.325 ± 1446.716 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 154550.631 ± 2166.673 ns/op > > After: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 79512.902 ± 814.449 ns/op > ZipFileOpen.openCloseZipFile 1024 avgt 15 147892.522 ± 2744.017 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Vary dir and entry name lengths across a wider spread, keeping most entries short but making the longest paths longer - Changes: - all: https://git.openjdk.org/jdk/pull/12077/files - new: https://git.openjdk.org/jdk/pull/12077/files/0c7f0f4f..369537c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12077=01 - incr: https://webrevs.openjdk.org/?repo=jdk=12077=00-01 Stats: 9 lines in 1 file changed: 5 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12077.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12077/head:pull/12077 PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
On Fri, 20 Jan 2023 11:25:22 GMT, Lance Andersen wrote: >> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates >> on a `byte[]` subrange. It can profitably use the recently introduced >> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which >> translates to a small but significant speed-up on `ZipFile` creation. >> >> Before: >> >> Benchmark (size) Mode Cnt Score Error Units >> ZipFileOpen.openCloseZipFile 512 avgt 15 83007.325 ± 1446.716 ns/op >> ZipFileOpen.openCloseZipFile1024 avgt 15 154550.631 ± 2166.673 ns/op >> >> After: >> >> Benchmark (size) Mode Cnt Score Error Units >> ZipFileOpen.openCloseZipFile 512 avgt 15 79512.902 ± 814.449 ns/op >> ZipFileOpen.openCloseZipFile1024 avgt 15 147892.522 ± 2744.017 ns/op > > test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java line 69: > >> 67: >> 68: ename += "long-entry-name-" + (random.nextInt(9) + >> 1) + "-" + i; >> 69: zos.putNextEntry(new ZipEntry(ename)); > > Would it be worth to add some random sized data when the entries are created > to allow for getting a bit more insight, or perhaps do that in a separate > benchmark> I was experimenting with varying the entry names length to see what - if any - impact it had and saw a small effect on the micro. It does make more sense to vary lengths now that very long names will take different paths in the vectorized intrinsic. I'll see what I can do without overengineering this. - PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
On Fri, 20 Jan 2023 11:10:13 GMT, Alan Bateman wrote: > > FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the > > same directory. It's not exactly necessary for this use case to add such a > > benchmark, but I think there's value in verifying that optimizing > > `checkedHash` improves `ZipFile` setup and adding the micro might allow us > > to find further opportunities down the line. > > I've no doubt it improves checkedHash, it's just that open the zip file and > reading in the CEN probably dominates when not in the file system cache. Right, the micro is a poor proxy for real-world implications since time to open a zip file very much depends on the filesystem speed but this is sort of by design. We have separate startup tests that tries to emulate more "cold start" scenarios, which micros like this are complementary to and not a substitute for. - PR: https://git.openjdk.org/jdk/pull/12077
Integrated: 8300647: Miscellaneous hashCode improvements in java.base
On Thu, 19 Jan 2023 11:45:12 GMT, Claes Redestad wrote: > Went through the jdk and found a few more places where > `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc > methods could be replaced with a plain call to `java.util.Arrays` > equivalents. This patch addresses that. > > After this, #12068, and #12077 I think we're reaching the limit of sensible > direct use of `AS::vectorizedHashCode`. I've found a few places outside of > java.base where `vectorizedHashCode` might be applicable, but none that seem > important enough to warrant an export of this method outside of java.base. This pull request has now been integrated. Changeset: d85243f0 Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/d85243f02b34d03bd7af63a5bcbc73f500f720df Stats: 37 lines in 2 files changed: 1 ins; 31 del; 5 mod 8300647: Miscellaneous hashCode improvements in java.base Reviewed-by: stsypanov, rriggs - PR: https://git.openjdk.org/jdk/pull/12093
Re: RFR: 8300647: Miscellaneous hashCode improvements in java.base [v2]
On Thu, 19 Jan 2023 13:46:26 GMT, Claes Redestad wrote: >> Went through the jdk and found a few more places where >> `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc >> methods could be replaced with a plain call to `java.util.Arrays` >> equivalents. This patch addresses that. >> >> After this, #12068, and #12077 I think we're reaching the limit of sensible >> direct use of `AS::vectorizedHashCode`. I've found a few places outside of >> java.base where `vectorizedHashCode` might be applicable, but none that seem >> important enough to warrant an export of this method outside of java.base. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Revert ZipFileSystem changes since it's additionally built as a library > dependency and can't rely on jdk.internal Thanks for reviewing! - PR: https://git.openjdk.org/jdk/pull/12093
Re: RFR: 8300647: Miscellaneous hashCode improvements in java.base [v2]
> Went through the jdk and found a few more places where > `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc > methods could be replaced with a plain call to `java.util.Arrays` > equivalents. This patch addresses that. > > After this, #12068, and #12077 I think we're reaching the limit of sensible > direct use of `AS::vectorizedHashCode`. I've found a few places outside of > java.base where `vectorizedHashCode` might be applicable, but none that seem > important enough to warrant an export of this method outside of java.base. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Revert ZipFileSystem changes since it's additionally built as a library dependency and can't rely on jdk.internal - Changes: - all: https://git.openjdk.org/jdk/pull/12093/files - new: https://git.openjdk.org/jdk/pull/12093/files/5f36ebd7..b1507e69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12093=01 - incr: https://webrevs.openjdk.org/?repo=jdk=12093=00-01 Stats: 7 lines in 1 file changed: 2 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12093.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12093/head:pull/12093 PR: https://git.openjdk.org/jdk/pull/12093
RFR: 8300647: Miscellaneous hashCode improvements in java.base
Went through the jdk and found a few more places where `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc methods could be replaced with a plain call to `java.util.Arrays` equivalents. This patch addresses that. After this, #12068, and #12077 I think we're reaching the limit of sensible direct use of `AS::vectorizedHashCode`. I've found a few places outside of java.base where `vectorizedHashCode` might be applicable, but none that seem important enough to warrant an export of this method outside of java.base. - Commit messages: - Miscellaneous hashCode improvements in java.base Changes: https://git.openjdk.org/jdk/pull/12093/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12093=00 Issue: https://bugs.openjdk.org/browse/JDK-8300647 Stats: 44 lines in 3 files changed: 3 ins; 33 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12093.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12093/head:pull/12093 PR: https://git.openjdk.org/jdk/pull/12093
Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad wrote: > `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on > a `byte[]` subrange. It can profitably use the recently introduced > `ArraysSupport::vectorizedHashCode` method to see a speed-up, which > translates to a small but significant speed-up on `ZipFile` creation. > > Before: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 83007.325 ± 1446.716 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 154550.631 ± 2166.673 ns/op > > After: > > Benchmark (size) Mode Cnt Score Error Units > ZipFileOpen.openCloseZipFile 512 avgt 15 79512.902 ± 814.449 ns/op > ZipFileOpen.openCloseZipFile1024 avgt 15 147892.522 ± 2744.017 ns/op FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the same directory. It's not exactly necessary for this use case to add such a benchmark, but I think there's value in verifying that optimizing `checkedHash` improves `ZipFile` setup and adding the micro might allow us to find further opportunities down the line. - PR: https://git.openjdk.org/jdk/pull/12077
Re: RFR: 8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName
On Wed, 18 Jan 2023 11:04:57 GMT, Claes Redestad wrote: > This patch makes use of `ArraysSupport.vectorizedHashCode` and helps verify > the performance win also for range-based hash calculations. Also modernized > and cleaned up the surrounding code somewhat. > > > Benchmark Mode CntScore Error Units > Characters.CodePoints.codePointOf avgt 15 153.753 ± 6.119 ns/op > Characters.CodePoints.codePointOf avgt 15 134.753 ± 4.386 ns/op # after, > 1.14x > > > For C1 and C2 on non-x86 platforms the performance is neutral. Interpreter > sees a 1-2% regression due a few added calls. Thanks for reviewing! - PR: https://git.openjdk.org/jdk/pull/12068
Integrated: 8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName
On Wed, 18 Jan 2023 11:04:57 GMT, Claes Redestad wrote: > This patch makes use of `ArraysSupport.vectorizedHashCode` and helps verify > the performance win also for range-based hash calculations. Also modernized > and cleaned up the surrounding code somewhat. > > > Benchmark Mode CntScore Error Units > Characters.CodePoints.codePointOf avgt 15 153.753 ± 6.119 ns/op > Characters.CodePoints.codePointOf avgt 15 134.753 ± 4.386 ns/op # after, > 1.14x > > > For C1 and C2 on non-x86 platforms the performance is neutral. Interpreter > sees a 1-2% regression due a few added calls. This pull request has now been integrated. Changeset: 3ea0b8bc Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/3ea0b8bc253f1498895a879681406ecc15f37afb Stats: 30 lines in 2 files changed: 15 ins; 5 del; 10 mod 8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName Reviewed-by: alanb, naoto - PR: https://git.openjdk.org/jdk/pull/12068