Re: RFR: 8336856: Optimize String Concat [v15]
On Thu, 25 Jul 2024 08:58:14 GMT, Shaojin Wen wrote: >> The current implementation of StringConcat is to mix the coder and length >> into a long. This operation will have some overhead for int/long/boolean >> types. We can separate the calculation of the coder from the calculation of >> the length, which can improve the performance in the scenario of concat >> int/long/boolean. >> >> This idea comes from the suggestion of @l4es in the discussion of PR >> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866 > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > add benchmark test/micro/org/openjdk/bench/java/lang/StringConcatGenerate.java line 47: > 45: @Measurement(iterations = 5, time = 1000, timeUnit = > TimeUnit.MILLISECONDS) > 46: @Fork(value = 3, jvmArgsAppend = > "-Djava.lang.invoke.StringConcat.highArityThreshold=0") > 47: public class StringConcatGenerate extends StringConcat { Adding a subclass with an overridden `@Fork` to pass a different `jvmArgsAppend` is a reasonable trick, but could be moved to a nested class within `StringConcat` to keep it in the same context. I'm not sure if this micro brings a persistent added value, though: for experimentation we can just run `StringConcat` with different `-Djava.lang.invoke.StringConcat.highArityThreshold` settings, while for continuous regression testing we're more interested in validating the default settings. Supplying the new `main` method doesn't add anything here, either, since a standalone execution wouldn't pick up the `jvmArgsAppend` value. - PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1691205621
Re: RFR: 8336856: Optimize String Concat [v10]
On Wed, 24 Jul 2024 10:18:03 GMT, Shaojin Wen wrote: >> The current implementation of StringConcat is to mix the coder and length >> into a long. This operation will have some overhead for int/long/boolean >> types. We can separate the calculation of the coder from the calculation of >> the length, which can improve the performance in the scenario of concat >> int/long/boolean. >> >> This idea comes from the suggestion of @l4es in the discussion of PR >> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866 > > Shaojin Wen has updated the pull request incrementally with ten additional > commits since the last revision: > > - minor refactor > - minor refactor > - reduce change > - copyright > - reduce change > - refactor based on 8335182 > - use Float.toString & Double.toString > - remove ForceInline > - fix build error This is looking much better! An idea: evolve this PR to first and foremost be a replacement for the current `SimpleStringBuilderStrategy`. If you can prove the new strategy is always superior to the baseline `SimpleStringBuilderStrategy` (performance close to `generateMHInlineCopy` but without the problematic scaling JIT overheads for high-arity expressions) then we have a good, incremental improvement with relatively low risk. We can then follow-up with making this better for low-arity expressions by implementing a sharing strategy (need to build classes where we inject rather than embed constants and cache them, e.g. in a `ReferencedKeyMap`) as a follow-up. One issue for both this and that future PR is that the hidden classes are generated into `java.lang` with `Lookup.ClassOption.STRONG`. This means such classes effectively can never be unloaded. That's a blocker. We need to get rid of that `STRONG` coupling, add tests to ensure generated classes can be unloaded and validate that this doesn't add too much overhead. For startup verification I added a `StringConcatStartup` JMH in #19927. You could extend that with a few tests that exercise high-arity expressions. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 153: > 151: @SuppressWarnings("removal") > 152: private static boolean isGenerateInlineCopy() { > 153: return GENERATE_INLINE_COPY && System.getSecurityManager() == > null; This security manager hack is unfortunate. Likely some bootstrapping issue due use of the classfile API, which might be pulling in lambdas in the paths travelled by this patch. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1253: > 1251: } > 1252: > 1253: int lengthCoderSloat = paramSlotsTotalSize; Suggestion: int lengthCoderSlot = paramSlotsTotalSize; - PR Review: https://git.openjdk.org/jdk/pull/20273#pullrequestreview-2196340289 PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1689583138 PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1689578966
Re: RFR: 8336856: Optimize String Concat [v5]
On Tue, 23 Jul 2024 12:22:57 GMT, Shaojin Wen wrote: >> The current implementation of StringConcat is to mix the coder and length >> into a long. This operation will have some overhead for int/long/boolean >> types. We can separate the calculation of the coder from the calculation of >> the length, which can improve the performance in the scenario of concat >> int/long/boolean. >> >> This idea comes from the suggestion of @l4es in the discussion of PR >> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866 > > Shaojin Wen has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/optim_simple_concat_202407' into > optim_concat_factory_202407 > ># Conflicts: ># src/java.base/share/classes/java/lang/StringConcatHelper.java ># src/java.base/share/classes/java/lang/System.java ># src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java > - reduce change & refactor > - fix TRUSTED not work > - non-MH-based implementation > - non-MH-based implementation > - optimize string concat While a reasonable start I'd suggest: - adding a flag to switch over to the new implementation and leaving the existing strategy in place initially, then when we have something that is superior in every way we can clean out the old strategy - AFAICT this mimics the `SimpleStringBuilderStrategy` and generates and installs a new class for each expression. This is likely fine for high-arity concatenations which are likely to be one-of-a-kind (the sort of expressions the `SimpleStringBuilderStrategy` was brought back to life for), but could generate a proliferation of classes for low-arity expressions. Instead we probably need to aim for sharing - such shareable classes could be made simpler if we put them in java.lang so that they can link directly with package private methods in `StringConcatHelper` and `String` rather than using trusted `MethodHandles` to cross such boundaries Again: I am working on and have a prototype for this which I aim to present and discuss in two weeks as JVMLS. I appreciate the effort here, but I'll need to focus on my talk and prototype for now. - PR Comment: https://git.openjdk.org/jdk/pull/20273#issuecomment-2245279590
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]
On Tue, 23 Jul 2024 12:49:52 GMT, Shaojin Wen wrote: >> Yes, this isn't beholden to JLS 15.18.1, and it's already specified that >> `foo.concat("")` returns `foo` - so why shouldn't `"".concat(foo)` return >> `foo`? But still it's an observable semantic change so some care needs to be >> taken - possibly even a CSR is warranted. Doing `return new String(str)` to >> retain behavior avoids that headache for a fringe case. > > Adding isEmpty check is not related to this PR and should be a new PR. Thanks. Removing this check will make the `"".append(foo)` edge case a little bit slower (likely inconsequential), but removes a branch and might thus even be an optimization in the common case. - PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1688032897
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v7]
On Tue, 23 Jul 2024 12:56:07 GMT, Shaojin Wen wrote: >> Currently simpleConcat is implemented using mix and prepend, but in this >> simple scenario, it can be implemented in a simpler way and can improve >> performance. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > reduce change Marked as reviewed by redestad (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/20253#pullrequestreview-2193939149
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]
On Tue, 23 Jul 2024 12:12:41 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/String.java line 2991: >> >>> 2989: } >>> 2990: if (isEmpty()) { >>> 2991: return str; >> >> This case should probably be reflected more precisely in the specification, >> or do `return new String(str);` to avoid changing semantics for this >> corner-case. > > The spec for concat only asked for equal representation instead of the > existence of a new identity. Thus this should be fine. Yes, this isn't beholden to JLS 15.18.1, and it's already specified that `foo.concat("")` returns `foo` - so why shouldn't `"".concat(foo)` return `foo`? But still it's an observable semantic change so some care needs to be taken - possibly even a CSR is warranted. Doing `return new String(str)` to retain behavior avoids that headache for a fringe case. - PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1687990148
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen wrote: >> Currently simpleConcat is implemented using mix and prepend, but in this >> simple scenario, it can be implemented in a simpler way and can improve >> performance. > > Shaojin Wen has updated the pull request incrementally with two additional > commits since the last revision: > > - Update src/java.base/share/classes/java/lang/String.java > >Co-authored-by: Chen Liang > - add comments This change seems reasonable, and we will need the `newArray(int)` method in future work. This patch dials back on the idea that `simpleConcat` is an "explainer" for what `StringConcatFactory` is doing, but as it was already imprecise in that regard we should favor simplicity and performance here. src/java.base/share/classes/java/lang/String.java line 2991: > 2989: } > 2990: if (isEmpty()) { > 2991: return str; This case should probably be reflected more precisely in the specification, or do `return new String(str);` to avoid changing semantics for this corner-case. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20253#pullrequestreview-2193780530 PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1687937671
Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v3]
On Tue, 9 Jul 2024 21:16:33 GMT, Claes Redestad wrote: >> This PR attains a speed-up on some microbenchmarks by simplifying how we >> build up the MH combinator tree shape >> (only use prependers with prefix, always add a suffix to the newArray >> combinator), then simplifying/inlining some of the code in the helper >> functions. >> >> Many simple concatenation expressions stay performance neutral, while the >> win comes from enabling C2 to better optimize more complex shapes >> (concat13String, concatMix4String, concatConst6String see relatively large >> improvements): >> >> >> NameCnt Base Error Test >> Error Unit Change >> StringConcat.concat13String 50 66.380 ± 0.18953.017 ± >> 0.241 ns/op 1.25x (p = 0.000*) >> StringConcat.concat4String 50 13.620 ± 0.05312.382 ± >> 0.089 ns/op 1.10x (p = 0.000*) >> StringConcat.concat6String 50 17.168 ± 0.07016.046 ± >> 0.064 ns/op 1.07x (p = 0.000*) >> StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± >> 0.041 ns/op 1.20x (p = 0.000*) >> StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± >> 0.049 ns/op 1.17x (p = 0.000*) >> StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± >> 0.214 ns/op 1.04x (p = 0.000*) >> StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± >> 0.093 ns/op 1.17x (p = 0.000*) >> StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± >> 0.050 ns/op 0.99x (p = 0.004*) >> StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± >> 0.029 ns/op 1.16x (p = 0.000*) >> StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± >> 0.070 ns/op 1.04x (p = 0.000*) >> StringConcat.concatConstString 507.523 ± 0.050 7.486 ± >> 0.044 ns/op 1.00x (p = 0.058 ) >> StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± >> 0.049 ns/op 1.02x (p = 0.000*) >> StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± >> 0.037 ns/op 1.01x (p = 0.000*) >> StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± >> 0.015 ns/op 1.00x (p = 0.930 ) >> StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± >> 0.057 ns/op 1.02x (p = 0.044 ) >> StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± >> 0.053 ns/op 1.02x (p = 0.000*) >> StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± >> 0.066 ns/op 0.99x (p... > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Excess blankspace Thanks for reviews and patience - went on vacation for a couple of weeks and didn't want to integrate this and leave any bug trail for others to deal with. Back part-time for a while so let's go! - PR Comment: https://git.openjdk.org/jdk/pull/19927#issuecomment-2245022856
Integrated: 8335182: Consolidate and streamline String concat code shapes
On Thu, 27 Jun 2024 12:27:26 GMT, Claes Redestad wrote: > This PR attains a speed-up on some microbenchmarks by simplifying how we > build up the MH combinator tree shape > (only use prependers with prefix, always add a suffix to the newArray > combinator), then simplifying/inlining some of the code in the helper > functions. > > Many simple concatenation expressions stay performance neutral, while the win > comes from enabling C2 to better optimize more complex shapes > (concat13String, concatMix4String, concatConst6String see relatively large > improvements): > > > NameCnt Base Error Test > Error Unit Change > StringConcat.concat13String 50 66.380 ± 0.18953.017 ± > 0.241 ns/op 1.25x (p = 0.000*) > StringConcat.concat4String 50 13.620 ± 0.05312.382 ± > 0.089 ns/op 1.10x (p = 0.000*) > StringConcat.concat6String 50 17.168 ± 0.07016.046 ± > 0.064 ns/op 1.07x (p = 0.000*) > StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± > 0.041 ns/op 1.20x (p = 0.000*) > StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± > 0.049 ns/op 1.17x (p = 0.000*) > StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± > 0.214 ns/op 1.04x (p = 0.000*) > StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± > 0.093 ns/op 1.17x (p = 0.000*) > StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± > 0.050 ns/op 0.99x (p = 0.004*) > StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± > 0.029 ns/op 1.16x (p = 0.000*) > StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± > 0.070 ns/op 1.04x (p = 0.000*) > StringConcat.concatConstString 507.523 ± 0.050 7.486 ± > 0.044 ns/op 1.00x (p = 0.058 ) > StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± > 0.049 ns/op 1.02x (p = 0.000*) > StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± > 0.037 ns/op 1.01x (p = 0.000*) > StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± > 0.015 ns/op 1.00x (p = 0.930 ) > StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± > 0.057 ns/op 1.02x (p = 0.044 ) > StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± > 0.053 ns/op 1.02x (p = 0.000*) > StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± > 0.066 ns/op 0.99x (p = 0.005*) > StringConcat.concatMix4String... This pull request has now been integrated. Changeset: e83b4b23 Author:Claes Redestad URL: https://git.openjdk.org/jdk/commit/e83b4b236eca48d0b75094006f7f888398194fe4 Stats: 566 lines in 6 files changed: 412 ins; 114 del; 40 mod 8335182: Consolidate and streamline String concat code shapes Reviewed-by: liach, jvernee - PR: https://git.openjdk.org/jdk/pull/19927
Re: RFR: 8336856: Optimize String Concat
On Sun, 21 Jul 2024 12:25:36 GMT, Shaojin Wen wrote: > The current implementation of StringConcat is to mix the coder and length > into a long. This operation will have some overhead for int/long/boolean > types. We can separate the calculation of the coder from the calculation of > the length, which can improve the performance in the scenario of concat > int/long/boolean. > > This idea comes from the suggestion of @l4es in the discussion of PR > https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866 This reverts startup optimizations which are important for the overall performance of the current MH-based implementation. I did not suggest splitting apart length and coder in the context of this current implementation, but that in a non-MH-based implementation that combination trick makes little sense. I am trying to find time to work out the kinks of a bytecode-spinning solution and would appreciate it if we leave the venerable `StringConcatFactory` alone for the time being - at least until after JVMLS (Aug 5-7). (I noticed in some tests that splitting apart like this seem to only affect M1/aarch64, not x64, suggesting some JIT compiler deficiency on aarch64. Perhaps that would be a useful side track to explore.) - PR Comment: https://git.openjdk.org/jdk/pull/20273#issuecomment-2241602473
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen wrote: >> Currently simpleConcat is implemented using mix and prepend, but in this >> simple scenario, it can be implemented in a simpler way and can improve >> performance. > > Shaojin Wen has updated the pull request incrementally with two additional > commits since the last revision: > > - Update src/java.base/share/classes/java/lang/String.java > >Co-authored-by: Chen Liang > - add comments No, my idea is to rework and have SCF spin hidden classes using the new classfile API which will emit code or less in the style you are doing here for `simpleConcat`, but generalized. This may or may not work out beautifully. I'm coming back from a short vacation now and will work on having something to present on this topic at JVMLS two and a half weeks from now. FTR I don't mind if you get this optimization in here. I'll just rebase my prototype, and it shouldn't alter things too much. - PR Comment: https://git.openjdk.org/jdk/pull/20253#issuecomment-2240728186
Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen wrote: >> Currently simpleConcat is implemented using mix and prepend, but in this >> simple scenario, it can be implemented in a simpler way and can improve >> performance. > > Shaojin Wen has updated the pull request incrementally with two additional > commits since the last revision: > > - Update src/java.base/share/classes/java/lang/String.java > >Co-authored-by: Chen Liang > - add comments FWIW one of the ideas when implementing `StringConcatHelper.simpleConcat` was that by using the primitives used by the `StringConcatFactory` as straightforwardly as possible the method acts as a documentation-of-sorts or guide to understand how the `SCF` expression trees are built up. It's not perfect, though. Concatenation of a `String` + constant would be handled differently for one. So perhaps the value as a guide is not high. I'm also experimenting with replacing the MH-based strategy with spinning hidden, shareable classes instead. In that case we might actually be better off getting rid of the `long indexCoder` hacks and generate code more similar to the `doConcat` you've come up with here. The main benefit of combining `coder` and `index/length` into a single `long` was to reduce the MH combinator overheads, but if we're spinning code with access to optimized primitives then that isn't really needed. - PR Comment: https://git.openjdk.org/jdk/pull/20253#issuecomment-2240412866
Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v3]
> This PR attains a speed-up on some microbenchmarks by simplifying how we > build up the MH combinator tree shape > (only use prependers with prefix, always add a suffix to the newArray > combinator), then simplifying/inlining some of the code in the helper > functions. > > Many simple concatenation expressions stay performance neutral, while the win > comes from enabling C2 to better optimize more complex shapes > (concat13String, concatMix4String, concatConst6String see relatively large > improvements): > > > NameCnt Base Error Test > Error Unit Change > StringConcat.concat13String 50 66.380 ± 0.18953.017 ± > 0.241 ns/op 1.25x (p = 0.000*) > StringConcat.concat4String 50 13.620 ± 0.05312.382 ± > 0.089 ns/op 1.10x (p = 0.000*) > StringConcat.concat6String 50 17.168 ± 0.07016.046 ± > 0.064 ns/op 1.07x (p = 0.000*) > StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± > 0.041 ns/op 1.20x (p = 0.000*) > StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± > 0.049 ns/op 1.17x (p = 0.000*) > StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± > 0.214 ns/op 1.04x (p = 0.000*) > StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± > 0.093 ns/op 1.17x (p = 0.000*) > StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± > 0.050 ns/op 0.99x (p = 0.004*) > StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± > 0.029 ns/op 1.16x (p = 0.000*) > StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± > 0.070 ns/op 1.04x (p = 0.000*) > StringConcat.concatConstString 507.523 ± 0.050 7.486 ± > 0.044 ns/op 1.00x (p = 0.058 ) > StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± > 0.049 ns/op 1.02x (p = 0.000*) > StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± > 0.037 ns/op 1.01x (p = 0.000*) > StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± > 0.015 ns/op 1.00x (p = 0.930 ) > StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± > 0.057 ns/op 1.02x (p = 0.044 ) > StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± > 0.053 ns/op 1.02x (p = 0.000*) > StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± > 0.066 ns/op 0.99x (p = 0.005*) > StringConcat.concatMix4String... Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Excess blankspace - Changes: - all: https://git.openjdk.org/jdk/pull/19927/files - new: https://git.openjdk.org/jdk/pull/19927/files/7c50d7dd..58fe4b12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19927=02 - incr: https://webrevs.openjdk.org/?repo=jdk=19927=01-02 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927 PR: https://git.openjdk.org/jdk/pull/19927
Re: RFR: 8335366: Improve String.format performance with fastpath [v9]
On Mon, 1 Jul 2024 06:18:55 GMT, Chen Liang wrote: >> Shaojin Wen has updated the pull request incrementally with one additional >> commit since the last revision: >> >> code style > > As promising as the performance number is, I think we need to ensure two > things: > 1. Correctness: this patch adds a lot of special cases; not sure if the > current test cases already cover all of them. In addition, format is i18n > stuff, which will need extra review besides the review for performance gains. > 2. Validity: the existing benchmarks don't have profile pollution: see > `ReflectionSpeedBenchmark` > https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/test/micro/org/openjdk/bench/java/lang/reflect/ReflectionSpeedBenchmark.java#L291 > where the `Method.invoke` and `Constructor.newInstance` are called to tamper > JIT's profiling, as JIT can conclude that only one format shape is ever used > in your benchmark, which is unlikely in production. You should call > `String.format` and `String.formatted` with varied format strings and > arguments in setup for profile pollution. > An extreme example would be at > https://github.com/openjdk/jdk/pull/14944#issuecomment-1644050455, where > `Arrays.hashCode(Object[])` where every element is an `Integer` is extremely > fast, but slows down drastically once different arrays are passed. Fully agree with the points @liach made above that this needs to be scrutinized for correctness and validity. For example we need to better explore cases where detection for the fast path might be costlier, for example a longer format string with or without some acceptable specifiers early, then something that forces us into the slow path only at the end. - PR Comment: https://git.openjdk.org/jdk/pull/19956#issuecomment-2199478317
Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > fix comment I agree with improving the legacy `String.format` code, within reason due it's wide use and appeal. Doing so does not detract from the goal of providing smarter and faster formatting via `StringTemplates`. - PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2198045830
Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > fix comment Thanks for the updates. I think this is now a harmless and straightforward improvement. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19926#pullrequestreview-2147942903
Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v2]
> This PR attains a speed-up on some microbenchmarks by simplifying how we > build up the MH combinator tree shape > (only use prependers with prefix, always add a suffix to the newArray > combinator), then simplifying/inlining some of the code in the helper > functions. > > Many simple concatenation expressions stay performance neutral, while the win > comes from enabling C2 to better optimize more complex shapes > (concat13String, concatMix4String, concatConst6String see relatively large > improvements): > > > NameCnt Base Error Test > Error Unit Change > StringConcat.concat13String 50 66.380 ± 0.18953.017 ± > 0.241 ns/op 1.25x (p = 0.000*) > StringConcat.concat4String 50 13.620 ± 0.05312.382 ± > 0.089 ns/op 1.10x (p = 0.000*) > StringConcat.concat6String 50 17.168 ± 0.07016.046 ± > 0.064 ns/op 1.07x (p = 0.000*) > StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± > 0.041 ns/op 1.20x (p = 0.000*) > StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± > 0.049 ns/op 1.17x (p = 0.000*) > StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± > 0.214 ns/op 1.04x (p = 0.000*) > StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± > 0.093 ns/op 1.17x (p = 0.000*) > StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± > 0.050 ns/op 0.99x (p = 0.004*) > StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± > 0.029 ns/op 1.16x (p = 0.000*) > StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± > 0.070 ns/op 1.04x (p = 0.000*) > StringConcat.concatConstString 507.523 ± 0.050 7.486 ± > 0.044 ns/op 1.00x (p = 0.058 ) > StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± > 0.049 ns/op 1.02x (p = 0.000*) > StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± > 0.037 ns/op 1.01x (p = 0.000*) > StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± > 0.015 ns/op 1.00x (p = 0.930 ) > StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± > 0.057 ns/op 1.02x (p = 0.044 ) > StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± > 0.053 ns/op 1.02x (p = 0.000*) > StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± > 0.066 ns/op 0.99x (p = 0.005*) > StringConcat.concatMix4String... Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copyrights - Changes: - all: https://git.openjdk.org/jdk/pull/19927/files - new: https://git.openjdk.org/jdk/pull/19927/files/6525e945..7c50d7dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19927=01 - incr: https://webrevs.openjdk.org/?repo=jdk=19927=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927 PR: https://git.openjdk.org/jdk/pull/19927
Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v3]
On Fri, 28 Jun 2024 11:27:47 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > remove ForceInline This looks better. Avoids a branch up front, minimized code changes. The comment is good, but perhaps move it down into the default case? Update bug description to something like "Reduce size of j.u.Formatter.Conversion#isValid" ? - PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2196808696
Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v2]
On Thu, 27 Jun 2024 14:12:36 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert & use `@ForceInline` So the compound issue here is that there's a heuristic which prevents inlining of large methods, and that a tableswitch on inputs spanning from `'%' (37)` to `'x' (120)` become expansive enough (350bytes) to make the JIT enforce that rule. As an alternative to `@ForceInline` I tested splitting uppercase and lowercase chars into two switches. This looks a bit nasty, but actually reduces the size of the method: diff --git a/src/java.base/share/classes/java/util/Formatter.java b/src/java.base/share/classes/java/util/Formatter.java index a7d95ee5780..50419f1ea23 100644 --- a/src/java.base/share/classes/java/util/Formatter.java +++ b/src/java.base/share/classes/java/util/Formatter.java @@ -4947,30 +4947,36 @@ static class Conversion { static final char PERCENT_SIGN= '%'; static boolean isValid(char c) { -return switch (c) { -case BOOLEAN, - BOOLEAN_UPPER, - STRING, - STRING_UPPER, - HASHCODE, - HASHCODE_UPPER, - CHARACTER, - CHARACTER_UPPER, - DECIMAL_INTEGER, - OCTAL_INTEGER, - HEXADECIMAL_INTEGER, - HEXADECIMAL_INTEGER_UPPER, - SCIENTIFIC, - SCIENTIFIC_UPPER, - GENERAL, - GENERAL_UPPER, - DECIMAL_FLOAT, - HEXADECIMAL_FLOAT, - HEXADECIMAL_FLOAT_UPPER, - LINE_SEPARATOR, - PERCENT_SIGN -> true; -default -> false; -}; +if (c >= 'a') { +return switch (c) { +case BOOLEAN, + STRING, + HASHCODE, + CHARACTER, + DECIMAL_INTEGER, + OCTAL_INTEGER, + HEXADECIMAL_INTEGER, + SCIENTIFIC, + GENERAL, + DECIMAL_FLOAT, + HEXADECIMAL_FLOAT, + LINE_SEPARATOR -> true; +default -> false; +}; +} else { +return switch (c) { +case BOOLEAN_UPPER, + STRING_UPPER, + HASHCODE_UPPER, + CHARACTER_UPPER, + HEXADECIMAL_INTEGER_UPPER, + SCIENTIFIC_UPPER, + GENERAL_UPPER, + HEXADECIMAL_FLOAT_UPPER, + PERCENT_SIGN -> true; +default -> false; +}; +} } // Returns true iff the Conversion is applicable to all objects. `javap -v java.lang.Formatter$Conversion` static boolean isValid(char); descriptor: (C)Z flags: (0x0008) ACC_STATIC Code: stack=2, locals=1, args_size=1 0: iload_0 1: bipush97 3: if_icmplt 122 6: iload_0 7: tableswitch { // 97 to 120 97: 116 98: 116 99: 116 100: 116 101: 116 102: 116 103: 116 104: 116 105: 120 106: 120 107: 120 108: 120 109: 120 110: 116 111: 116 112: 120 113: 120 114: 120 115: 116 116: 120 117: 120 118: 120 119: 120 120: 116 default: 120 } 116: iconst_1 117: goto 121 120: iconst_0 121: ireturn
Re: RFR: 8335182: Consolidate and streamline String concat code shapes
On Thu, 27 Jun 2024 16:05:09 GMT, Chen Liang wrote: >> This PR attains a speed-up on some microbenchmarks by simplifying how we >> build up the MH combinator tree shape >> (only use prependers with prefix, always add a suffix to the newArray >> combinator), then simplifying/inlining some of the code in the helper >> functions. >> >> Many simple concatenation expressions stay performance neutral, while the >> win comes from enabling C2 to better optimize more complex shapes >> (concat13String, concatMix4String, concatConst6String see relatively large >> improvements): >> >> >> NameCnt Base Error Test >> Error Unit Change >> StringConcat.concat13String 50 66.380 ± 0.18953.017 ± >> 0.241 ns/op 1.25x (p = 0.000*) >> StringConcat.concat4String 50 13.620 ± 0.05312.382 ± >> 0.089 ns/op 1.10x (p = 0.000*) >> StringConcat.concat6String 50 17.168 ± 0.07016.046 ± >> 0.064 ns/op 1.07x (p = 0.000*) >> StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± >> 0.041 ns/op 1.20x (p = 0.000*) >> StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± >> 0.049 ns/op 1.17x (p = 0.000*) >> StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± >> 0.214 ns/op 1.04x (p = 0.000*) >> StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± >> 0.093 ns/op 1.17x (p = 0.000*) >> StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± >> 0.050 ns/op 0.99x (p = 0.004*) >> StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± >> 0.029 ns/op 1.16x (p = 0.000*) >> StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± >> 0.070 ns/op 1.04x (p = 0.000*) >> StringConcat.concatConstString 507.523 ± 0.050 7.486 ± >> 0.044 ns/op 1.00x (p = 0.058 ) >> StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± >> 0.049 ns/op 1.02x (p = 0.000*) >> StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± >> 0.037 ns/op 1.01x (p = 0.000*) >> StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± >> 0.015 ns/op 1.00x (p = 0.930 ) >> StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± >> 0.057 ns/op 1.02x (p = 0.044 ) >> StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± >> 0.053 ns/op 1.02x (p = 0.000*) >> StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± >> 0.066 ns/op 0.99x (p... > > src/java.base/share/classes/java/lang/StringConcatHelper.java line 157: > >> 155: } >> 156: index -= prefix.length(); >> 157: prefix.getBytes(buf, index, String.LATIN1); > > Since we are now passing in a lot of empty prefix, I wonder how this call is > elided; is there some specific JIT intrinsic? The java code has no shortcut > for `length == 0`. Remember that the prefixes are all constants and bound into the MH at each call site, so one view is that this PR is mostly about tickling the code into something that just-so happens to be easier for the JIT to swallow - PR Review Comment: https://git.openjdk.org/jdk/pull/19927#discussion_r1657513167
RFR: 8335182: Consolidate and streamline String concat code shapes
This PR attains a speed-up on some microbenchmarks by simplifying how we build up the MH combinator tree shape (only use prependers with prefix, always add a suffix to the newArray combinator), then simplifying/inlining some of the code in the helper functions. Many simple concatenation expressions stay performance neutral, while the win comes from enabling C2 to better optimize more complex shapes (concat13String, concatMix4String, concatConst6String see relatively large improvements): NameCnt Base Error Test Error Unit Change StringConcat.concat13String 50 66.380 ± 0.18953.017 ± 0.241 ns/op 1.25x (p = 0.000*) StringConcat.concat4String 50 13.620 ± 0.05312.382 ± 0.089 ns/op 1.10x (p = 0.000*) StringConcat.concat6String 50 17.168 ± 0.07016.046 ± 0.064 ns/op 1.07x (p = 0.000*) StringConcat.concatConst2String 509.820 ± 0.081 8.158 ± 0.041 ns/op 1.20x (p = 0.000*) StringConcat.concatConst4String 50 14.938 ± 0.12412.800 ± 0.049 ns/op 1.17x (p = 0.000*) StringConcat.concatConst6Object 50 56.115 ± 0.28854.046 ± 0.214 ns/op 1.04x (p = 0.000*) StringConcat.concatConst6String 50 19.032 ± 0.07316.213 ± 0.093 ns/op 1.17x (p = 0.000*) StringConcat.concatConstBoolByte 508.486 ± 0.066 8.556 ± 0.050 ns/op 0.99x (p = 0.004*) StringConcat.concatConstInt 508.942 ± 0.052 7.677 ± 0.029 ns/op 1.16x (p = 0.000*) StringConcat.concatConstIntConstInt 50 12.883 ± 0.07012.431 ± 0.070 ns/op 1.04x (p = 0.000*) StringConcat.concatConstString 507.523 ± 0.050 7.486 ± 0.044 ns/op 1.00x (p = 0.058 ) StringConcat.concatConstStringConstInt 50 11.961 ± 0.03211.699 ± 0.049 ns/op 1.02x (p = 0.000*) StringConcat.concatEmptyConstInt 507.778 ± 0.038 7.723 ± 0.037 ns/op 1.01x (p = 0.000*) StringConcat.concatEmptyConstString 503.506 ± 0.026 3.505 ± 0.015 ns/op 1.00x (p = 0.930 ) StringConcat.concatEmptyLeft 503.573 ± 0.075 3.518 ± 0.057 ns/op 1.02x (p = 0.044 ) StringConcat.concatEmptyRight503.713 ± 0.049 3.622 ± 0.053 ns/op 1.02x (p = 0.000*) StringConcat.concatMethodConstString 507.418 ± 0.030 7.478 ± 0.066 ns/op 0.99x (p = 0.005*) StringConcat.concatMix4String50 89.243 ± 0.43671.866 ± 0.894 ns/op 1.24x (p = 0.000*) StringConcatStartup.MixedLarge.run 10 655.436 ± 29.787 649.730 ± 26.053 ms/op 1.01x (p = 0.500 ) StringConcatStartup.MixedSmall.run 20 51.676 ± 2.32450.724 ± 5.050 ms/op 1.02x (p = 0.512 ) StringConcatStartup.StringLarge.run 10 166.022 ± 15.672 165.300 ± 14.433 ms/op 1.00x (p = 0.873 ) StringConcatStartup.StringSingle.run 400.168 ± 0.016 0.178 ± 0.024 ms/op 0.94x (p = 0.234 ) * = significant Startup-wise it's more or less neutral as evidenced by the added `StringConcatStartup` micro (a simplified and JMH:ified version of some startup stress tests I've been tinkering with over the years). This PR does not change the total number of classes generated and loaded by this test (3359 total, 2499 generated). - Commit messages: - Add StringConcatStartup JMH - Merge branch 'master' into consolidated_prependers - Simplify - Remove getBytes changes - Streamline newArray - Consolidate to always end with newArrayWithSuffix - Merge branch 'master' into consolidated_prependers - Streamline prependers - Fix compilation errors from removing non-prefixed prepend methods - Remove non-prefix prepender methods, inline logic into prefixed variant - ... and 1 more: https://git.openjdk.org/jdk/compare/efb905e5...6525e945 Changes: https://git.openjdk.org/jdk/pull/19927/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19927=00 Issue: https://bugs.openjdk.org/browse/JDK-8335182 Stats: 565 lines in 6 files changed: 412 ins; 114 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/19927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927 PR: https://git.openjdk.org/jdk/pull/19927
Re: RFR: 8335252: Use ImmutableBitSetPredicate optimize j.u.Formatter.Conversion#isValid
On Thu, 27 Jun 2024 11:48:56 GMT, Shaojin Wen wrote: > * byte codes of `Conversion.isValid` It's surprising that javac generates synthetic cases identical to the default case for every char in the 37 to 120 range (the bounds seem to be the lowest and highest of the constants). Is there a benchmark where you see a benefit from this? - PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2194540522
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Thu, 20 Jun 2024 14:32:25 GMT, Shaojin Wen wrote: >> @cl4es @wenshao I think we should probably work on `putChar`, or at least >> figure out what blocks `MergeStore` for `putChar`. Can someone produce a >> simple stand-alone `.java` file that uses `putChar`, so that I can >> investigate why `MergeStore` does not work for it? >> >> Otherwise, I agree with @cl4es : the code here may be ok for now, but we >> would have to revert it again later, should `MergeStore` eventually do the >> trick. > > @eme64 > > simple stand-alone java > > import jdk.internal.misc.Unsafe; > > public class PutCharTest { > static final Unsafe UNSAFE = Unsafe.getUnsafe(); > > static void putCharsAt(byte[] val, int index, int c1, int c2, int c3, int > c4) { > putChar(val, index, (char)(c1)); > putChar(val, index + 1, (char)(c2)); > putChar(val, index + 2, (char)(c3)); > putChar(val, index + 3, (char)(c4)); > } > > static void putChar(byte[] bytes, int index, int c) { > UNSAFE.putChar(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1), > (char)(c)); > } > > static void putChar0(byte[] bytes, int index, int c) { > UNSAFE.putByte(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1), > (byte)(c)); > UNSAFE.putByte(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1) + > 1, (byte)(c << 8)); > } > > static void putNull(byte[] bytes, int index) { > putCharsAt(bytes, index, 'n', 'u', 'l', 'l'); > } > > public static void main(String[] args) { > for (int i = 0; i < 5; i++) { > testNull(); > } > > System.out.println("done"); > } > > private static void testNull() { > byte[] bytes = new byte[8192]; > > for (int i = 0; i < 1000; i++) { > int index = 0; > for (int j = 0; j < 1024; j++) { > putNull(bytes, index); > index += 4; > } > } > } > } As long as @wenshao is OK with that I'm happy to let this wait. I'll have limited availability myself in the coming weeks. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2192459124
Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array
On Wed, 26 Jun 2024 13:02:51 GMT, Jan Lahoda wrote: >> test/micro/org/openjdk/bench/java/lang/runtime/SwitchEnum.java line 57: >> >>> 55: for (E e : inputs) { >>> 56: sum += switch (e) { >>> 57: case null -> -1; >> >> As this `null` case adds a case relative to the `-Traditional` test then >> maybe removing one of the `E0, E1, ...` cases would make the test a little >> bit more apples-to-apples? > > Using `case null -> ` will push javac to use the new code, but all switches > do some kind of null check for the selector value. The difference is that if > there's no `case null`, there will be `Objects.requireNonNull` generated for > the selector value (which will throw an NPE if the value is `null`), while > here it will jump to the given case. > > So, `case null` does not have the same weight as a normal case, so I don't > think it would be fair to remove a full case to compensate for it. Fair enough, and I guess ~1.6ns/op is reasonable overhead for the added semantics. - PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1655364063
Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v20]
On Wed, 26 Jun 2024 14:55:44 GMT, Shaojin Wen wrote: >> The current versions of FloatToDecimal and DoubleToDecimal allocate >> additional objects. Reducing these allocations can improve the performance >> of Float/Double.toString and AbstractStringBuilder's append(float/double). >> >> This patch is just a code refactoring to reduce object allocation, but does >> not change the Float/Double to decimal algorithm. >> >> The following code comments the allocated objects to be removed. >> >> >> class FloatToDecimal { >> public static String toString(float v) { >> // allocate object FloatToDecimal >> return new FloatToDecimal().toDecimalString(v); >> } >> >> public static Appendable appendTo(float v, Appendable app) >> throws IOException { >> // allocate object FloatToDecimal >> return new FloatToDecimal().appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(float v, Appendable app) >> throws IOException { >> switch (toDecimal(v)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> if (app instanceof StringBuilder builder) { >> return builder.append(chars); >> } >> if (app instanceof StringBuffer buffer) { >> return buffer.append(chars); >> } >> for (char c : chars) { >> app.append(c); >> } >> return app; >> // ... >> } >> } >> } >> >> class DoubleToDecimal { >> public static String toString(double v) { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).toDecimalString(v); >> } >> >> public static Appendable appendTo(double v, Appendable app) >> throws IOException { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(double v, Appendable app) >> throws IOException { >> switch (toDecimal(v, null)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> ... > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > 1. revert code change. > 2. comment remove space LGTM2 - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19730#pullrequestreview-2142672406
Re: RFR: 8334437: De-duplicate ProxyMethod list creation
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad wrote: > Simple refactoring to extract identical, simple lambda expressions. Improve > code clarity and reduce classes generated. Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/19760#issuecomment-2191894639
Integrated: 8334437: De-duplicate ProxyMethod list creation
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad wrote: > Simple refactoring to extract identical, simple lambda expressions. Improve > code clarity and reduce classes generated. This pull request has now been integrated. Changeset: 5883a20b Author: Claes Redestad URL: https://git.openjdk.org/jdk/commit/5883a20b822bb8acb719076e4f7abee8403061cb Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod 8334437: De-duplicate ProxyMethod list creation Reviewed-by: asotona, liach - PR: https://git.openjdk.org/jdk/pull/19760
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Tue, 25 Jun 2024 17:19:28 GMT, Emanuel Peter wrote: > I filed: > > [JDK-8335113](https://bugs.openjdk.org/browse/JDK-8335113): C2 MergeStores: > allow merging of putChar and other larger stores on smaller array elements Great! Should we hold off on this optimization and see if we can avoid the need for `Unsafe` here - or go ahead integrating this PR and leave it to JDK-8335113 to revert any no-longer-needed changes made here? - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2191878373
Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v18]
On Wed, 26 Jun 2024 13:23:14 GMT, Shaojin Wen wrote: >> src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 47: >> >>> 45: static final int NAN = 5 << 8; >>> 46: >>> 47: static final byte LATIN1 = 0; >> >> I think this somewhat unnecessarily copies names and internal implementation >> details from `String` (where the `coder` value is used both as a >> pseudo-boolean and as a shift operand, which is not applicable here). In >> this context we could use something like `private final boolean latin1;` >> (all uses of `coder` are just `if (coder == LATIN1)` so it would be >> simplified to `if (latin1)`) > > I have written a version using boolean locally, but because this class is > mainly used by String and StringBuilder, it uses the same style as > String/StringBuilder. Yeah, I'm just musing about the perils of leaking/replicating those implementation details, but as you say this is internal code only used by `String` and `StringBuilder` to it's loosely part of that complex. Not requesting any changes here for now.. - PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1654924418
Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v18]
On Tue, 25 Jun 2024 17:29:04 GMT, Shaojin Wen wrote: >> The current versions of FloatToDecimal and DoubleToDecimal allocate >> additional objects. Reducing these allocations can improve the performance >> of Float/Double.toString and AbstractStringBuilder's append(float/double). >> >> This patch is just a code refactoring to reduce object allocation, but does >> not change the Float/Double to decimal algorithm. >> >> The following code comments the allocated objects to be removed. >> >> >> class FloatToDecimal { >> public static String toString(float v) { >> // allocate object FloatToDecimal >> return new FloatToDecimal().toDecimalString(v); >> } >> >> public static Appendable appendTo(float v, Appendable app) >> throws IOException { >> // allocate object FloatToDecimal >> return new FloatToDecimal().appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(float v, Appendable app) >> throws IOException { >> switch (toDecimal(v)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> if (app instanceof StringBuilder builder) { >> return builder.append(chars); >> } >> if (app instanceof StringBuffer buffer) { >> return buffer.append(chars); >> } >> for (char c : chars) { >> app.append(c); >> } >> return app; >> // ... >> } >> } >> } >> >> class DoubleToDecimal { >> public static String toString(double v) { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).toDecimalString(v); >> } >> >> public static Appendable appendTo(double v, Appendable app) >> throws IOException { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(double v, Appendable app) >> throws IOException { >> switch (toDecimal(v, null)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> ... > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > from @liach: use s.getBytes for performance src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 47: > 45: static final int NAN = 5 << 8; > 46: > 47: static final byte LATIN1 = 0; I think this somewhat unnecessarily copies names and internal implementation details from `String` (where the `coder` value is used both as a pseudo-boolean and as a shift operand, which is not applicable here). In this context we could use something like `private final boolean latin1;` (all uses of `coder` are just `if (coder == LATIN1)` so it would be simplified to `if (latin1)`) - PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1654812101
Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array
On Wed, 26 Jun 2024 12:32:20 GMT, Jan Lahoda wrote: > For general pattern matching switches, the `SwitchBootstraps` class currently > generates a cascade of `if`-like statements, computing the correct target > case index for the given input. > > There is one special case which permits a relatively easy faster handling, > and that is when all the case labels case enum constants (but the switch is > still a "pattern matching" switch, as tranditional enum switches do not go > through `SwitchBootstraps`). Like: > > > enum E {A, B, C} > E e = ...; > switch (e) { > case null -> {} > case A a -> {} > case C c -> {} > case B b -> {} > } > > > We can create an array mapping the runtime ordinal to the appropriate case > number, which is somewhat similar to what javac is doing for ordinary > switches over enums. > > The `SwitchBootstraps` class was trying to do so, when the restart index is > zero, but failed to do so properly, so that code is not used (and does not > actually work properly). > > This patch is trying to fix that - when all the case labels are enum > constants, an array mapping the runtime enum ordinals to the case number will > be created (lazily), for restart index == 0. And this map will then be used > to quickly produce results for the given input. E.g. for the case above, the > mapping will be `{0 -> 0, 1 -> 2, 2 -> 1}` (meaning `{A -> 0, B -> 2, C -> > 1}`). > > When the restart index is != 0 (i.e. when there's a guard in the switch, and > the guard returned `false`), the if cascade will be generated lazily and > used, as in the general case. If it would turn out there are significant > enum-only switches with guards/restart index != 0, we could improve there as > well, by generating separate mappings for every (used) restart index. > > I believe the current tests cover the code functionally sufficiently - see > `SwitchBootstrapsTest.testEnums`. It is only that the tests do not (and > regression tests cannot easily, I think) differentiate whether the > special-case or generic implementation is used. > > I've added a new microbenchmark attempting to demonstrate the difference. > There are two benchmarks, both having only enum constants as case labels. > One, `enumSwitchTraditional` is an "old" switch, desugared fully by javac, > the other, `enumSwitchWithBootstrap` is an equivalent switch that uses the > `SwitchBootstraps`. Before this patch, I was getting values like: > > Benchmark Mode Cnt Score Error Units > SwitchEnum.enumSwitchTraditionalavgt 15 11.719 ± 0.333 ns/op > SwitchEnum.enumSwitchWithBootstrap avgt 15 24.668 ± 1.037 ... Marked as reviewed by redestad (Reviewer). src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 280: > 278: > 279: Class enumClass = invocationType.parameterType(0); > 280: labels = Stream.of(labels).map(l -> convertEnumConstants(lookup, > enumClass, l)).toArray(); You could create `EnumDesc[] enumDescLabels` here and remove the `Arrays.copyOf` on line 290. The `labels.clone()` on line 277 also seem redundant since we only iterate over the `labels` argument once. While this case is likely fine I generally recommend using a minimal amount of streams/lambdas in bootstrap sensitive code like these dynamic bootstraps methods. - PR Review: https://git.openjdk.org/jdk/pull/19906#pullrequestreview-2141731748 PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1654771207
Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array
On Wed, 26 Jun 2024 12:32:20 GMT, Jan Lahoda wrote: > For general pattern matching switches, the `SwitchBootstraps` class currently > generates a cascade of `if`-like statements, computing the correct target > case index for the given input. > > There is one special case which permits a relatively easy faster handling, > and that is when all the case labels case enum constants (but the switch is > still a "pattern matching" switch, as tranditional enum switches do not go > through `SwitchBootstraps`). Like: > > > enum E {A, B, C} > E e = ...; > switch (e) { > case null -> {} > case A a -> {} > case C c -> {} > case B b -> {} > } > > > We can create an array mapping the runtime ordinal to the appropriate case > number, which is somewhat similar to what javac is doing for ordinary > switches over enums. > > The `SwitchBootstraps` class was trying to do so, when the restart index is > zero, but failed to do so properly, so that code is not used (and does not > actually work properly). > > This patch is trying to fix that - when all the case labels are enum > constants, an array mapping the runtime enum ordinals to the case number will > be created (lazily), for restart index == 0. And this map will then be used > to quickly produce results for the given input. E.g. for the case above, the > mapping will be `{0 -> 0, 1 -> 2, 2 -> 1}` (meaning `{A -> 0, B -> 2, C -> > 1}`). > > When the restart index is != 0 (i.e. when there's a guard in the switch, and > the guard returned `false`), the if cascade will be generated lazily and > used, as in the general case. If it would turn out there are significant > enum-only switches with guards/restart index != 0, we could improve there as > well, by generating separate mappings for every (used) restart index. > > I believe the current tests cover the code functionally sufficiently - see > `SwitchBootstrapsTest.testEnums`. It is only that the tests do not (and > regression tests cannot easily, I think) differentiate whether the > special-case or generic implementation is used. > > I've added a new microbenchmark attempting to demonstrate the difference. > There are two benchmarks, both having only enum constants as case labels. > One, `enumSwitchTraditional` is an "old" switch, desugared fully by javac, > the other, `enumSwitchWithBootstrap` is an equivalent switch that uses the > `SwitchBootstraps`. Before this patch, I was getting values like: > > Benchmark Mode Cnt Score Error Units > SwitchEnum.enumSwitchTraditionalavgt 15 11.719 ± 0.333 ns/op > SwitchEnum.enumSwitchWithBootstrap avgt 15 24.668 ± 1.037 ... test/micro/org/openjdk/bench/java/lang/runtime/SwitchEnum.java line 57: > 55: for (E e : inputs) { > 56: sum += switch (e) { > 57: case null -> -1; As this `null` case adds a case relative to the `-Traditional` test then maybe removing one of the `E0, E1, ...` cases would make the test a little bit more apples-to-apples? - PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1654753822
Re: RFR: 8334437: De-duplicate ProxyMethod list creation
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad wrote: > Simple refactoring to extract identical, simple lambda expressions. Improve > code clarity and reduce classes generated. Friendly reminder that this trivial improvement needs a reviewer. - PR Comment: https://git.openjdk.org/jdk/pull/19760#issuecomment-2191352440
Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v3]
On Tue, 18 Jun 2024 10:00:46 GMT, Claes Redestad wrote: >> Extracting duplicate method references to static field reduce proxy class >> spinning and loading. In this case 2 less classes loaded when using >> `findAny()` on each type of stream. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Fixes Friendly reminder that this improvement needs a Reviewer. While it'd be great if javac can make this redundant this is adding overhead in the meantime, and the temporary workaround doesn't obfuscate the code. - PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2191356136
Re: RFR: 8334872: BigEndian: java/lang/invoke/condy Tests failing since JDK-8294960
On Mon, 24 Jun 2024 16:01:41 GMT, Adam Sotona wrote: > After JDK-8294960 is java.lang.invoke.ClassSpecializer using lamdas for code > generation and unfortunately it causes StackOverflow on BigEndian platforms. > > This patch converts all lambdas in ClassSpecializer into anonymous inner > classes. > > Please review and test on a BigEndian platform. > > Thanks, > Adam Looks like there is a bootstrap cycle initializing this lambda in ClassBuilder: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/classfile/ClassBuilder.java#L202 I'm not sure, but it looks like the PPC VM might be passing args to the bootstrap method differently, which triggers adapting code in BootstrapMethodInvoker::pushMePullYou - this might be enough to start this cycle. - PR Comment: https://git.openjdk.org/jdk/pull/19863#issuecomment-2188298757
Re: RFR: 8334872: BigEndian: java/lang/invoke/condy Tests failing since JDK-8294960
On Mon, 24 Jun 2024 16:01:41 GMT, Adam Sotona wrote: > After JDK-8294960 is java.lang.invoke.ClassSpecializer using lamdas for code > generation and unfortunately it causes StackOverflow on BigEndian platforms. > > This patch converts all lambdas in ClassSpecializer into anonymous inner > classes. > > Please review and test on a BigEndian platform. > > Thanks, > Adam Marked as reviewed by redestad (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19863#pullrequestreview-2136378062
Re: RFR: 8291966: SwitchBootstrap.typeSwitch could be faster [v3]
On Wed, 3 Apr 2024 16:17:57 GMT, ExE Boss wrote: >> Jan Lahoda has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains six commits: >> >> - Reflecting review feedback. >> - Merge branch 'master' into JDK-8291966 >> - Adding comments >> - Improving performance >> - Merge branch 'master' into JDK-8291966 >> - 8291966: SwitchBootstrap.typeSwitch could be faster > > src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 389: > >> 387: } >> 388: } >> 389: return enumMap.map[value.ordinal()]; > > `enumMap.map` never gets set before this line. There's a bug filed for this already: https://bugs.openjdk.org/browse/JDK-8332522 @lahodaj explained that this broken code is part of an optimization which is never attempted (IIRC due to the bug you noted on line 327). JDK-833522 seem like a good place to continue this conversation..? - PR Review Comment: https://git.openjdk.org/jdk/pull/9779#discussion_r1649761928
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Thu, 13 Jun 2024 07:36:34 GMT, Emanuel Peter wrote: >> Thanks to @eme64 's patience and help, I found a way to use MergeStore >> without doing boundary checking. >> >> It would be even better if Unsafe.putChar could be used for MergeStore in >> the future. >> >> ## 1. JavaCode >> >> class StringUTF16 { >> public static void putCharsAt(byte[] value, int i, char c1, char c2, >> char c3, char c4) { >> // Don't use the putChar method, Its instrinsic will cause C2 unable >> to combining values into larger stores. >> putChar1(value, i, c1); >> putChar1(value, i + 1, c2); >> putChar1(value, i + 2, c3); >> putChar1(value, i + 3, c4); >> } >> >> public static void putCharsAt(byte[] value, int i, char c1, char c2, >> char c3, char c4, char c5) { >> // Don't use the putChar method, Its instrinsic will cause C2 unable >> to combining values into larger stores. >> putChar1(value, i, c1); >> putChar1(value, i + 1, c2); >> putChar1(value, i + 2, c3); >> putChar1(value, i + 3, c4); >> putChar1(value, i + 4, c5); >> } >> >> static void putChar1(byte[] value, int i, char c) { >> int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1); >> Unsafe.getUnsafe().putByte(value, address, (byte)(c >> >> HI_BYTE_SHIFT)); >> Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> >> LO_BYTE_SHIFT)); >> } >> } >> >> >> ## 2. Benchmark Numbers >> The performance numbers under MacBookPro M1 Max are as follows: >> >> -Benchmark Mode Cnt Score Error Units >> -StringBuilders.toStringCharWithNull8Latin1 avgt 15 7.073 ? 0.010 ns/op >> -StringBuilders.toStringCharWithNull8Utf16 avgt 15 9.298 ? 0.015 ns/op >> -StringBuilders.toStringCharWithBool8Latin1 avgt 15 7.387 ? 0.021 ns/op >> -StringBuilders.toStringCharWithBool8Utf16 avgt 15 9.615 ? 0.011 ns/op >> >> +Benchmark Mode Cnt Score Error Units >> +StringBuilders.toStringCharWithNull8Latin1 avgt 15 5.628 ? 0.017 ns/op >> +25.67% >> +StringBuilders.toStringCharWithNull8Utf16 avgt 15 6.873 ? 0.026 ns/op >> +35.28% >> +StringBuilders.toStringCharWithBool8Latin1 avgt 15 6.480 ? 0.077 ns/op >> +13.99% >> +StringBuilders.toStringCharWithBool8Utf16 avgt 15 7.456 ? 0.059 ns/op >> +28.95% >> >> >> ## 3. TraceMergeStores >> >> [TraceMergeStores]: Replace >> 65 StoreB === 54 7 64 12 [[ 87 ]] @byte[int:>=0] >> (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe >> Memory: @byte[int:>=0] (java/la... > > @wenshao I'm glad we figured it out a bit, and I hope you learned a thing or > two :) > >> It would be even better if Unsafe.putChar could be used for MergeStore in >> the future. > > The `_putCharStringU` intrinsic uses > `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I > speculate it could be `StoreC`. We'd have to do more digging to see why that > pattern is not accepted by `MergeStore`. > > @wenshao I'm not sure if everything is right with the bounds checking, but I > leave this to the library folks to review. What you will definately need is > benchmarking not just on `M1`, but also `x86-64` and other `aarch64` > architectures. I'm not opposed to accepting this patch as-is, but I think we should do so with an eye towards reverting if we figure out a way to improve the `putChar` intrinsic so that it doesn't block merge store optimization. What do you think @eme64? As the need for the changes in [b5ad8e7](https://github.com/openjdk/jdk/pull/19626/commits/b5ad8e70928c547d134d0e4a532441cad9a7e4a2) showed added code complexity (like here) can be detrimental to performance, and if the `putChar` can be improved we might see benefits in more places. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2180308686
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v12]
On Mon, 17 Jun 2024 05:53:45 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > Utf16 case remove `append first utf16 char` src/java.base/share/classes/java/lang/StringLatin1.java line 832: > 830: // Don't use the putChar method, Its instrinsic will cause C2 > unable to combining values into larger stores. > 831: long address = Unsafe.ARRAY_BYTE_BASE_OFFSET + index; > 832: Unsafe UNSAFE = Unsafe.getUnsafe(); Perhaps better to put in a `private static final` field - PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1647313459
Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v25]
On Wed, 19 Jun 2024 09:08:35 GMT, Adam Sotona wrote: >> java.base java.lang.invoke package heavily uses ASM to generate lambdas and >> method handles. >> >> This patch converts ASM calls to Classfile API. >> >> This PR is continuation of https://github.com/openjdk/jdk/pull/12945 >> >> Any comments and suggestions are welcome. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with two additional > commits since the last revision: > > - removed empty line > - problem-listed runtime/ClassInitErrors/TestStackOverflowDuringInit.java No objections as long as there's an understanding that we'll need to work on some of the startup/warmup regressions this will cause. We have preliminary data on a subset of benchmarks which suggests it's a combination of loading more classes and spreading the work across a larger set of methods, which means we need to warm and JIT more methods to reach peak performance. ASM is a smaller library with a more minimalist approach, so it loads faster and gets up and running a bit faster. We'll probably need to file regressions as they are detected for tracking purposes, but I expect we'll continue chipping away at and improving the classfile API in the months to come. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17108#pullrequestreview-2127764187
Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v5]
On Thu, 13 Jun 2024 14:11:48 GMT, Chen Liang wrote: >> Please review this patch that fixes a critical issue that breaks some Proxy >> usages. >> >> Since the problematic patch from before cannot be backed out, this patch >> aims to emulate the old behavior before. A diff between before the >> problematic patch and this patch is available at >> https://gist.github.com/7565b2091008f561eb0ada019bc5e517, generated by >> running `git diff 326dbb1b139dd1ec1b8605339b91697cdf49da9a -- >> src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java`. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > methodField can be initialized eagerly Marked as reviewed by redestad (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19615#pullrequestreview-2125576655
Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v21]
On Tue, 18 Jun 2024 13:22:45 GMT, Adam Sotona wrote: >> java.base java.lang.invoke package heavily uses ASM to generate lambdas and >> method handles. >> >> This patch converts ASM calls to Classfile API. >> >> This PR is continuation of https://github.com/openjdk/jdk/pull/12945 >> >> Any comments and suggestions are welcome. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with four additional > commits since the last revision: > > - Merge pull request #8 from cl4es/serialization_hostile > >SerializationHostileMethod > - Reduce gratuitous code movement > - Inline Consumer into generateSer.. method, move seldom-used > serialization support constants to new holder > - SerializationHostileMethod I've done a final pass over this. Code changes overall look good - great even - and the only caveat is that there's going to be some added startup overhead in some apps from integrating this. A minimal app with a lambda expression might see a 2-3ms hit, for example. While there are ideas on how to trim down such overheads further I think this is a good time to draw a line in the sand and get this change integrated and exposed to a larger set of testing. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17108#pullrequestreview-2125569570
Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v21]
On Tue, 18 Jun 2024 13:22:45 GMT, Adam Sotona wrote: >> java.base java.lang.invoke package heavily uses ASM to generate lambdas and >> method handles. >> >> This patch converts ASM calls to Classfile API. >> >> This PR is continuation of https://github.com/openjdk/jdk/pull/12945 >> >> Any comments and suggestions are welcome. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with four additional > commits since the last revision: > > - Merge pull request #8 from cl4es/serialization_hostile > >SerializationHostileMethod > - Reduce gratuitous code movement > - Inline Consumer into generateSer.. method, move seldom-used > serialization support constants to new holder > - SerializationHostileMethod src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java line 108: > 106: > 107: // condy to load implMethod from class data > 108: implMethodCondy = DynamicConstantDesc.ofNamed(BSM_CLASS_DATA, > DEFAULT_NAME, CD_MethodHandle); Pre-existing tiny wart, but this one seem to be used only exceptionally (see the comment/code around line 183) so it's probably better to inline the code at the usage site rather than have a constant. - PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1644473614
Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v18]
On Mon, 17 Jun 2024 11:21:37 GMT, Adam Sotona wrote: >> LMF will BMH and `ClassSpecializer`, but does not call into this code >> normally as the simple bound method handle needed by LMF is statically >> defined (`BoundMethodHandle.Species_L`). To be perfectly safe - and to keep >> lambda/indy/condy bootstrap overheads to a minumum - I'll continue >> recommending the desugaring of lambdas in java.lang.invoke > > It results in not very nice code, but makes sense. I'll note that replacing a lambda with an anonymous class has a very marginal effect on startup when amortizing the cost of setting up the infrastructure for spinning classes. What can have a decent effect is if we can replace a few lambdas/anon classes with more imperative code that spins up/loads fewer classes, e.g., desugaring streams to for loops. - PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1644202229
Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v3]
> Extracting duplicate method references to static field reduce proxy class > spinning and loading. In this case 2 less classes loaded when using > `findAny()` on each type of stream. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Fixes - Changes: - all: https://git.openjdk.org/jdk/pull/19477/files - new: https://git.openjdk.org/jdk/pull/19477/files/a223e608..d83a1c96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19477=02 - incr: https://webrevs.openjdk.org/?repo=jdk=19477=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477 PR: https://git.openjdk.org/jdk/pull/19477
Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v2]
> Extracting duplicate method references to static field reduce proxy class > spinning and loading. In this case 2 less classes loaded when using > `findAny()` on each type of stream. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: @ExE-Boss review - Changes: - all: https://git.openjdk.org/jdk/pull/19477/files - new: https://git.openjdk.org/jdk/pull/19477/files/cff6d034..a223e608 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19477=01 - incr: https://webrevs.openjdk.org/?repo=jdk=19477=00-01 Stats: 40 lines in 1 file changed: 11 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/19477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477 PR: https://git.openjdk.org/jdk/pull/19477
RFR: 8334437: De-duplicate ProxyMethod list creation
Simple refactoring to extract identical, simple lambda expressions. Improve code clarity and reduce classes generated. - Commit messages: - Simplify - Refactor ProxyMethod list creation Changes: https://git.openjdk.org/jdk/pull/19760/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19760=00 Issue: https://bugs.openjdk.org/browse/JDK-8334437 Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19760/head:pull/19760 PR: https://git.openjdk.org/jdk/pull/19760
Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v8]
On Sat, 15 Jun 2024 13:27:41 GMT, Shaojin Wen wrote: >> The current versions of FloatToDecimal and DoubleToDecimal allocate >> additional objects. Reducing these allocations can improve the performance >> of Float/Double.toString and AbstractStringBuilder's append(float/double). >> >> This patch is just a code refactoring to reduce object allocation, but does >> not change the Float/Double to decimal algorithm. >> >> The following code comments the allocated objects to be removed. >> >> >> class FloatToDecimal { >> public static String toString(float v) { >> // allocate object FloatToDecimal >> return new FloatToDecimal().toDecimalString(v); >> } >> >> public static Appendable appendTo(float v, Appendable app) >> throws IOException { >> // allocate object FloatToDecimal >> return new FloatToDecimal().appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(float v, Appendable app) >> throws IOException { >> switch (toDecimal(v)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> if (app instanceof StringBuilder builder) { >> return builder.append(chars); >> } >> if (app instanceof StringBuffer buffer) { >> return buffer.append(chars); >> } >> for (char c : chars) { >> app.append(c); >> } >> return app; >> // ... >> } >> } >> } >> >> class DoubleToDecimal { >> public static String toString(double v) { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).toDecimalString(v); >> } >> >> public static Appendable appendTo(double v, Appendable app) >> throws IOException { >> // allocate object DoubleToDecimal >> return new DoubleToDecimal(false).appendDecimalTo(v, app); >> } >> >> private Appendable appendDecimalTo(double v, Appendable app) >> throws IOException { >> switch (toDecimal(v, null)) { >> case NON_SPECIAL: >> // allocate object char[] >> char[] chars = new char[index + 1]; >> for (int i = 0; i < chars.length; ++i) { >> chars[i] = (char) bytes[i]; >> } >> ... > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > code format Just suggesting some improvements src/java.base/share/classes/jdk/internal/math/DoubleToDecimal.java line 185: > 183: * Returns > 184: * Combine type and size, the first byte is size, the second > byte is type > 185: * Suggestion: * Returns size in the lower byte, type in the high byte, where type is src/java.base/share/classes/jdk/internal/math/DoubleToDecimal.java line 190: > 188: * PLUS_INFiff v is POSITIVE_INFINITY > 189: * MINUS_INF iff v is NEGATIVE_INFINITY > 190: * NAN iff v is NaN Suggestion: * NAN iff v is NaN * otherwise NON_SPECIAL src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 169: > 167: } > 168: > 169: /* Using the deprecated constructor enhances performance */ Enhances performance over what, `new String(str, 0, index, StandardCharsets.US_ASCII)`? Please specify. src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 171: > 169: /* Using the deprecated constructor enhances performance */ > 170: @SuppressWarnings("deprecation") > 171: static String charsToString(byte[] str, int index) { This should be named something like `asciiBytesToString` now. Perhaps `ToDecimal.LATIN1` can be renamed `ASCII`, too. test/micro/org/openjdk/bench/java/lang/StringBuilders.java line 244: > 242: > 243: @Benchmark > 244: public int appendWithFloat8Latin1() { It would be interesting with a mixed micro to explore effects of increased number of implementors (looping over an array with `sbLatin1` and `sbUtf16`, and appending both floats and doubles in the same loop) `buf.delete(0, buf.length())` could be replaced with `buf.setLength(0)` in these. Might reduce a bit of overhead and focus micros . - Changes requested by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19730#pullrequestreview-2121584681 PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641984296 PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641984363 PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641979604 PR Review Comment:
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v8]
On Thu, 13 Jun 2024 02:06:42 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with two additional > commits since the last revision: > > - rename benchmark > - code format & use long address Replacing `StringUTF16.putChar` with an `Unsafe.putByte` construct seems safe since the former already depends on bounds checks having been done elsewhere. Replacing `byte[]` stores with `Unsafe.putByte` requires a thorough examination as that would drop implicit bounds checks. On my M1 laptop I get similar numbers: Name Cnt Base Error Test Error Unit Change StringBuilders.appendWithBool8Latin1 15 7,500 ± 0,072 6,112 ± 0,049 ns/op 1,23x (p = 0,000*) StringBuilders.appendWithBool8Utf16 15 9,650 ± 0,021 7,304 ± 0,047 ns/op 1,32x (p = 0,000*) StringBuilders.appendWithNull8Latin1 15 7,118 ± 0,033 5,158 ± 0,062 ns/op 1,38x (p = 0,000*) StringBuilders.appendWithNull8Utf16 15 9,336 ± 0,083 6,870 ± 0,069 ns/op 1,36x (p = 0,000*) * = significant On a linux-x64 workstation some of the micros regress, though: Name Cnt BaseErrorTest Error Unit Change StringBuilders.appendWithBool8Latin1 15 10.781 ± 0.236 16.279 ± 1.087 ns/op 0.66x (p = 0.000*) StringBuilders.appendWithBool8Utf16 15 22.942 ± 0.405 23.733 ± 0.792 ns/op 0.97x (p = 0.001*) StringBuilders.appendWithNull8Latin1 15 24.313 ± 10.479 24.397 ± 7.483 ns/op 1.00x (p = 0.979 ) StringBuilders.appendWithNull8Utf16 15 38.704 ± 5.972 27.542 ± 5.620 ns/op 1.41x (p = 0.000*) * = significant `-XX:+TraceMergeStores` seem to indicate the merging happens as it should in each case: [TraceMergeStores]: Replace 2493 StoreB === 1387 5370 2980 2939 [[ 1962 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; !jvms: StringLatin1::putCharsAt @ bci:50 (line 834) AbstractStringBuilder::appendNull @ bci:34 (line 642) AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 (line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub @ bci:17 (line 190) 1962 StoreB === 1387 2493 2494 2442 [[ 1388 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; !jvms: StringLatin1::putCharsAt @ bci:62 (line 835) AbstractStringBuilder::appendNull @ bci:34 (line 642) AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 (line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub @ bci:17 (line 190) 1388 StoreB === 1387 1962 1963 1341 [[ 833 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; !jvms: StringLatin1::putCharsAt @ bci:77 (line 836) AbstractStringBuilder::appendNull @ bci:34 (line 642) AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 (line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub @ bci:17 (line 190) 833 StoreB === 1387 1388 1389 1341 [[ 5311 425 1400 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; !jvms: StringLatin1::putCharsAt @ bci:92 (line 837) AbstractStringBuilder::appendNull @ bci:34 (line 642) AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 (line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub @ bci:17 (line 190) [TraceMergeStores]: with 6207 ConI === 0 [[ 6208 6210 ]] #int:1819047278 6210 StoreI === 1387 5370 2980 6207 [[ ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; mismatched Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2165167049
Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v18]
On Thu, 6 Jun 2024 11:41:05 GMT, Chen Liang wrote: >> Adam Sotona has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - reverted static initialization of ConstantPoolBuilder and CP entries >> - fixed naming conventions > > src/java.base/share/classes/java/lang/invoke/ClassSpecializer.java line 616: > >> 614: final ClassDesc classDesc = ClassDesc.of(className0); >> 615: final ClassDesc superClassDesc = >> classDesc(speciesData.deriveSuperClass()); >> 616: return ClassFile.of().build(classDesc, clb -> { > > We need a confirmation here that these BMH species are only initialized after > LMF is ready. @cl4es Is a dependency on LMF from BMH safe? LMF will BMH and `ClassSpecializer`, but does not call into this code normally as the simple bound method handle needed by LMF is statically defined (`BoundMethodHandle.Species_L`). To be perfectly safe - and to keep lambda/indy/condy bootstrap overheads to a minumum - I'll continue recommending the desugaring of lambdas in java.lang.invoke - PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1637893254
Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v4]
On Wed, 12 Jun 2024 22:00:45 GMT, Chen Liang wrote: >> Please review this patch that fixes a critical issue that breaks some Proxy >> usages. >> >> Since the problematic patch from before cannot be backed out, this patch >> aims to emulate the old behavior before. A diff between before the >> problematic patch and this patch is available at >> https://gist.github.com/7565b2091008f561eb0ada019bc5e517, generated by >> running `git diff 326dbb1b139dd1ec1b8605339b91697cdf49da9a -- >> src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java`. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > More rename and code style cleanup I think this looks better as you're reverting the problematic changes while retaining most of the benefits of JDK-8332457. src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java line 676: > 674: * @param method The method for which to create a proxy > 675: */ > 676: private ProxyMethod(Method method, String methodFieldName) { Could we pass the `FieldRefEntry` directly? I don't see when we'd create a `ProxyMethod` without later calling `methodField()`. I think this would also remove the need to make `ProxyMethod` non-static. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19615#pullrequestreview-2115128124 PR Review Comment: https://git.openjdk.org/jdk/pull/19615#discussion_r1637805452
Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang wrote: >> Please review this patch that fixes a critical issue that breaks some Proxy >> usages. >> >> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible >> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can >> appear in method descriptors in the same class. The proposed way to bypass >> is to store the MethodType as its descriptor string in the bootstrap method >> arguments, and use MethodType.fromMethodDescriptorString to restore the >> arguments, which does not have this restriction and does not eagerly load >> the parameter classes. This case isn't covered by existing tests, so a new >> test has been added. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > Obtain classloader in security manager friendly code path Agree with Roger that static imports should be used sparingly (possibly never with .*). If other reviewers are happy with the semantic shift in when/where class loading can/will happen and which class loader will be used then I won't object. I'm just not enough of an expert in this area to tell if there might be unintended side-effects we need to account for. The added cost of spinning and calling these clinits means we might be better off reverting from a startup overhead perspective. JDK-8332457 was already a regression for some usage patterns. - PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2163123616
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Wed, 12 Jun 2024 13:38:21 GMT, Shaojin Wen wrote: > In the AbstractStringBuilder#appendNull method, is it possible to not check > the bounds based on the information from ensureCapacityInternal? Perhaps you'd need something like a `Preconditions.checkIndex` in or before the 4-char `StringUTF16.putCharsAt` call to help eliminate excess bounds checks. Might work, but would be fragile and probably not easy to get right in all places. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2163093085
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert Rewriting the intrinsic to emit nodes that don't trip up the MergeStore pass seem like a reasonable thing to reach for, then. Perhaps some of these cases could be helped by a `@ForceInline` or two to get the inlining necessary to allow the JIT to eliminate all but the outer bounds checks, but that is probably too fragile in this area given how performance sensitive many of these methods are. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2163065760
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert Agreed we should look at obsoleting the `_putCharStringU` intrinsic. This is best done separately. If performance with the intrinsic disabled looks good here then I think this PR is good to go, even if temporarily subpar on the UTF16 tests. The microbenchmark will then serve as an extra verification step to show that disabling/removing the intrinsic is beneficial. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162962297
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert Good, disabling the intrinsic appears to enable the MergeStore optimization, and it looks like inlining happens here to get to 8-byte stores. What performance do you get now with that? - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162936067
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert And what happens when you run that last thing with `-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_putCharStringU`? I believe the intrinsic is basically a handcrafted merge optimization, and might very well be obsolete with the MergeStores optimization in place. - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162893168
Re: RFR: 8333893: Optimization for StringBuilder append boolean & null
On Mon, 10 Jun 2024 12:12:58 GMT, Shaojin Wen wrote: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into > primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that > these two methods can be optimized by C2. Whether increments are handled or not, I think the `StringUTF16.putChar` intrinsic might be getting in the way here and prevent merging consecutive "char" stores into 4 or 8 byte stores. It would be interesting to get numbers for the `putChar` version with the intrinsic disabled (`-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_putCharStringU`). - PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2159509806
Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang wrote: >> Please review this patch that fixes a critical issue that breaks some Proxy >> usages. >> >> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible >> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can >> appear in method descriptors in the same class. The proposed way to bypass >> is to store the MethodType as its descriptor string in the bootstrap method >> arguments, and use MethodType.fromMethodDescriptorString to restore the >> arguments, which does not have this restriction and does not eagerly load >> the parameter classes. This case isn't covered by existing tests, so a new >> test has been added. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > Obtain classloader in security manager friendly code path I'm wary about the subtle semantic differences that might result from that and would be more comfortable reverting this behavior to what we did before JDK-8332457. Perhaps even backing that one out and re-examine the changes with less urgency. - PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2158280708
Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang wrote: >> Please review this patch that fixes a critical issue that breaks some Proxy >> usages. >> >> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible >> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can >> appear in method descriptors in the same class. The proposed way to bypass >> is to store the MethodType as its descriptor string in the bootstrap method >> arguments, and use MethodType.fromMethodDescriptorString to restore the >> arguments, which does not have this restriction and does not eagerly load >> the parameter classes. This case isn't covered by existing tests, so a new >> test has been added. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > Obtain classloader in security manager friendly code path Original code did all the classloading in the `` - should we do that here, too? Can you explain the need to `getPlatformClassLoader` when `classEntry.getClassLoader == null`? Pre-JDK-8332457 did no such thing. I'll park #19598 until this stabilizes. - PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2158189721
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]
On Fri, 7 Jun 2024 18:58:36 GMT, Claes Redestad wrote: >> This PR refactors type matching in BootstrapMethodInvoker and adds a few >> types, seeking to improve bootstrap overheads of some ConstantBootstraps and >> in particular the ProxyGenerator condys generated for e.g. annotation >> proxies since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457) >> >> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate >> a proxy but also call into one of the proxied methodt (`Object::hashCode`). >> >> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off >> startup benchmark sees significant improvement (-9% instructions, -6% >> cycles): >> >> Name Cnt Base ErrorTest >> Error Unit Change >> Perfstartup-JMH 20154,000 ±8,165 148,000 ± >> 23,164ms/op 1,04x (p = 0,352 ) >> :.cycles925335973,200 ± 47147600,262 842221278,800 ± >> 46836254,964 cycles 0,91x (p = 0,000*) >> :.instructions 2101588857,600 ± 81105850,361 1966307798,400 ± >> 22011043,908 instructions 0,94x (p = 0,000*) >> :.taskclock 291,500 ± 16,494 262,000 ± >> 15,328 ms 0,90x (p = 0,000*) >> * = significant >> >> Number of classes loaded drops from 1096 to 1092 >> >> Running the micro regularly shows no significant difference: >> >> Name Cnt Base Error Test Error Unit >> Change >> ProxyGenBench.generateAndProxy100 10 26,827 ± 8,954 26,855 ± 7,531 ms/op >> 1,00x (p = 0,991 ) >> * = significant > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Adress review comments, add ConstantBootstraps#invoke to list of recognized > type signatures Thanks, I'll see if I can get that working and if it has any effect as a follow-up. Testing caught a bug introduced by a copy-paste error in 532966a, re-running a few more tests now. - PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2157922124
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v3]
> This PR refactors type matching in BootstrapMethodInvoker and adds a few > types, seeking to improve bootstrap overheads of some ConstantBootstraps and > in particular the ProxyGenerator condys generated for e.g. annotation proxies > since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457) > > I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a > proxy but also call into one of the proxied methodt (`Object::hashCode`). > > Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off > startup benchmark sees significant improvement (-9% instructions, -6% cycles): > > Name Cnt Base ErrorTest > Error Unit Change > Perfstartup-JMH 20154,000 ±8,165 148,000 ± > 23,164ms/op 1,04x (p = 0,352 ) > :.cycles925335973,200 ± 47147600,262 842221278,800 ± > 46836254,964 cycles 0,91x (p = 0,000*) > :.instructions 2101588857,600 ± 81105850,361 1966307798,400 ± > 22011043,908 instructions 0,94x (p = 0,000*) > :.taskclock 291,500 ± 16,494 262,000 ± > 15,328 ms 0,90x (p = 0,000*) > * = significant > > Number of classes loaded drops from 1096 to 1092 > > Running the micro regularly shows no significant difference: > > Name Cnt Base ErrorTest Error Unit > Change > ProxyGenBench.generateAndProxy100 10 26,827 ± 8,954 26,855 ± 7,531 ms/op > 1,00x (p = 0,991 ) > * = significant Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copy-paste error; CONDY_INVOKE returns Object. Fixes TestDynamicConstant.java - Changes: - all: https://git.openjdk.org/jdk/pull/19598/files - new: https://git.openjdk.org/jdk/pull/19598/files/532966a4..2d64ae1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19598=02 - incr: https://webrevs.openjdk.org/?repo=jdk=19598=01-02 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598 PR: https://git.openjdk.org/jdk/pull/19598
Integrated: 8333824: Unused ClassValue in VarHandles
On Mon, 10 Jun 2024 08:30:59 GMT, Claes Redestad wrote: > Trivially removing the a leftover, unused `ClassValue` from > `java.lang.invoke.VarHandle` This pull request has now been integrated. Changeset: 7b43a8cd Author: Claes Redestad URL: https://git.openjdk.org/jdk/commit/7b43a8cd7c663facbe490f889838d7ead0eba0f9 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod 8333824: Unused ClassValue in VarHandles Reviewed-by: mcimadamore - PR: https://git.openjdk.org/jdk/pull/19620
Re: RFR: 8333824: Unused ClassValue in VarHandles
On Mon, 10 Jun 2024 08:57:23 GMT, Maurizio Cimadamore wrote: > Good catch! Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/19620#issuecomment-2157910277
RFR: 8333824: Unused ClassValue in VarHandles
Trivially removing the a leftover, unused `ClassValue` from `java.lang.invoke.VarHandle` - Commit messages: - 8333824: Unused ClassValue in VarHandles Changes: https://git.openjdk.org/jdk/pull/19620/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19620=00 Issue: https://bugs.openjdk.org/browse/JDK-8333824 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19620/head:pull/19620 PR: https://git.openjdk.org/jdk/pull/19620
Re: RFR: 8333833: Remove the use of ByteArrayLittleEndian from UUID::toString [v7]
On Mon, 10 Jun 2024 07:32:26 GMT, Shaojin Wen wrote: >> After PR #16245, C2 optimizes stores into primitive arrays by combining >> values into larger stores. In the UUID.toString method, >> ByteArrayLittleEndian can be removed, making the code more elegant and >> faster. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > Update src/java.base/share/classes/jdk/internal/util/HexDigits.java > > Co-authored-by: Claes Redestad Great, go ahead and /integrate again and I'll sponsor. - PR Comment: https://git.openjdk.org/jdk/pull/19610#issuecomment-2157627942
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]
On Fri, 7 Jun 2024 18:58:36 GMT, Claes Redestad wrote: >> This PR refactors type matching in BootstrapMethodInvoker and adds a few >> types, seeking to improve bootstrap overheads of some ConstantBootstraps and >> in particular the ProxyGenerator condys generated for e.g. annotation >> proxies since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457) >> >> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate >> a proxy but also call into one of the proxied methodt (`Object::hashCode`). >> >> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off >> startup benchmark sees significant improvement (-9% instructions, -6% >> cycles): >> >> Name Cnt Base ErrorTest >> Error Unit Change >> Perfstartup-JMH 20154,000 ±8,165 148,000 ± >> 23,164ms/op 1,04x (p = 0,352 ) >> :.cycles925335973,200 ± 47147600,262 842221278,800 ± >> 46836254,964 cycles 0,91x (p = 0,000*) >> :.instructions 2101588857,600 ± 81105850,361 1966307798,400 ± >> 22011043,908 instructions 0,94x (p = 0,000*) >> :.taskclock 291,500 ± 16,494 262,000 ± >> 15,328 ms 0,90x (p = 0,000*) >> * = significant >> >> Number of classes loaded drops from 1096 to 1092 >> >> Running the micro regularly shows no significant difference: >> >> Name Cnt Base Error Test Error Unit >> Change >> ProxyGenBench.generateAndProxy100 10 26,827 ± 8,954 26,855 ± 7,531 ms/op >> 1,00x (p = 0,991 ) >> * = significant > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Adress review comments, add ConstantBootstraps#invoke to list of recognized > type signatures Sure. If you can show how you'd do that I'd be happy to test it. Maybe there are excess native linkers I'm not seeing from using `invokeExact`, but AFAICT there are no linkers being spun when the type match is made exact via static casts (which is what these exact matches in BMI achieves). So I remain a bit confused about what we would be able to improve. Thanks for reviewing! - PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2157595699
Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v7]
On Mon, 10 Jun 2024 07:32:26 GMT, Shaojin Wen wrote: >> After PR #16245, C2 optimizes stores into primitive arrays by combining >> values into larger stores. In the UUID.toString method, >> ByteArrayLittleEndian can be removed, making the code more elegant and >> faster. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > Update src/java.base/share/classes/jdk/internal/util/HexDigits.java > > Co-authored-by: Claes Redestad Thanks. I hate to nitpick, but is it OK if I rename the RFE as "Remove the use of ByteArrayLittleEndian from UUID::toString" (the PR need to follow suit). I think the current name might be read as doing something completely different. - PR Comment: https://git.openjdk.org/jdk/pull/19610#issuecomment-2157580938
Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v6]
On Sun, 9 Jun 2024 22:45:26 GMT, Shaojin Wen wrote: >> After PR #16245, C2 optimizes stores into primitive arrays by combining >> values into larger stores. In the UUID.toString method, >> ByteArrayLittleEndian can be removed, making the code more elegant and >> faster. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > better parameter names src/java.base/share/classes/jdk/internal/util/HexDigits.java line 118: > 116: /** > 117: * Insert the unsigned 2-byte integer into the buffer as 4 > hexadecimal digit ASCII bytes, > 118: * only least significant 16 bits of {@code i} are used. Suggestion: * only least significant 16 bits of {@code value} are used. - PR Review Comment: https://git.openjdk.org/jdk/pull/19610#discussion_r1632711983
Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v5]
On Sun, 9 Jun 2024 07:22:39 GMT, Shaojin Wen wrote: >> After PR #16245, C2 optimizes stores into primitive arrays by combining >> values into larger stores. In the UUID.toString method, >> ByteArrayLittleEndian can be removed, making the code more elegant and >> faster. > > Shaojin Wen has updated the pull request incrementally with two additional > commits since the last revision: > > - rename putHex4 to put4 > - update copy right year Glad to see #16245 in action, enabling simpler code with equal or better performance. src/java.base/share/classes/jdk/internal/util/HexDigits.java line 123: > 121: * @param i to convert > 122: */ > 123: public static void put4(byte[] buffer, int off, int i) { Perhaps `index` and `value` are better names than `off` and `i`, respectively. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19610#pullrequestreview-2106355754 PR Review Comment: https://git.openjdk.org/jdk/pull/19610#discussion_r1632380045
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]
On Fri, 7 Jun 2024 19:22:43 GMT, Chen Liang wrote: >> It's not the intent of this PR to exhaustively enumerate all methods in >> `ConstantBootstraps`, primarily those shown to be bootstrap sensitive in >> some app. I've so far never seen a use of `primitiveClass` (and I admit >> being ignorant as to why this even exists as a BSM) so I don't have any >> reason to believe special-casing it will carry its own weight. > > For its existence, I think it was the same as null, that they weren't > representable by cp entries for bootstrap method args. They are now backed by > PrimitiveClassDescImpl, a type of condy, and ClassFile API currently writes > loadConstant(primitiveCd) as a condy. I am sure there's a perfectly good explanation why we've done it like that rather than translating things like javac would (AFAICT a load from `Integer/Short/Byte/...TYPE`). - PR Review Comment: https://git.openjdk.org/jdk/pull/19598#discussion_r1632041009
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator
On Fri, 7 Jun 2024 13:46:22 GMT, Jorn Vernee wrote: > As an aside: I suppose we pre-generate all the linkers that get spun up for > these `invokeExact`calls already. But, we might still get a little bit of > mileage out of switching to `invokeBasic`, which bypasses the linkers > altogether. With the caveat that we have to be very careful that the call > site type is correct. I'm not sure I see what linkers you're alluding to here in profiles (and I don't think we have code pre-generating all of them), and I don't see how to use `invokeBasic` here instead. Care to elaborate? - PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2155357245
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]
> This PR refactors type matching in BootstrapMethodInvoker and adds a few > types, seeking to improve bootstrap overheads of some ConstantBootstraps and > in particular the ProxyGenerator condys generated for e.g. annotation proxies > since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457) > > I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a > proxy but also call into one of the proxied methodt (`Object::hashCode`). > > Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off > startup benchmark sees significant improvement (-9% instructions, -6% cycles): > > Name Cnt Base ErrorTest > Error Unit Change > Perfstartup-JMH 20154,000 ±8,165 148,000 ± > 23,164ms/op 1,04x (p = 0,352 ) > :.cycles925335973,200 ± 47147600,262 842221278,800 ± > 46836254,964 cycles 0,91x (p = 0,000*) > :.instructions 2101588857,600 ± 81105850,361 1966307798,400 ± > 22011043,908 instructions 0,94x (p = 0,000*) > :.taskclock 291,500 ± 16,494 262,000 ± > 15,328 ms 0,90x (p = 0,000*) > * = significant > > Number of classes loaded drops from 1096 to 1092 > > Running the micro regularly shows no significant difference: > > Name Cnt Base ErrorTest Error Unit > Change > ProxyGenBench.generateAndProxy100 10 26,827 ± 8,954 26,855 ± 7,531 ms/op > 1,00x (p = 0,991 ) > * = significant Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Adress review comments, add ConstantBootstraps#invoke to list of recognized type signatures - Changes: - all: https://git.openjdk.org/jdk/pull/19598/files - new: https://git.openjdk.org/jdk/pull/19598/files/21f945b5..532966a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19598=01 - incr: https://webrevs.openjdk.org/?repo=jdk=19598=00-01 Stats: 29 lines in 1 file changed: 17 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598 PR: https://git.openjdk.org/jdk/pull/19598
Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator
On Fri, 7 Jun 2024 18:38:52 GMT, ExE Boss wrote: >> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java >> line 289: >> >>> 287: * bootstrap method. >>> 288: */ >>> 289: private static final MethodType CONDY_GET_STATIC_FINAL_MT = >>> MethodType.methodType(Object.class, >> >> This type should just be called `CONDY_NO_ARG` as it's the most common form >> of condy that doesn't take any arg (`nullConstant` `primitiveClass` etc.) > > Note that [`ConstantBootstraps::primitiveClass(Lookup, String, Class)`] has > a static return type of `Class<?>`, so the following is also needed: > > /** > * Exact type used of the {@link ConstantBootstraps#primitiveClass(Lookup, > String, Class)} > * bootstrap method. > */ > private static final MethodType CONDY_CLASS_NO_ARG_MT > = CONDY_GET_STATIC_FINAL_MT.changeReturnType(Class.class); > > > [`ConstantBootstraps::primitiveClass(Lookup, String, Class)`]: > https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/invoke/ConstantBootstraps.html#primitiveClass(java.lang.invoke.MethodHandles.Lookup,java.lang.String,java.lang.Class) It's not the intent of this PR to exhaustively enumerate all methods in `ConstantBootstraps`, primarily those shown to be bootstrap sensitive in some app. I've so far never seen a use of `primitiveClass` (and I admit being ignorant as to why this even exists as a BSM) so I don't have any reason to believe special-casing it will carry its own weight. - PR Review Comment: https://git.openjdk.org/jdk/pull/19598#discussion_r1631578674
Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v3]
On Fri, 7 Jun 2024 13:08:24 GMT, Claes Redestad wrote: >> Trivially move a few private static finals to their respective source type >> to avoid eagerly loading a few small classes. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Update src/java.base/share/classes/java/util/Spliterators.java > > Co-authored-by: Chen Liang Thanks! Yes, the best holder for a singleton is typically the class itself - PR Comment: https://git.openjdk.org/jdk/pull/19591#issuecomment-2155003099
Integrated: 8333774: Avoid eagerly loading various EmptySpliterator classes
On Fri, 7 Jun 2024 08:21:58 GMT, Claes Redestad wrote: > Trivially move a few private static finals to their respective source type to > avoid eagerly loading a few small classes. This pull request has now been integrated. Changeset: d744059b Author: Claes Redestad URL: https://git.openjdk.org/jdk/commit/d744059b5b3e944bee53536de6f404666e45e8e5 Stats: 33 lines in 1 file changed: 12 ins; 12 del; 9 mod 8333774: Avoid eagerly loading various EmptySpliterator classes Reviewed-by: liach, pminborg - PR: https://git.openjdk.org/jdk/pull/19591
Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base [v6]
On Fri, 7 Jun 2024 13:56:24 GMT, Chen Liang wrote: >> In java.base, especially in bytecode generators, we have many different >> methods converting known valid Class and MethodType into ClassDesc and >> MethodTypeDesc. These conversions should be consolidated into the same >> utility method for the ease of future maintenance. > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > ExE-Boss review, go through validated path in Class Good incremental improvements. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19585#pullrequestreview-2104694382
Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v3]
> Trivially move a few private static finals to their respective source type to > avoid eagerly loading a few small classes. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/util/Spliterators.java Co-authored-by: Chen Liang - Changes: - all: https://git.openjdk.org/jdk/pull/19591/files - new: https://git.openjdk.org/jdk/pull/19591/files/9d19bbe1..10611afd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19591=02 - incr: https://webrevs.openjdk.org/?repo=jdk=19591=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591 PR: https://git.openjdk.org/jdk/pull/19591
Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v2]
> Trivially move a few private static finals to their respective source type to > avoid eagerly loading a few small classes. Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - revert typo - Copyright - Changes: - all: https://git.openjdk.org/jdk/pull/19591/files - new: https://git.openjdk.org/jdk/pull/19591/files/5eb2ddd8..9d19bbe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=19591=01 - incr: https://webrevs.openjdk.org/?repo=jdk=19591=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591 PR: https://git.openjdk.org/jdk/pull/19591
RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator
This PR refactors type matching in BootstrapMethodInvoker and adds a few types, seeking to improve bootstrap overheads of some ConstantBootstraps and in particular the ProxyGenerator condys generated for e.g. annotation proxies since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457) I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a proxy but also call into one of the proxied methodt (`Object::hashCode`). Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off startup benchmark sees significant improvement (-9% instructions, -6% cycles): Name Cnt Base ErrorTest Error Unit Change Perfstartup-JMH 20154,000 ±8,165 148,000 ± 23,164ms/op 1,04x (p = 0,352 ) :.cycles925335973,200 ± 47147600,262 842221278,800 ± 46836254,964 cycles 0,91x (p = 0,000*) :.instructions 2101588857,600 ± 81105850,361 1966307798,400 ± 22011043,908 instructions 0,94x (p = 0,000*) :.taskclock 291,500 ± 16,494 262,000 ± 15,328 ms 0,90x (p = 0,000*) * = significant Number of classes loaded drops from 1096 to 1092 Running the micro regularly shows no significant difference: Name Cnt Base ErrorTest Error Unit Change ProxyGenBench.generateAndProxy100 10 26,827 ± 8,954 26,855 ± 7,531 ms/op 1,00x (p = 0,991 ) * = significant - Commit messages: - Rename microbenchmark - Ensure proxying for hashCode etc is 'implemented' - Ensure proxying for hashCode etc is 'implemented' - Change ProxyGenBench to provoke proxy condy - Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator Changes: https://git.openjdk.org/jdk/pull/19598/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19598=00 Issue: https://bugs.openjdk.org/browse/JDK-8333793 Stats: 89 lines in 2 files changed: 35 ins; 24 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598 PR: https://git.openjdk.org/jdk/pull/19598
Re: RFR: 8332249: Micro-optimize Method.hashCode [v2]
On Wed, 5 Jun 2024 13:45:46 GMT, Per Minborg wrote: >> As a note, If we would have access to the contemplated `StableValue` and >> `hash` was declared even as an _instance variable_ `StableValue` in >> a regular class, then it would be trusted and would be eligible for constant >> folding due to the VM will have special rules for fields of StableValue >> type. > > Ahh. There was a benchmark in the initial message. Sorry. We think this is caused by the membars being conservatively emitted at method exit, and more precis membar control should help ARM. Filed: https://bugs.openjdk.org/browse/JDK-8333791 - PR Review Comment: https://git.openjdk.org/jdk/pull/19433#discussion_r1631071572
Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v19]
On Wed, 5 Jun 2024 12:39:13 GMT, Chen Liang wrote: >> Adam Sotona has updated the pull request incrementally with one additional >> commit since the last revision: >> >> An assortment of potential improvements >> >> Co-authored-by: Claes Redestad > > src/java.base/share/classes/java/lang/constant/ConstantDescs.java line 356: > >> 354: ClassDesc[] fullParamTypes = new ClassDesc[paramTypes.length + >> prefixLen]; >> 355: System.arraycopy(INDY_BOOTSTRAP_ARGS, 0, fullParamTypes, 0, >> prefixLen); >> 356: System.arraycopy(paramTypes, 0, fullParamTypes, prefixLen, >> paramTypes.length); > > I'm thinking about creating a basic MethodTypeDesc like `INDY_BOOTSTRAP_TYPE > = MethodTypeDesc.ofValidated(CD_Object, INDY_BOOTSTRAP_ARGS)`, and we derive > bsm with > > INDY_BOOTSTRAP_TYPE > .insertParameterTypes(INDY_BOOTSTRAP_TYPE.parameterCount(), paramTypes) > .changeReturnType(returnType) > > which creates two MTD wrappers but they share the same parameter array That'd arguably be a bit cleaner. Might allocate an extra MTD (unless/until EA kicks in), but that might not matter really. - PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1630868455
RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes
Trivially move a few private static finals to their respective source type to avoid eagerly loading a few small classes. - Commit messages: - 8333774: Avoid eagerly loading various EmptySpliterator classes Changes: https://git.openjdk.org/jdk/pull/19591/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19591=00 Issue: https://bugs.openjdk.org/browse/JDK-8333774 Stats: 31 lines in 1 file changed: 12 ins; 11 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591 PR: https://git.openjdk.org/jdk/pull/19591
Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base [v3]
On Thu, 6 Jun 2024 19:12:35 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 469: >> >>> 467: // o instanceof Wrapped(float) >>> 468: cb.aload(SELECTOR_OBJ); >>> 469: >>> cb.instanceOf(classDesc(Wrapper.forBasicType(classLabel) >> >> I have a patch somewhere to cache the wrapper class desc in >> `sun.invoke.util.Wrapper`, both as a micro-optimization and to help >> disambigutate the unfortunately named (my bad) `Wrapper::classDescriptor` >> method. Maybe we can roll that into this..? > > Feel free, or you can get your patch merged first and then I base off yours > if yours is ready. Created a PR you could fold into this, if you'd like: https://github.com/liachmodded/jdk/pull/1 - PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630816783
Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base
On Thu, 6 Jun 2024 18:35:01 GMT, Chen Liang wrote: > In java.base, especially in bytecode generators, we have many different > methods converting known valid Class and MethodType into ClassDesc and > MethodTypeDesc. These conversions should be consolidated into the same > utility method for the ease of future maintenance. Looks good to me. Probably should get someone to OK changes in foreign/abi (@JornVernee perhaps?). There are some pre-existing places where we call `ReferenceClassDescImpl.ofValidated` directly that could probably be switched over to `ConstantUtils.classDesc` for slightly nicer code. Or - if it matters - add a `referenceClassDesc` which avoids the `type.isPrimitive` test. src/java.base/share/classes/java/lang/invoke/MethodHandleProxies.java line 256: > 254: // the field name holding the method handle for this method > 255: String fieldName = "m" + count++; > 256: var mt = methodDesc(m.getReturnType(), > JLRA.getExecutableSharedParameterTypes(m)); Suggestion: var md = methodDesc(m.getReturnType(), JLRA.getExecutableSharedParameterTypes(m)); src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 469: > 467: // o instanceof Wrapped(float) > 468: cb.aload(SELECTOR_OBJ); > 469: > cb.instanceOf(classDesc(Wrapper.forBasicType(classLabel) I have a patch somewhere to cache the wrapper class desc in `sun.invoke.util.Wrapper`, both as a micro-optimization and to help disambigutate the unfortunately named (my bad) `Wrapper::classDescriptor` method. Maybe we can roll that into this..? src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java line 471: > 469: case Allocate allocate -> > emitAllocBuffer(allocate); > 470: case BoxAddress boxAddress -> > emitBoxAddress(boxAddress); > 471: case SegmentBase _ -> emitSegmentBase(); Seem a bit too far detached from the changes at hand for a drive-by code cleanup? - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19585#pullrequestreview-2102953208 PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630049028 PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630061889 PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630074187
Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v19]
On Wed, 5 Jun 2024 12:00:25 GMT, Adam Sotona wrote: >> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use >> classfile API for reflection proxy-generation. Actual implementation of >> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap >> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and >> each proxy class is transformed from the template. >> >> This patch is intended to examine plain proxy generation impact on >> performance and JDK bootstrap (vs proxy transformation from template). >> >> The generated proxy is migrated from static initialization to CONDY >> bootstrap. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > An assortment of potential improvements > > Co-authored-by: Claes Redestad Looks good. Generation of proxy classes gets a nice boost this way. The condy bsm calls that may happen later takes a small hit which we can improve in a follow-up. - Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19410#pullrequestreview-2098990024
Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v14]
On Mon, 3 Jun 2024 11:09:31 GMT, Adam Sotona wrote: >> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use >> classfile API for reflection proxy-generation. Actual implementation of >> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap >> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and >> each proxy class is transformed from the template. >> >> This patch is intended to examine plain proxy generation impact on >> performance and JDK bootstrap (vs proxy transformation from template). >> >> The generated proxy is migrated from static initialization to CONDY >> bootstrap. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > ProxyGenBench simplification test/micro/org/openjdk/bench/java/lang/reflect/Proxy/ProxyGenBench.java line 23: > 21: * questions. > 22: */ > 23: package org.openjdk.bench.java.lang.reflect.Proxy; Package name needs to be lowercase. Not sure why the folder name is uppercase Proxy, but the two pre-existing benchmarks both have lower case package declarations. Uppercase letters in package names may subtly break a few tools - PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1624270828
Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v13]
On Mon, 3 Jun 2024 10:11:34 GMT, Adam Sotona wrote: >> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use >> classfile API for reflection proxy-generation. Actual implementation of >> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap >> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and >> each proxy class is transformed from the template. >> >> This patch is intended to examine plain proxy generation impact on >> performance and JDK bootstrap (vs proxy transformation from template). >> >> The generated proxy is migrated from static initialization to CONDY >> bootstrap. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > added ProxyGenBench JMH micro benchmark test/micro/org/openjdk/bench/java/lang/reflect/Proxy/ProxyGenBench.java line 66: > 64: ClassDesc tempDesc = > ClassDesc.ofDescriptor(Interfaze.class.descriptorString()); > 65: loader = new ClsLoader(); > 66: clsMap = new HashMap<>(100); Suggestion: clsMap = HashMap.newHashMap(100); - PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1624192732
Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps
On Thu, 30 May 2024 12:50:36 GMT, Claes Redestad wrote: > Extracting duplicate method references to static field reduce proxy class > spinning and loading. In this case 2 less classes loaded when using > `findAny()` on each type of stream. Vicente filed a bug on javac to investigate this: https://bugs.openjdk.org/browse/JDK-8333278 I wouldn't mind using condy for constant aka non-capturing lambdas. I recall we had a prototype for that years ago, but switching over was shelved for some reason. - PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2139703977
Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps
On Thu, 30 May 2024 13:04:46 GMT, Chen Liang wrote: > Should we extract them to private static utility methods for potential > laziness support in the future? Ideally we shouldn't have to do any of this. I thought such method references were already de-duplicated in javac - at least within the same compilation units - but I saw this in a startup profile and was surprised myself that the demonstrated manual de-duplication has an observable effect. - PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2139576911
RFR: 8333265: De-duplicate method references in java.util.stream.FindOps
Extracting duplicate method references to static field reduce proxy class spinning and loading. In this case 2 less classes loaded when using `findAny()` on each type of stream. - Commit messages: - De-duplicate identical lambdas in FindOps Changes: https://git.openjdk.org/jdk/pull/19477/files Webrev: https://webrevs.openjdk.org/?repo=jdk=19477=00 Issue: https://bugs.openjdk.org/browse/JDK-8333265 Stats: 35 lines in 2 files changed: 17 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477 PR: https://git.openjdk.org/jdk/pull/19477
Re: RFR: 8333103: Re-examine the console provider loading
On Thu, 30 May 2024 10:02:25 GMT, Alan Bateman wrote: > Would it be possible to provide more context/background here? This is not > code that is used during startup. Is there benchmark data to share for first > use of java.io.IO ? I think this was brought to the fore by https://openjdk.org/jeps/477 where running the example implicit main: void main() { println("Hello, World!"); } .. brings in this code. This PR is one of several to try and smoothen a few rough edges that makes the startup of that quite a bit heavier than the baseline non-implicit Hello World. - PR Comment: https://git.openjdk.org/jdk/pull/19467#issuecomment-2139218517
Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]
On Wed, 29 May 2024 09:18:51 GMT, Pavel Rappo wrote: >> The non-constant test was added because that very bailout caused a crash. >> The other test is actually less interesting since it'll likely be covered >> indirectly by regular use. But as we are hiding these away this gets ever >> more obscure and perhaps the test could be dropped entirely. > > @cl4es, do you want me to delete that test file altogether? I thought you verified that the non-constant type test still provoke a crash (on x86) if you back out the code changes in https://github.com/openjdk/jdk/commit/969f6a37e4649079c7acea1952f5537fd9ba2f0a ? If so that test is still somewhat useful to guard against future coding mistakes by verifying that the bail out doesn't mess things up. The constant type tests have less utility, perhaps. I'd keep it as is unless there's a strong desire to reduce test runtime (these should be pretty quick). - PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1620259115
Re: RFR: 8333103: Re-examine the console provider loading
On Wed, 29 May 2024 22:25:09 GMT, Naoto Sato wrote: >> Yes, `break` guarantees that the search completes one way or another once >> the module name has been matched. This is not how it used to be done. > > Right. Since `findAny` is after the module name matching, there is at most 1 > match. In the case we didn't find any, the final `orElse(null)` eventually > returns null. So the refactored code is semantically the same. It's only semantically the same if we assume a module can only provide a single `JdkConsoleProvider`, no? The `break;` disallows multiple providers (for disjoint sets of charsets) in a single module. - PR Review Comment: https://git.openjdk.org/jdk/pull/19467#discussion_r1620164006
Re: RFR: 8333103: Re-examine the console provider loading
On Wed, 29 May 2024 19:51:36 GMT, Naoto Sato wrote: > There is an initialization code in `Console` class that searches for the > Console implementations. Refactoring the init code not to use lambda/stream > would reduce the (initial) number of loaded classes by about 100 for > java.base implementations. This would become relevant when the java.io.IO > (JEP 477) uses Console as the underlying framework. Marked as reviewed by redestad (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19467#pullrequestreview-2086537783
Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v12]
On Wed, 29 May 2024 07:17:38 GMT, Adam Sotona wrote: >> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use >> classfile API for reflection proxy-generation. Actual implementation of >> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap >> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and >> each proxy class is transformed from the template. >> >> This patch is intended to examine plain proxy generation impact on >> performance and JDK bootstrap (vs proxy transformation from template). >> >> The generated proxy is migrated from static initialization to CONDY >> bootstrap. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 13 additional > commits since the last revision: > > - obsolete import > - Merge branch 'master' into JDK-8332457-proxy-startup > - missing bracket > - Update src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java > >Co-authored-by: liach <7806504+li...@users.noreply.github.com> > - removed obsolete entry > - MTD_ cleanup > - fixed javadoc > - CONDY implementation of ProxyGenerator > - simplification of the proxy class loading > - more improvements > - ... and 3 more: https://git.openjdk.org/jdk/compare/62348071...942342d5 The condy bootstrapping adds a little noise when running this through an affected startup benchmark. Overall the net impact is small or even negative. This is a nice cleanup, though, but perhaps we ought to keep JDK-8332457 open (or file a new bug to keep investigating overheads). - PR Comment: https://git.openjdk.org/jdk/pull/19410#issuecomment-2137310415
Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]
On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo wrote: >> Please review this PR, which supersedes a now withdrawn >> https://github.com/openjdk/jdk/pull/14831. >> >> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more >> user-friendly methods. Here's a summary: >> >> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the >> `vectorizedHashCode` method private >> >> - Made the `vectorizedHashCode` method private, but didn't rename it. >> Renaming would dramatically increase this PR review cost, because that >> method's name is used by a lot of VM code. On a bright side, since the >> method is now private, it's no longer callable by clients of >> `ArraysSupport`, thus a problem of an inaccurate name is less severe. >> >> - Made the `ArraysSupport.utf16HashCode` method private >> >> - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport` > > Pavel Rappo has updated the pull request incrementally with one additional > commit since the last revision: > > Fix incorrect utf16 hashCode adaptation Marked as reviewed by redestad (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19414#pullrequestreview-2084714609
Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]
On Wed, 29 May 2024 03:16:02 GMT, Chen Liang wrote: >> In fact, if I change it to `2`, the following tests will fail: >> >> - `jdk/jdk/classfile/Utf8EntryTest.java` >> - `jdk/java/util/zip/ZipCoding.java` >> - `jdk/java/text/Format/MessageFormat/MessageRegression.java` > > Indeed, the actual length passed at call site is `value.length >> 1` instead > of `value.length`; this adjusted char-length carries on to > `vectorizedHashCode` call. Ah, sneaky. Might affect scores for the zero and one-char cases since the shift now happens unconditionally, but probably doesn't matter in practice. - PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618456668
Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]
On Tue, 28 May 2024 19:13:30 GMT, Jorn Vernee wrote: >> Pavel Rappo has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix incorrect utf16 hashCode adaptation > > src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 275: > >> 273: return switch (length) { >> 274: case 0 -> initialValue; >> 275: case 1 -> 31 * initialValue + (a[fromIndex] & 0xff); > > For clarity, if you think it helps: > Suggestion: > > case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]); I don't care as long as microbenchmarks don't get a hiccup. > src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 301: > >> 299: return switch (length) { >> 300: case 0 -> initialValue; >> 301: case 1 -> 31 * initialValue + JLA.getUTF16Char(a, >> fromIndex); > > There seems to be a mismatch here with the original code in StringUTF16, > where the length that is tested for is `2` instead of `1`. Yes, should be `2` (`a` is semantically a `char[]`). This typo likely pass functional testing since `1` can never happen in practice, and the default case should work for any value. There might be a String microbenchmark out there that might be slightly unhappy, though. - PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617867797 PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617865658
Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]
On Tue, 28 May 2024 19:19:51 GMT, Jorn Vernee wrote: >> Pavel Rappo has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix incorrect utf16 hashCode adaptation > > test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java line 88: > >> 86: private static int testIntrinsic(byte[] bytes, int type) >> 87: throws InvocationTargetException, IllegalAccessException { >> 88: return (int) vectorizedHashCode.invoke(null, bytes, 0, 256, 1, >> type); > > Better to just call `hashCodeOfUnsigned` here I think. > > The test for the non-constant type could be dropped. That is no longer a part > of the 'API' of `ArraySupport`. It looks like the intrinsic bails out when > the basic type is not constant any ways: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L6401-L6404 The non-constant test was added because that very bailout caused a crash. The other test is actually less interesting since it'll likely be covered indirectly by regular use. But as we are hiding these away this gets ever more obscure and perhaps the test could be dropped entirely. - PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617848032