Re: RFR: 8336856: Optimize String Concat [v15]

2024-07-25 Thread Claes Redestad
On Thu, 25 Jul 2024 08:58:14 GMT, Shaojin Wen  wrote:

>> The current implementation of StringConcat is to mix the coder and length 
>> into a long. This operation will have some overhead for int/long/boolean 
>> types. We can separate the calculation of the coder from the calculation of 
>> the length, which can improve the performance in the scenario of concat 
>> int/long/boolean.
>> 
>> This idea comes from the suggestion of @l4es in the discussion of PR 
>> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add benchmark

test/micro/org/openjdk/bench/java/lang/StringConcatGenerate.java line 47:

> 45: @Measurement(iterations = 5, time = 1000, timeUnit = 
> TimeUnit.MILLISECONDS)
> 46: @Fork(value = 3, jvmArgsAppend = 
> "-Djava.lang.invoke.StringConcat.highArityThreshold=0")
> 47: public class StringConcatGenerate extends StringConcat {

Adding a subclass with an overridden `@Fork` to pass a different 
`jvmArgsAppend` is a reasonable trick, but could be moved to a nested class 
within `StringConcat` to keep it in the same context. I'm not sure if this 
micro brings a persistent added value, though: for experimentation we can just 
run `StringConcat` with different 
`-Djava.lang.invoke.StringConcat.highArityThreshold` settings, while for 
continuous regression testing we're more interested in validating the default 
settings. Supplying the new `main` method doesn't add anything here, either, 
since a standalone execution wouldn't pick up the `jvmArgsAppend` value.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1691205621


Re: RFR: 8336856: Optimize String Concat [v10]

2024-07-24 Thread Claes Redestad
On Wed, 24 Jul 2024 10:18:03 GMT, Shaojin Wen  wrote:

>> The current implementation of StringConcat is to mix the coder and length 
>> into a long. This operation will have some overhead for int/long/boolean 
>> types. We can separate the calculation of the coder from the calculation of 
>> the length, which can improve the performance in the scenario of concat 
>> int/long/boolean.
>> 
>> This idea comes from the suggestion of @l4es in the discussion of PR 
>> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866
>
> Shaojin Wen has updated the pull request incrementally with ten additional 
> commits since the last revision:
> 
>  - minor refactor
>  - minor refactor
>  - reduce change
>  - copyright
>  - reduce change
>  - refactor based on 8335182
>  - use Float.toString & Double.toString
>  - remove ForceInline
>  - fix build error

This is looking much better!

An idea: evolve this PR to first and foremost be a replacement for the current 
`SimpleStringBuilderStrategy`. If you can prove the new strategy is always 
superior to the baseline `SimpleStringBuilderStrategy` (performance close to 
`generateMHInlineCopy` but without the problematic scaling JIT overheads for 
high-arity expressions) then we have a good, incremental improvement with 
relatively low risk.

We can then follow-up with making this better for low-arity expressions by 
implementing a sharing strategy
(need to build classes where we inject rather than embed constants and cache 
them, e.g. in a `ReferencedKeyMap`) as a follow-up. 

One issue for both this and that future PR is that the hidden classes are 
generated into `java.lang` with `Lookup.ClassOption.STRONG`. This means such 
classes effectively can never be unloaded. That's a blocker. We need to get rid 
of that `STRONG` coupling, add tests to ensure generated classes can be 
unloaded and validate that this doesn't add too much overhead. 

For startup verification I added a `StringConcatStartup` JMH in #19927. You 
could extend that with a few tests that exercise high-arity expressions.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 153:

> 151: @SuppressWarnings("removal")
> 152: private static boolean isGenerateInlineCopy() {
> 153: return GENERATE_INLINE_COPY && System.getSecurityManager() == 
> null;

This security manager hack is unfortunate. Likely some bootstrapping issue due 
use of the classfile API, which might be pulling in lambdas in the paths 
travelled by this patch.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1253:

> 1251: }
> 1252: 
> 1253: int lengthCoderSloat = paramSlotsTotalSize;

Suggestion:

int lengthCoderSlot = paramSlotsTotalSize;

-

PR Review: https://git.openjdk.org/jdk/pull/20273#pullrequestreview-2196340289
PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1689583138
PR Review Comment: https://git.openjdk.org/jdk/pull/20273#discussion_r1689578966


Re: RFR: 8336856: Optimize String Concat [v5]

2024-07-23 Thread Claes Redestad
On Tue, 23 Jul 2024 12:22:57 GMT, Shaojin Wen  wrote:

>> The current implementation of StringConcat is to mix the coder and length 
>> into a long. This operation will have some overhead for int/long/boolean 
>> types. We can separate the calculation of the coder from the calculation of 
>> the length, which can improve the performance in the scenario of concat 
>> int/long/boolean.
>> 
>> This idea comes from the suggestion of @l4es in the discussion of PR 
>> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866
>
> Shaojin Wen has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains six commits:
> 
>  - Merge remote-tracking branch 'origin/optim_simple_concat_202407' into 
> optim_concat_factory_202407
>
># Conflicts:
>#  src/java.base/share/classes/java/lang/StringConcatHelper.java
>#  src/java.base/share/classes/java/lang/System.java
>#  src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java
>  - reduce change & refactor
>  - fix TRUSTED not work
>  - non-MH-based implementation
>  - non-MH-based implementation
>  - optimize string concat

While a reasonable start I'd suggest:

- adding a flag to switch over to the new implementation and leaving the 
existing strategy in place initially, then when we have something that is 
superior in every way we can clean out the old strategy
- AFAICT this mimics the `SimpleStringBuilderStrategy` and generates and 
installs a new class for each expression. This is likely fine for high-arity 
concatenations which are likely to be one-of-a-kind (the sort of expressions 
the `SimpleStringBuilderStrategy` was brought back to life for), but could 
generate a proliferation of classes for low-arity expressions. Instead we 
probably need to aim for sharing
- such shareable classes could be made simpler if we put them in java.lang so 
that they can link directly with package private methods in 
`StringConcatHelper` and `String` rather than using trusted `MethodHandles` to 
cross such boundaries

Again: I am working on and have a prototype for this which I aim to present and 
discuss in two weeks as JVMLS. I appreciate the effort here, but I'll need to 
focus on my talk and prototype for now.

-

PR Comment: https://git.openjdk.org/jdk/pull/20273#issuecomment-2245279590


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]

2024-07-23 Thread Claes Redestad
On Tue, 23 Jul 2024 12:49:52 GMT, Shaojin Wen  wrote:

>> Yes, this isn't beholden to JLS 15.18.1, and it's already specified that 
>> `foo.concat("")` returns `foo` - so why shouldn't `"".concat(foo)` return 
>> `foo`? But still it's an observable semantic change so some care needs to be 
>> taken - possibly even a CSR is warranted. Doing `return new String(str)` to 
>> retain behavior avoids that headache for a fringe case.
>
> Adding isEmpty check is not related to this PR and should be a new PR.

Thanks. Removing this check will make the `"".append(foo)` edge case a little 
bit slower (likely inconsequential), but removes a branch and might thus even 
be an optimization in the common case.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1688032897


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v7]

2024-07-23 Thread Claes Redestad
On Tue, 23 Jul 2024 12:56:07 GMT, Shaojin Wen  wrote:

>> Currently simpleConcat is implemented using mix and prepend, but in this 
>> simple scenario, it can be implemented in a simpler way and can improve 
>> performance.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   reduce change

Marked as reviewed by redestad (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/20253#pullrequestreview-2193939149


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]

2024-07-23 Thread Claes Redestad
On Tue, 23 Jul 2024 12:12:41 GMT, Chen Liang  wrote:

>> src/java.base/share/classes/java/lang/String.java line 2991:
>> 
>>> 2989: }
>>> 2990: if (isEmpty()) {
>>> 2991: return str;
>> 
>> This case should probably be reflected more precisely in the specification, 
>> or do `return new String(str);` to avoid changing semantics for this 
>> corner-case.
>
> The spec for concat only asked for equal representation instead of the 
> existence of a new identity. Thus this should be fine.

Yes, this isn't beholden to JLS 15.18.1, and it's already specified that 
`foo.concat("")` returns `foo` - so why shouldn't `"".concat(foo)` return 
`foo`? But still it's an observable semantic change so some care needs to be 
taken - possibly even a CSR is warranted. Doing `return new String(str)` to 
retain behavior avoids that headache for a fringe case.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1687990148


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]

2024-07-23 Thread Claes Redestad
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen  wrote:

>> Currently simpleConcat is implemented using mix and prepend, but in this 
>> simple scenario, it can be implemented in a simpler way and can improve 
>> performance.
>
> Shaojin Wen has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Update src/java.base/share/classes/java/lang/String.java
>
>Co-authored-by: Chen Liang 
>  - add comments

This change seems reasonable, and we will need the `newArray(int)` method in 
future work. This patch dials back on the idea that `simpleConcat` is an 
"explainer" for what `StringConcatFactory` is doing, but as it was already 
imprecise in that regard we should favor simplicity and performance here.

src/java.base/share/classes/java/lang/String.java line 2991:

> 2989: }
> 2990: if (isEmpty()) {
> 2991: return str;

This case should probably be reflected more precisely in the specification, or 
do `return new String(str);` to avoid changing semantics for this corner-case.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20253#pullrequestreview-2193780530
PR Review Comment: https://git.openjdk.org/jdk/pull/20253#discussion_r1687937671


Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v3]

2024-07-23 Thread Claes Redestad
On Tue, 9 Jul 2024 21:16:33 GMT, Claes Redestad  wrote:

>> This PR attains a speed-up on some microbenchmarks by simplifying how we 
>> build up the MH combinator tree shape
>> (only use prependers with prefix, always add a suffix to the newArray 
>> combinator), then simplifying/inlining some of the code in the helper 
>> functions. 
>> 
>> Many simple concatenation expressions stay performance neutral, while the 
>> win comes from enabling C2 to better optimize more complex shapes 
>> (concat13String, concatMix4String, concatConst6String see relatively large 
>> improvements):
>> 
>> 
>> NameCnt Base Error  Test 
>> Error  Unit  Change
>> StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
>> 0.241 ns/op   1.25x (p = 0.000*)
>> StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
>> 0.089 ns/op   1.10x (p = 0.000*)
>> StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
>> 0.064 ns/op   1.07x (p = 0.000*)
>> StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
>> 0.041 ns/op   1.20x (p = 0.000*)
>> StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
>> 0.049 ns/op   1.17x (p = 0.000*)
>> StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
>> 0.214 ns/op   1.04x (p = 0.000*)
>> StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
>> 0.093 ns/op   1.17x (p = 0.000*)
>> StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
>> 0.050 ns/op   0.99x (p = 0.004*)
>> StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
>> 0.029 ns/op   1.16x (p = 0.000*)
>> StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
>> 0.070 ns/op   1.04x (p = 0.000*)
>> StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
>> 0.044 ns/op   1.00x (p = 0.058 )
>> StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
>> 0.049 ns/op   1.02x (p = 0.000*)
>> StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
>> 0.037 ns/op   1.01x (p = 0.000*)
>> StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
>> 0.015 ns/op   1.00x (p = 0.930 )
>> StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
>> 0.057 ns/op   1.02x (p = 0.044 )
>> StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
>> 0.053 ns/op   1.02x (p = 0.000*)
>> StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
>> 0.066 ns/op   0.99x (p...
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Excess blankspace

Thanks for reviews and patience - went on vacation for a couple of weeks and 
didn't want to integrate this and leave any bug trail for others to deal with. 
Back part-time for a while so let's go!

-

PR Comment: https://git.openjdk.org/jdk/pull/19927#issuecomment-2245022856


Integrated: 8335182: Consolidate and streamline String concat code shapes

2024-07-23 Thread Claes Redestad
On Thu, 27 Jun 2024 12:27:26 GMT, Claes Redestad  wrote:

> This PR attains a speed-up on some microbenchmarks by simplifying how we 
> build up the MH combinator tree shape
> (only use prependers with prefix, always add a suffix to the newArray 
> combinator), then simplifying/inlining some of the code in the helper 
> functions. 
> 
> Many simple concatenation expressions stay performance neutral, while the win 
> comes from enabling C2 to better optimize more complex shapes 
> (concat13String, concatMix4String, concatConst6String see relatively large 
> improvements):
> 
> 
> NameCnt Base Error  Test 
> Error  Unit  Change
> StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
> 0.241 ns/op   1.25x (p = 0.000*)
> StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
> 0.089 ns/op   1.10x (p = 0.000*)
> StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
> 0.064 ns/op   1.07x (p = 0.000*)
> StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
> 0.041 ns/op   1.20x (p = 0.000*)
> StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
> 0.049 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
> 0.214 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
> 0.093 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
> 0.050 ns/op   0.99x (p = 0.004*)
> StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
> 0.029 ns/op   1.16x (p = 0.000*)
> StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
> 0.070 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
> 0.044 ns/op   1.00x (p = 0.058 )
> StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
> 0.049 ns/op   1.02x (p = 0.000*)
> StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
> 0.037 ns/op   1.01x (p = 0.000*)
> StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
> 0.015 ns/op   1.00x (p = 0.930 )
> StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
> 0.057 ns/op   1.02x (p = 0.044 )
> StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
> 0.053 ns/op   1.02x (p = 0.000*)
> StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
> 0.066 ns/op   0.99x (p = 0.005*)
> StringConcat.concatMix4String...

This pull request has now been integrated.

Changeset: e83b4b23
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/e83b4b236eca48d0b75094006f7f888398194fe4
Stats: 566 lines in 6 files changed: 412 ins; 114 del; 40 mod

8335182: Consolidate and streamline String concat code shapes

Reviewed-by: liach, jvernee

-

PR: https://git.openjdk.org/jdk/pull/19927


Re: RFR: 8336856: Optimize String Concat

2024-07-21 Thread Claes Redestad
On Sun, 21 Jul 2024 12:25:36 GMT, Shaojin Wen  wrote:

> The current implementation of StringConcat is to mix the coder and length 
> into a long. This operation will have some overhead for int/long/boolean 
> types. We can separate the calculation of the coder from the calculation of 
> the length, which can improve the performance in the scenario of concat 
> int/long/boolean.
> 
> This idea comes from the suggestion of @l4es in the discussion of PR 
> https://github.com/openjdk/jdk/pull/20253#issuecomment-2240412866

This reverts startup optimizations which are important for the overall 
performance of the current MH-based implementation. I did not suggest splitting 
apart length and coder in the context of this current implementation, but that 
in a non-MH-based implementation that combination trick makes little sense. 

I am trying to find time to work out the kinks of a bytecode-spinning solution 
and would appreciate it if we leave the venerable `StringConcatFactory` alone 
for the time being - at least until after JVMLS (Aug 5-7).


(I noticed in some tests that splitting apart like this seem to only affect 
M1/aarch64, not x64, suggesting some JIT compiler deficiency on aarch64. 
Perhaps that would be a useful side track to explore.)

-

PR Comment: https://git.openjdk.org/jdk/pull/20273#issuecomment-2241602473


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]

2024-07-19 Thread Claes Redestad
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen  wrote:

>> Currently simpleConcat is implemented using mix and prepend, but in this 
>> simple scenario, it can be implemented in a simpler way and can improve 
>> performance.
>
> Shaojin Wen has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Update src/java.base/share/classes/java/lang/String.java
>
>Co-authored-by: Chen Liang 
>  - add comments

No, my idea is to rework and have SCF spin hidden classes using the new 
classfile API which will emit code or less in the style you are doing here for 
`simpleConcat`, but generalized. This may or may not work out beautifully. I'm 
coming back from a short vacation now and will work on having something to 
present on this topic at JVMLS two and a half weeks from now.

FTR I don't mind if you get this optimization in here. I'll just rebase my 
prototype, and it shouldn't alter things too much.

-

PR Comment: https://git.openjdk.org/jdk/pull/20253#issuecomment-2240728186


Re: RFR: 8336831: Optimize StringConcatHelper.simpleConcat [v5]

2024-07-19 Thread Claes Redestad
On Fri, 19 Jul 2024 21:42:09 GMT, Shaojin Wen  wrote:

>> Currently simpleConcat is implemented using mix and prepend, but in this 
>> simple scenario, it can be implemented in a simpler way and can improve 
>> performance.
>
> Shaojin Wen has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Update src/java.base/share/classes/java/lang/String.java
>
>Co-authored-by: Chen Liang 
>  - add comments

FWIW one of the ideas when implementing `StringConcatHelper.simpleConcat` was 
that by using the primitives used by the `StringConcatFactory` as 
straightforwardly as possible the method acts as a documentation-of-sorts or 
guide to understand how the `SCF` expression trees are built up. It's not 
perfect, though. Concatenation of a `String` + constant would be handled 
differently for one. So perhaps the value as a guide is not high. 

I'm also experimenting with replacing the MH-based strategy with spinning 
hidden, shareable classes instead. In that case we might actually be better off 
getting rid of the `long indexCoder` hacks and generate code more similar to 
the `doConcat` you've come up with here. The main benefit of combining `coder` 
and `index/length` into a single `long` was to reduce the MH combinator 
overheads, but if we're spinning code with access to optimized primitives then 
that isn't really needed.

-

PR Comment: https://git.openjdk.org/jdk/pull/20253#issuecomment-2240412866


Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v3]

2024-07-09 Thread Claes Redestad
> This PR attains a speed-up on some microbenchmarks by simplifying how we 
> build up the MH combinator tree shape
> (only use prependers with prefix, always add a suffix to the newArray 
> combinator), then simplifying/inlining some of the code in the helper 
> functions. 
> 
> Many simple concatenation expressions stay performance neutral, while the win 
> comes from enabling C2 to better optimize more complex shapes 
> (concat13String, concatMix4String, concatConst6String see relatively large 
> improvements):
> 
> 
> NameCnt Base Error  Test 
> Error  Unit  Change
> StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
> 0.241 ns/op   1.25x (p = 0.000*)
> StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
> 0.089 ns/op   1.10x (p = 0.000*)
> StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
> 0.064 ns/op   1.07x (p = 0.000*)
> StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
> 0.041 ns/op   1.20x (p = 0.000*)
> StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
> 0.049 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
> 0.214 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
> 0.093 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
> 0.050 ns/op   0.99x (p = 0.004*)
> StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
> 0.029 ns/op   1.16x (p = 0.000*)
> StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
> 0.070 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
> 0.044 ns/op   1.00x (p = 0.058 )
> StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
> 0.049 ns/op   1.02x (p = 0.000*)
> StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
> 0.037 ns/op   1.01x (p = 0.000*)
> StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
> 0.015 ns/op   1.00x (p = 0.930 )
> StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
> 0.057 ns/op   1.02x (p = 0.044 )
> StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
> 0.053 ns/op   1.02x (p = 0.000*)
> StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
> 0.066 ns/op   0.99x (p = 0.005*)
> StringConcat.concatMix4String...

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Excess blankspace

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19927/files
  - new: https://git.openjdk.org/jdk/pull/19927/files/7c50d7dd..58fe4b12

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19927=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=19927=01-02

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/19927.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927

PR: https://git.openjdk.org/jdk/pull/19927


Re: RFR: 8335366: Improve String.format performance with fastpath [v9]

2024-07-01 Thread Claes Redestad
On Mon, 1 Jul 2024 06:18:55 GMT, Chen Liang  wrote:

>> Shaojin Wen has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   code style
>
> As promising as the performance number is, I think we need to ensure two 
> things:
> 1. Correctness: this patch adds a lot of special cases; not sure if the 
> current test cases already cover all of them. In addition, format is i18n 
> stuff, which will need extra review besides the review for performance gains.
> 2. Validity: the existing benchmarks don't have profile pollution: see 
> `ReflectionSpeedBenchmark` 
> https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/test/micro/org/openjdk/bench/java/lang/reflect/ReflectionSpeedBenchmark.java#L291
>  where the `Method.invoke` and `Constructor.newInstance` are called to tamper 
> JIT's profiling, as JIT can conclude that only one format shape is ever used 
> in your benchmark, which is unlikely in production. You should call 
> `String.format` and `String.formatted` with varied format strings and 
> arguments in setup for profile pollution.
>   An extreme example would be at 
> https://github.com/openjdk/jdk/pull/14944#issuecomment-1644050455, where 
> `Arrays.hashCode(Object[])` where every element is an `Integer` is extremely 
> fast, but slows down drastically once different arrays are passed.

Fully agree with the points @liach made above that this needs to be scrutinized 
for correctness and validity. For example we need to better explore cases where 
detection for the fast path might be costlier, for example a longer format 
string with or without some acceptable specifiers early, then something that 
forces us into the slow path only at the end.

-

PR Comment: https://git.openjdk.org/jdk/pull/19956#issuecomment-2199478317


Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]

2024-06-29 Thread Claes Redestad
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix comment

I agree with improving the legacy `String.format` code, within reason due it's 
wide use and appeal. Doing so does not detract from the goal of providing 
smarter and faster formatting via `StringTemplates`.

-

PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2198045830


Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]

2024-06-28 Thread Claes Redestad
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix comment

Thanks for the updates. I think this is now a harmless and straightforward 
improvement.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19926#pullrequestreview-2147942903


Re: RFR: 8335182: Consolidate and streamline String concat code shapes [v2]

2024-06-28 Thread Claes Redestad
> This PR attains a speed-up on some microbenchmarks by simplifying how we 
> build up the MH combinator tree shape
> (only use prependers with prefix, always add a suffix to the newArray 
> combinator), then simplifying/inlining some of the code in the helper 
> functions. 
> 
> Many simple concatenation expressions stay performance neutral, while the win 
> comes from enabling C2 to better optimize more complex shapes 
> (concat13String, concatMix4String, concatConst6String see relatively large 
> improvements):
> 
> 
> NameCnt Base Error  Test 
> Error  Unit  Change
> StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
> 0.241 ns/op   1.25x (p = 0.000*)
> StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
> 0.089 ns/op   1.10x (p = 0.000*)
> StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
> 0.064 ns/op   1.07x (p = 0.000*)
> StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
> 0.041 ns/op   1.20x (p = 0.000*)
> StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
> 0.049 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
> 0.214 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
> 0.093 ns/op   1.17x (p = 0.000*)
> StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
> 0.050 ns/op   0.99x (p = 0.004*)
> StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
> 0.029 ns/op   1.16x (p = 0.000*)
> StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
> 0.070 ns/op   1.04x (p = 0.000*)
> StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
> 0.044 ns/op   1.00x (p = 0.058 )
> StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
> 0.049 ns/op   1.02x (p = 0.000*)
> StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
> 0.037 ns/op   1.01x (p = 0.000*)
> StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
> 0.015 ns/op   1.00x (p = 0.930 )
> StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
> 0.057 ns/op   1.02x (p = 0.044 )
> StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
> 0.053 ns/op   1.02x (p = 0.000*)
> StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
> 0.066 ns/op   0.99x (p = 0.005*)
> StringConcat.concatMix4String...

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Copyrights

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19927/files
  - new: https://git.openjdk.org/jdk/pull/19927/files/6525e945..7c50d7dd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19927=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=19927=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19927.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927

PR: https://git.openjdk.org/jdk/pull/19927


Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v3]

2024-06-28 Thread Claes Redestad
On Fri, 28 Jun 2024 11:27:47 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   remove ForceInline

This looks better. Avoids a branch up front, minimized code changes. The 
comment is good, but perhaps move it down into the default case? 

Update bug description to something like "Reduce size of 
j.u.Formatter.Conversion#isValid" ?

-

PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2196808696


Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v2]

2024-06-28 Thread Claes Redestad
On Thu, 27 Jun 2024 14:12:36 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert & use `@ForceInline`

So the compound issue here is that there's a heuristic which prevents inlining 
of large methods, and that a tableswitch on inputs spanning from `'%' (37)` to 
`'x' (120)` become expansive enough (350bytes) to make the JIT enforce that 
rule.

As an alternative to `@ForceInline` I tested splitting uppercase and lowercase 
chars into two switches. This looks a bit nasty, but actually reduces the size 
of the method:

diff --git a/src/java.base/share/classes/java/util/Formatter.java 
b/src/java.base/share/classes/java/util/Formatter.java
index a7d95ee5780..50419f1ea23 100644
--- a/src/java.base/share/classes/java/util/Formatter.java
+++ b/src/java.base/share/classes/java/util/Formatter.java
@@ -4947,30 +4947,36 @@ static class Conversion {
 static final char PERCENT_SIGN= '%';

 static boolean isValid(char c) {
-return switch (c) {
-case BOOLEAN,
- BOOLEAN_UPPER,
- STRING,
- STRING_UPPER,
- HASHCODE,
- HASHCODE_UPPER,
- CHARACTER,
- CHARACTER_UPPER,
- DECIMAL_INTEGER,
- OCTAL_INTEGER,
- HEXADECIMAL_INTEGER,
- HEXADECIMAL_INTEGER_UPPER,
- SCIENTIFIC,
- SCIENTIFIC_UPPER,
- GENERAL,
- GENERAL_UPPER,
- DECIMAL_FLOAT,
- HEXADECIMAL_FLOAT,
- HEXADECIMAL_FLOAT_UPPER,
- LINE_SEPARATOR,
- PERCENT_SIGN -> true;
-default -> false;
-};
+if (c >= 'a') {
+return switch (c) {
+case BOOLEAN,
+ STRING,
+ HASHCODE,
+ CHARACTER,
+ DECIMAL_INTEGER,
+ OCTAL_INTEGER,
+ HEXADECIMAL_INTEGER,
+ SCIENTIFIC,
+ GENERAL,
+ DECIMAL_FLOAT,
+ HEXADECIMAL_FLOAT,
+ LINE_SEPARATOR -> true;
+default -> false;
+};
+} else {
+return switch (c) {
+case BOOLEAN_UPPER,
+ STRING_UPPER,
+ HASHCODE_UPPER,
+ CHARACTER_UPPER,
+ HEXADECIMAL_INTEGER_UPPER,
+ SCIENTIFIC_UPPER,
+ GENERAL_UPPER,
+ HEXADECIMAL_FLOAT_UPPER,
+ PERCENT_SIGN -> true;
+default -> false;
+};
+}
 }

 // Returns true iff the Conversion is applicable to all objects.

`javap -v java.lang.Formatter$Conversion`

  static boolean isValid(char);
descriptor: (C)Z
flags: (0x0008) ACC_STATIC
Code:
  stack=2, locals=1, args_size=1
 0: iload_0
 1: bipush97
 3: if_icmplt 122
 6: iload_0
 7: tableswitch   { // 97 to 120
  97: 116
  98: 116
  99: 116
 100: 116
 101: 116
 102: 116
 103: 116
 104: 116
 105: 120
 106: 120
 107: 120
 108: 120
 109: 120
 110: 116
 111: 116
 112: 120
 113: 120
 114: 120
 115: 116
 116: 120
 117: 120
 118: 120
 119: 120
 120: 116
 default: 120
}
   116: iconst_1
   117: goto  121
   120: iconst_0
   121: ireturn
  

Re: RFR: 8335182: Consolidate and streamline String concat code shapes

2024-06-27 Thread Claes Redestad
On Thu, 27 Jun 2024 16:05:09 GMT, Chen Liang  wrote:

>> This PR attains a speed-up on some microbenchmarks by simplifying how we 
>> build up the MH combinator tree shape
>> (only use prependers with prefix, always add a suffix to the newArray 
>> combinator), then simplifying/inlining some of the code in the helper 
>> functions. 
>> 
>> Many simple concatenation expressions stay performance neutral, while the 
>> win comes from enabling C2 to better optimize more complex shapes 
>> (concat13String, concatMix4String, concatConst6String see relatively large 
>> improvements):
>> 
>> 
>> NameCnt Base Error  Test 
>> Error  Unit  Change
>> StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
>> 0.241 ns/op   1.25x (p = 0.000*)
>> StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
>> 0.089 ns/op   1.10x (p = 0.000*)
>> StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
>> 0.064 ns/op   1.07x (p = 0.000*)
>> StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
>> 0.041 ns/op   1.20x (p = 0.000*)
>> StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
>> 0.049 ns/op   1.17x (p = 0.000*)
>> StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
>> 0.214 ns/op   1.04x (p = 0.000*)
>> StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
>> 0.093 ns/op   1.17x (p = 0.000*)
>> StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
>> 0.050 ns/op   0.99x (p = 0.004*)
>> StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
>> 0.029 ns/op   1.16x (p = 0.000*)
>> StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
>> 0.070 ns/op   1.04x (p = 0.000*)
>> StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
>> 0.044 ns/op   1.00x (p = 0.058 )
>> StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
>> 0.049 ns/op   1.02x (p = 0.000*)
>> StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
>> 0.037 ns/op   1.01x (p = 0.000*)
>> StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
>> 0.015 ns/op   1.00x (p = 0.930 )
>> StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
>> 0.057 ns/op   1.02x (p = 0.044 )
>> StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
>> 0.053 ns/op   1.02x (p = 0.000*)
>> StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
>> 0.066 ns/op   0.99x (p...
>
> src/java.base/share/classes/java/lang/StringConcatHelper.java line 157:
> 
>> 155: }
>> 156: index -= prefix.length();
>> 157: prefix.getBytes(buf, index, String.LATIN1);
> 
> Since we are now passing in a lot of empty prefix, I wonder how this call is 
> elided; is there some specific JIT intrinsic? The java code has no shortcut 
> for `length == 0`.

Remember that the prefixes are all constants and bound into the MH at each call 
site, so one view is that this PR is mostly about tickling the code into 
something that just-so happens to be easier for the JIT to swallow

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19927#discussion_r1657513167


RFR: 8335182: Consolidate and streamline String concat code shapes

2024-06-27 Thread Claes Redestad
This PR attains a speed-up on some microbenchmarks by simplifying how we build 
up the MH combinator tree shape
(only use prependers with prefix, always add a suffix to the newArray 
combinator), then simplifying/inlining some of the code in the helper 
functions. 

Many simple concatenation expressions stay performance neutral, while the win 
comes from enabling C2 to better optimize more complex shapes (concat13String, 
concatMix4String, concatConst6String see relatively large improvements):


NameCnt Base Error  Test 
Error  Unit  Change
StringConcat.concat13String  50   66.380 ±   0.18953.017 ±   
0.241 ns/op   1.25x (p = 0.000*)
StringConcat.concat4String   50   13.620 ±   0.05312.382 ±   
0.089 ns/op   1.10x (p = 0.000*)
StringConcat.concat6String   50   17.168 ±   0.07016.046 ±   
0.064 ns/op   1.07x (p = 0.000*)
StringConcat.concatConst2String  509.820 ±   0.081 8.158 ±   
0.041 ns/op   1.20x (p = 0.000*)
StringConcat.concatConst4String  50   14.938 ±   0.12412.800 ±   
0.049 ns/op   1.17x (p = 0.000*)
StringConcat.concatConst6Object  50   56.115 ±   0.28854.046 ±   
0.214 ns/op   1.04x (p = 0.000*)
StringConcat.concatConst6String  50   19.032 ±   0.07316.213 ±   
0.093 ns/op   1.17x (p = 0.000*)
StringConcat.concatConstBoolByte 508.486 ±   0.066 8.556 ±   
0.050 ns/op   0.99x (p = 0.004*)
StringConcat.concatConstInt  508.942 ±   0.052 7.677 ±   
0.029 ns/op   1.16x (p = 0.000*)
StringConcat.concatConstIntConstInt  50   12.883 ±   0.07012.431 ±   
0.070 ns/op   1.04x (p = 0.000*)
StringConcat.concatConstString   507.523 ±   0.050 7.486 ±   
0.044 ns/op   1.00x (p = 0.058 )
StringConcat.concatConstStringConstInt   50   11.961 ±   0.03211.699 ±   
0.049 ns/op   1.02x (p = 0.000*)
StringConcat.concatEmptyConstInt 507.778 ±   0.038 7.723 ±   
0.037 ns/op   1.01x (p = 0.000*)
StringConcat.concatEmptyConstString  503.506 ±   0.026 3.505 ±   
0.015 ns/op   1.00x (p = 0.930 )
StringConcat.concatEmptyLeft 503.573 ±   0.075 3.518 ±   
0.057 ns/op   1.02x (p = 0.044 )
StringConcat.concatEmptyRight503.713 ±   0.049 3.622 ±   
0.053 ns/op   1.02x (p = 0.000*)
StringConcat.concatMethodConstString 507.418 ±   0.030 7.478 ±   
0.066 ns/op   0.99x (p = 0.005*)
StringConcat.concatMix4String50   89.243 ±   0.43671.866 ±   
0.894 ns/op   1.24x (p = 0.000*)
StringConcatStartup.MixedLarge.run   10  655.436 ±  29.787   649.730 ±  
26.053 ms/op   1.01x (p = 0.500 )
StringConcatStartup.MixedSmall.run   20   51.676 ±   2.32450.724 ±   
5.050 ms/op   1.02x (p = 0.512 )
StringConcatStartup.StringLarge.run  10  166.022 ±  15.672   165.300 ±  
14.433 ms/op   1.00x (p = 0.873 )
StringConcatStartup.StringSingle.run 400.168 ±   0.016 0.178 ±   
0.024 ms/op   0.94x (p = 0.234 )
* = significant


Startup-wise it's more or less neutral as evidenced by the added 
`StringConcatStartup` micro (a simplified and JMH:ified version of some startup 
stress tests I've been tinkering with over the years). This PR does not change 
the total number of classes generated and loaded by this test (3359 total, 2499 
generated).

-

Commit messages:
 - Add StringConcatStartup JMH
 - Merge branch 'master' into consolidated_prependers
 - Simplify
 - Remove getBytes changes
 - Streamline newArray
 - Consolidate to always end with newArrayWithSuffix
 - Merge branch 'master' into consolidated_prependers
 - Streamline prependers
 - Fix compilation errors from removing non-prefixed prepend methods
 - Remove non-prefix prepender methods, inline logic into prefixed variant
 - ... and 1 more: https://git.openjdk.org/jdk/compare/efb905e5...6525e945

Changes: https://git.openjdk.org/jdk/pull/19927/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19927=00
  Issue: https://bugs.openjdk.org/browse/JDK-8335182
  Stats: 565 lines in 6 files changed: 412 ins; 114 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/19927.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19927/head:pull/19927

PR: https://git.openjdk.org/jdk/pull/19927


Re: RFR: 8335252: Use ImmutableBitSetPredicate optimize j.u.Formatter.Conversion#isValid

2024-06-27 Thread Claes Redestad
On Thu, 27 Jun 2024 11:48:56 GMT, Shaojin Wen  wrote:

> * byte codes of `Conversion.isValid`

It's surprising that javac generates synthetic cases identical to the default 
case for every char in the 37 to 120 range (the bounds seem to be the lowest 
and highest of the constants). 

Is there a benchmark where you see a benefit from this?

-

PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2194540522


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-26 Thread Claes Redestad
On Thu, 20 Jun 2024 14:32:25 GMT, Shaojin Wen  wrote:

>> @cl4es  @wenshao I think we should probably work on `putChar`, or at least 
>> figure out what blocks `MergeStore` for `putChar`. Can someone produce a 
>> simple stand-alone `.java` file that uses `putChar`, so that I can 
>> investigate why `MergeStore` does not work for it?
>> 
>> Otherwise, I agree with @cl4es : the code here may be ok for now, but we 
>> would have to revert it again later, should `MergeStore` eventually do the 
>> trick.
>
> @eme64 
> 
> simple stand-alone java
> 
> import jdk.internal.misc.Unsafe;
> 
> public class PutCharTest {
> static final Unsafe UNSAFE = Unsafe.getUnsafe();
> 
> static void putCharsAt(byte[] val, int index, int c1, int c2, int c3, int 
> c4) {
> putChar(val, index, (char)(c1));
> putChar(val, index + 1, (char)(c2));
> putChar(val, index + 2, (char)(c3));
> putChar(val, index + 3, (char)(c4));
> }
> 
> static void putChar(byte[] bytes, int index, int c) {
> UNSAFE.putChar(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1), 
> (char)(c));
> }
> 
> static void putChar0(byte[] bytes, int index, int c) {
> UNSAFE.putByte(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1), 
> (byte)(c));
> UNSAFE.putByte(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + (index << 1) + 
> 1, (byte)(c << 8));
> }
> 
> static void putNull(byte[] bytes, int index) {
> putCharsAt(bytes, index, 'n', 'u', 'l', 'l');
> }
> 
> public static void main(String[] args) {
> for (int i = 0; i < 5; i++) {
> testNull();
> }
> 
> System.out.println("done");
> }
> 
> private static void testNull() {
> byte[] bytes = new byte[8192];
> 
> for (int i = 0; i < 1000; i++) {
> int index = 0;
> for (int j = 0; j < 1024; j++) {
> putNull(bytes, index);
> index += 4;
> }
> }
> }
> }

As long as @wenshao is OK with that I'm happy to let this wait. I'll have 
limited availability myself in the coming weeks.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2192459124


Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array

2024-06-26 Thread Claes Redestad
On Wed, 26 Jun 2024 13:02:51 GMT, Jan Lahoda  wrote:

>> test/micro/org/openjdk/bench/java/lang/runtime/SwitchEnum.java line 57:
>> 
>>> 55: for (E e : inputs) {
>>> 56: sum += switch (e) {
>>> 57: case null -> -1;
>> 
>> As this `null` case adds a case relative to the `-Traditional` test then 
>> maybe removing one of the `E0, E1, ...` cases would make the test a little 
>> bit more apples-to-apples?
>
> Using `case null -> ` will push javac to use the new code, but all switches 
> do some kind of null check for the selector value. The difference is that if 
> there's no `case null`, there will be `Objects.requireNonNull` generated for 
> the selector value (which will throw an NPE if the value is `null`), while 
> here it will jump to the given case.
> 
> So, `case null` does not have the same weight as a normal case, so I don't 
> think it would be fair to remove a full case to compensate for it.

Fair enough, and I guess ~1.6ns/op is reasonable overhead for the added 
semantics.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1655364063


Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v20]

2024-06-26 Thread Claes Redestad
On Wed, 26 Jun 2024 14:55:44 GMT, Shaojin Wen  wrote:

>> The current versions of FloatToDecimal and DoubleToDecimal allocate 
>> additional objects. Reducing these allocations can improve the performance 
>> of Float/Double.toString and AbstractStringBuilder's append(float/double).
>> 
>> This patch is just a code refactoring to reduce object allocation, but does 
>> not change the Float/Double to decimal algorithm.
>> 
>> The following code comments the allocated objects to be removed.
>> 
>> 
>> class FloatToDecimal {
>> public static String toString(float v) {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(float v, Appendable app)
>> throws IOException {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(float v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> if (app instanceof StringBuilder builder) {
>> return builder.append(chars);
>> }
>> if (app instanceof StringBuffer buffer) {
>> return buffer.append(chars);
>> }
>> for (char c : chars) {
>> app.append(c);
>> }
>> return app;
>> // ...
>> }
>> }
>> }
>> 
>> class DoubleToDecimal {
>> public static String toString(double v) {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(double v, Appendable app)
>> throws IOException {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(double v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v, null)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> ...
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   1. revert code change.
>   2. comment remove space

LGTM2

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19730#pullrequestreview-2142672406


Re: RFR: 8334437: De-duplicate ProxyMethod list creation

2024-06-26 Thread Claes Redestad
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad  wrote:

> Simple refactoring to extract identical, simple lambda expressions. Improve 
> code clarity and reduce classes generated.

Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/19760#issuecomment-2191894639


Integrated: 8334437: De-duplicate ProxyMethod list creation

2024-06-26 Thread Claes Redestad
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad  wrote:

> Simple refactoring to extract identical, simple lambda expressions. Improve 
> code clarity and reduce classes generated.

This pull request has now been integrated.

Changeset: 5883a20b
Author:    Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/5883a20b822bb8acb719076e4f7abee8403061cb
Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod

8334437: De-duplicate ProxyMethod list creation

Reviewed-by: asotona, liach

-

PR: https://git.openjdk.org/jdk/pull/19760


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-26 Thread Claes Redestad
On Tue, 25 Jun 2024 17:19:28 GMT, Emanuel Peter  wrote:

> I filed:
> 
> [JDK-8335113](https://bugs.openjdk.org/browse/JDK-8335113): C2 MergeStores: 
> allow merging of putChar and other larger stores on smaller array elements

Great! Should we hold off on this optimization and see if we can avoid the need 
for `Unsafe` here - or go ahead integrating this PR and leave it to JDK-8335113 
to revert any no-longer-needed changes made here?

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2191878373


Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v18]

2024-06-26 Thread Claes Redestad
On Wed, 26 Jun 2024 13:23:14 GMT, Shaojin Wen  wrote:

>> src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 47:
>> 
>>> 45: static final int NAN = 5 << 8;
>>> 46: 
>>> 47: static final byte LATIN1 = 0;
>> 
>> I think this somewhat unnecessarily copies names and internal implementation 
>> details from `String` (where the `coder` value is used both as a 
>> pseudo-boolean and as a shift operand, which is not applicable here). In 
>> this context we could use something like `private final boolean latin1;` 
>> (all uses of `coder` are just `if (coder == LATIN1)` so it would be 
>> simplified to `if (latin1)`)
>
> I have written a version using boolean locally, but because this class is 
> mainly used by String and StringBuilder, it uses the same style as 
> String/StringBuilder.

Yeah, I'm just musing about the perils of leaking/replicating those 
implementation details, but as you say this is internal code only used by 
`String` and `StringBuilder` to it's loosely part of that complex. Not 
requesting any changes here for now..

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1654924418


Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v18]

2024-06-26 Thread Claes Redestad
On Tue, 25 Jun 2024 17:29:04 GMT, Shaojin Wen  wrote:

>> The current versions of FloatToDecimal and DoubleToDecimal allocate 
>> additional objects. Reducing these allocations can improve the performance 
>> of Float/Double.toString and AbstractStringBuilder's append(float/double).
>> 
>> This patch is just a code refactoring to reduce object allocation, but does 
>> not change the Float/Double to decimal algorithm.
>> 
>> The following code comments the allocated objects to be removed.
>> 
>> 
>> class FloatToDecimal {
>> public static String toString(float v) {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(float v, Appendable app)
>> throws IOException {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(float v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> if (app instanceof StringBuilder builder) {
>> return builder.append(chars);
>> }
>> if (app instanceof StringBuffer buffer) {
>> return buffer.append(chars);
>> }
>> for (char c : chars) {
>> app.append(c);
>> }
>> return app;
>> // ...
>> }
>> }
>> }
>> 
>> class DoubleToDecimal {
>> public static String toString(double v) {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(double v, Appendable app)
>> throws IOException {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(double v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v, null)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> ...
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   from @liach: use s.getBytes for performance

src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 47:

> 45: static final int NAN = 5 << 8;
> 46: 
> 47: static final byte LATIN1 = 0;

I think this somewhat unnecessarily copies names and internal implementation 
details from `String` (where the `coder` value is used both as a pseudo-boolean 
and as a shift operand, which is not applicable here). In this context we could 
use something like `private final boolean latin1;` (all uses of `coder` are 
just `if (coder == LATIN1)` so it would be simplified to `if (latin1)`)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1654812101


Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array

2024-06-26 Thread Claes Redestad
On Wed, 26 Jun 2024 12:32:20 GMT, Jan Lahoda  wrote:

> For general pattern matching switches, the `SwitchBootstraps` class currently 
> generates a cascade of `if`-like statements, computing the correct target 
> case index for the given input.
> 
> There is one special case which permits a relatively easy faster handling, 
> and that is when all the case labels case enum constants (but the switch is 
> still a "pattern matching" switch, as tranditional enum switches do not go 
> through `SwitchBootstraps`). Like:
> 
> 
> enum E {A, B, C}
> E e = ...;
> switch (e) {
>  case null -> {}
>  case A a -> {}
>  case C c -> {}
>  case B b -> {}
> }
> 
> 
> We can create an array mapping the runtime ordinal to the appropriate case 
> number, which is somewhat similar to what javac is doing for ordinary 
> switches over enums.
> 
> The `SwitchBootstraps` class was trying to do so, when the restart index is 
> zero, but failed to do so properly, so that code is not used (and does not 
> actually work properly).
> 
> This patch is trying to fix that - when all the case labels are enum 
> constants, an array mapping the runtime enum ordinals to the case number will 
> be created (lazily), for restart index == 0. And this map will then be used 
> to quickly produce results for the given input. E.g. for the case above, the 
> mapping will be `{0 -> 0, 1 -> 2, 2 -> 1}` (meaning `{A -> 0, B -> 2, C -> 
> 1}`).
> 
> When the restart index is != 0 (i.e. when there's a guard in the switch, and 
> the guard returned `false`), the if cascade will be generated lazily and 
> used, as in the general case. If it would turn out there are significant 
> enum-only switches with guards/restart index != 0, we could improve there as 
> well, by generating separate mappings for every (used) restart index.
> 
> I believe the current tests cover the code functionally sufficiently - see 
> `SwitchBootstrapsTest.testEnums`. It is only that the tests do not (and 
> regression tests cannot easily, I think) differentiate whether the 
> special-case or generic implementation is used.
> 
> I've added a new microbenchmark attempting to demonstrate the difference. 
> There are two benchmarks, both having only enum constants as case labels. 
> One, `enumSwitchTraditional` is an "old" switch, desugared fully by javac, 
> the other, `enumSwitchWithBootstrap` is an equivalent switch that uses the 
> `SwitchBootstraps`. Before this patch, I was getting values like:
> 
> Benchmark   Mode  Cnt   Score   Error  Units
> SwitchEnum.enumSwitchTraditionalavgt   15  11.719 ± 0.333  ns/op
> SwitchEnum.enumSwitchWithBootstrap  avgt   15  24.668 ± 1.037  ...

Marked as reviewed by redestad (Reviewer).

src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 280:

> 278: 
> 279: Class enumClass = invocationType.parameterType(0);
> 280: labels = Stream.of(labels).map(l -> convertEnumConstants(lookup, 
> enumClass, l)).toArray();

You could create `EnumDesc[] enumDescLabels` here and remove the 
`Arrays.copyOf` on line 290. The `labels.clone()` on line 277 also seem 
redundant since we only iterate over the `labels` argument once.

While this case is likely fine I generally recommend using a minimal amount of 
streams/lambdas in bootstrap sensitive code like these dynamic bootstraps 
methods.

-

PR Review: https://git.openjdk.org/jdk/pull/19906#pullrequestreview-2141731748
PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1654771207


Re: RFR: 8332522: SwitchBootstraps::mappedEnumLookup constructs unused array

2024-06-26 Thread Claes Redestad
On Wed, 26 Jun 2024 12:32:20 GMT, Jan Lahoda  wrote:

> For general pattern matching switches, the `SwitchBootstraps` class currently 
> generates a cascade of `if`-like statements, computing the correct target 
> case index for the given input.
> 
> There is one special case which permits a relatively easy faster handling, 
> and that is when all the case labels case enum constants (but the switch is 
> still a "pattern matching" switch, as tranditional enum switches do not go 
> through `SwitchBootstraps`). Like:
> 
> 
> enum E {A, B, C}
> E e = ...;
> switch (e) {
>  case null -> {}
>  case A a -> {}
>  case C c -> {}
>  case B b -> {}
> }
> 
> 
> We can create an array mapping the runtime ordinal to the appropriate case 
> number, which is somewhat similar to what javac is doing for ordinary 
> switches over enums.
> 
> The `SwitchBootstraps` class was trying to do so, when the restart index is 
> zero, but failed to do so properly, so that code is not used (and does not 
> actually work properly).
> 
> This patch is trying to fix that - when all the case labels are enum 
> constants, an array mapping the runtime enum ordinals to the case number will 
> be created (lazily), for restart index == 0. And this map will then be used 
> to quickly produce results for the given input. E.g. for the case above, the 
> mapping will be `{0 -> 0, 1 -> 2, 2 -> 1}` (meaning `{A -> 0, B -> 2, C -> 
> 1}`).
> 
> When the restart index is != 0 (i.e. when there's a guard in the switch, and 
> the guard returned `false`), the if cascade will be generated lazily and 
> used, as in the general case. If it would turn out there are significant 
> enum-only switches with guards/restart index != 0, we could improve there as 
> well, by generating separate mappings for every (used) restart index.
> 
> I believe the current tests cover the code functionally sufficiently - see 
> `SwitchBootstrapsTest.testEnums`. It is only that the tests do not (and 
> regression tests cannot easily, I think) differentiate whether the 
> special-case or generic implementation is used.
> 
> I've added a new microbenchmark attempting to demonstrate the difference. 
> There are two benchmarks, both having only enum constants as case labels. 
> One, `enumSwitchTraditional` is an "old" switch, desugared fully by javac, 
> the other, `enumSwitchWithBootstrap` is an equivalent switch that uses the 
> `SwitchBootstraps`. Before this patch, I was getting values like:
> 
> Benchmark   Mode  Cnt   Score   Error  Units
> SwitchEnum.enumSwitchTraditionalavgt   15  11.719 ± 0.333  ns/op
> SwitchEnum.enumSwitchWithBootstrap  avgt   15  24.668 ± 1.037  ...

test/micro/org/openjdk/bench/java/lang/runtime/SwitchEnum.java line 57:

> 55: for (E e : inputs) {
> 56: sum += switch (e) {
> 57: case null -> -1;

As this `null` case adds a case relative to the `-Traditional` test then maybe 
removing one of the `E0, E1, ...` cases would make the test a little bit more 
apples-to-apples?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19906#discussion_r1654753822


Re: RFR: 8334437: De-duplicate ProxyMethod list creation

2024-06-26 Thread Claes Redestad
On Tue, 18 Jun 2024 07:41:26 GMT, Claes Redestad  wrote:

> Simple refactoring to extract identical, simple lambda expressions. Improve 
> code clarity and reduce classes generated.

Friendly reminder that this trivial improvement needs a reviewer.

-

PR Comment: https://git.openjdk.org/jdk/pull/19760#issuecomment-2191352440


Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v3]

2024-06-26 Thread Claes Redestad
On Tue, 18 Jun 2024 10:00:46 GMT, Claes Redestad  wrote:

>> Extracting duplicate method references to static field reduce proxy class 
>> spinning and loading. In this case 2 less classes loaded when using 
>> `findAny()` on each type of stream.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fixes

Friendly reminder that this improvement needs a Reviewer. While it'd be great 
if javac can make this redundant this is adding overhead in the meantime, and 
the temporary workaround doesn't obfuscate the code.

-

PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2191356136


Re: RFR: 8334872: BigEndian: java/lang/invoke/condy Tests failing since JDK-8294960

2024-06-25 Thread Claes Redestad
On Mon, 24 Jun 2024 16:01:41 GMT, Adam Sotona  wrote:

> After JDK-8294960 is java.lang.invoke.ClassSpecializer using lamdas for code 
> generation and unfortunately it causes StackOverflow on BigEndian platforms.
> 
> This patch converts all lambdas in ClassSpecializer into anonymous inner 
> classes.
> 
> Please review and test on a BigEndian platform.
> 
> Thanks,
> Adam

Looks like there is a bootstrap cycle initializing this lambda in ClassBuilder: 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/classfile/ClassBuilder.java#L202

I'm not sure, but it looks like the PPC VM might be passing args to the 
bootstrap method differently, which triggers adapting code in 
BootstrapMethodInvoker::pushMePullYou - this might be enough to start this 
cycle.

-

PR Comment: https://git.openjdk.org/jdk/pull/19863#issuecomment-2188298757


Re: RFR: 8334872: BigEndian: java/lang/invoke/condy Tests failing since JDK-8294960

2024-06-24 Thread Claes Redestad
On Mon, 24 Jun 2024 16:01:41 GMT, Adam Sotona  wrote:

> After JDK-8294960 is java.lang.invoke.ClassSpecializer using lamdas for code 
> generation and unfortunately it causes StackOverflow on BigEndian platforms.
> 
> This patch converts all lambdas in ClassSpecializer into anonymous inner 
> classes.
> 
> Please review and test on a BigEndian platform.
> 
> Thanks,
> Adam

Marked as reviewed by redestad (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19863#pullrequestreview-2136378062


Re: RFR: 8291966: SwitchBootstrap.typeSwitch could be faster [v3]

2024-06-22 Thread Claes Redestad
On Wed, 3 Apr 2024 16:17:57 GMT, ExE Boss  wrote:

>> Jan Lahoda has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains six commits:
>> 
>>  - Reflecting review feedback.
>>  - Merge branch 'master' into JDK-8291966
>>  - Adding comments
>>  - Improving performance
>>  - Merge branch 'master' into JDK-8291966
>>  - 8291966: SwitchBootstrap.typeSwitch could be faster
>
> src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 389:
> 
>> 387: }
>> 388: }
>> 389: return enumMap.map[value.ordinal()];
> 
> `enumMap.map` never gets set before this line.

There's a bug filed for this already: 
https://bugs.openjdk.org/browse/JDK-8332522 

@lahodaj explained that this broken code is part of an optimization which is 
never attempted (IIRC due to the bug you noted on line 327). JDK-833522 seem 
like a good place to continue this conversation..?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/9779#discussion_r1649761928


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-20 Thread Claes Redestad
On Thu, 13 Jun 2024 07:36:34 GMT, Emanuel Peter  wrote:

>> Thanks to @eme64 's patience and help, I found a way to use MergeStore 
>> without doing boundary checking.
>> 
>> It would be even better if Unsafe.putChar could be used for MergeStore in 
>> the future.
>> 
>> ## 1. JavaCode
>> 
>> class StringUTF16 {
>> public static void putCharsAt(byte[] value, int i, char c1, char c2, 
>> char c3, char c4) {
>> // Don't use the putChar method, Its instrinsic will cause C2 unable 
>> to combining values into larger stores.
>> putChar1(value, i, c1);
>> putChar1(value, i + 1, c2);
>> putChar1(value, i + 2, c3);
>> putChar1(value, i + 3, c4);
>> }
>> 
>> public static void putCharsAt(byte[] value, int i, char c1, char c2, 
>> char c3, char c4, char c5) {
>> // Don't use the putChar method, Its instrinsic will cause C2 unable 
>> to combining values into larger stores.
>> putChar1(value, i, c1);
>> putChar1(value, i + 1, c2);
>> putChar1(value, i + 2, c3);
>> putChar1(value, i + 3, c4);
>> putChar1(value, i + 4, c5);
>> }
>> 
>> static void putChar1(byte[] value, int i, char c) {
>> int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1);
>> Unsafe.getUnsafe().putByte(value, address, (byte)(c >> 
>> HI_BYTE_SHIFT));
>> Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> 
>> LO_BYTE_SHIFT));
>> }
>> }
>> 
>> 
>> ## 2. Benchmark Numbers
>> The performance numbers under MacBookPro M1 Max are as follows:
>> 
>> -Benchmark   Mode  Cnt  Score   Error  Units
>> -StringBuilders.toStringCharWithNull8Latin1  avgt   15  7.073 ? 0.010  ns/op
>> -StringBuilders.toStringCharWithNull8Utf16   avgt   15  9.298 ? 0.015  ns/op
>> -StringBuilders.toStringCharWithBool8Latin1  avgt   15  7.387 ? 0.021  ns/op
>> -StringBuilders.toStringCharWithBool8Utf16   avgt   15  9.615 ? 0.011  ns/op
>> 
>> +Benchmark   Mode  Cnt  Score   Error  Units
>> +StringBuilders.toStringCharWithNull8Latin1  avgt   15  5.628 ? 0.017  ns/op 
>> +25.67%
>> +StringBuilders.toStringCharWithNull8Utf16   avgt   15  6.873 ? 0.026  ns/op 
>> +35.28%
>> +StringBuilders.toStringCharWithBool8Latin1  avgt   15  6.480 ? 0.077  ns/op 
>> +13.99%
>> +StringBuilders.toStringCharWithBool8Utf16   avgt   15  7.456 ? 0.059  ns/op 
>> +28.95%
>> 
>> 
>> ## 3.  TraceMergeStores
>> 
>> [TraceMergeStores]: Replace
>>  65  StoreB  === 54 7 64 12  [[ 87 ]]  @byte[int:>=0] 
>> (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe  
>> Memory: @byte[int:>=0] (java/la...
>
> @wenshao I'm glad we figured it out a bit, and I hope you learned a thing or 
> two :)
> 
>> It would be even better if Unsafe.putChar could be used for MergeStore in 
>> the future.
> 
> The `_putCharStringU` intrinsic uses 
> `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I 
> speculate it could be `StoreC`. We'd have to do more digging to see why that 
> pattern is not accepted by `MergeStore`.
> 
> @wenshao I'm not sure if everything is right with the bounds checking, but I 
> leave this to the library folks to review. What you will definately need is 
> benchmarking not just on `M1`, but also `x86-64` and other `aarch64` 
> architectures.

I'm not opposed to accepting this patch as-is, but I think we should do so with 
an eye towards reverting if we figure out a way to improve the `putChar` 
intrinsic so that it doesn't block merge store optimization. What do you think 
@eme64?

As the need for the changes in 
[b5ad8e7](https://github.com/openjdk/jdk/pull/19626/commits/b5ad8e70928c547d134d0e4a532441cad9a7e4a2)
 showed added code complexity (like here) can be detrimental to performance, 
and if the `putChar` can be improved we might see benefits in more places.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2180308686


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v12]

2024-06-20 Thread Claes Redestad
On Mon, 17 Jun 2024 05:53:45 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Utf16 case remove `append first utf16 char`

src/java.base/share/classes/java/lang/StringLatin1.java line 832:

> 830: // Don't use the putChar method, Its instrinsic will cause C2 
> unable to combining values into larger stores.
> 831: long address  = Unsafe.ARRAY_BYTE_BASE_OFFSET + index;
> 832: Unsafe UNSAFE = Unsafe.getUnsafe();

Perhaps better to put in a `private static final` field

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19626#discussion_r1647313459


Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v25]

2024-06-19 Thread Claes Redestad
On Wed, 19 Jun 2024 09:08:35 GMT, Adam Sotona  wrote:

>> java.base java.lang.invoke package heavily uses ASM to generate lambdas and 
>> method handles.
>> 
>> This patch converts ASM calls to Classfile API.
>> 
>> This PR is continuation of https://github.com/openjdk/jdk/pull/12945
>> 
>> Any comments and suggestions are welcome.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - removed empty line
>  - problem-listed runtime/ClassInitErrors/TestStackOverflowDuringInit.java

No objections as long as there's an understanding that we'll need to work on 
some of the startup/warmup regressions this will cause.

We have preliminary data on a subset of benchmarks which suggests it's a 
combination of loading more classes and spreading the work across a larger set 
of methods, which means we need to warm and JIT more methods to reach peak 
performance. ASM is a smaller library with a more minimalist approach, so it 
loads faster and gets up and running a bit faster. We'll probably need to file 
regressions as they are detected for tracking purposes, but I expect we'll 
continue chipping away at and improving the classfile API in the months to come.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17108#pullrequestreview-2127764187


Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v5]

2024-06-18 Thread Claes Redestad
On Thu, 13 Jun 2024 14:11:48 GMT, Chen Liang  wrote:

>> Please review this patch that fixes a critical issue that breaks some Proxy 
>> usages.
>> 
>> Since the problematic patch from before cannot be backed out, this patch 
>> aims to emulate the old behavior before. A diff between before the 
>> problematic patch and this patch is available at 
>> https://gist.github.com/7565b2091008f561eb0ada019bc5e517, generated by 
>> running `git diff 326dbb1b139dd1ec1b8605339b91697cdf49da9a -- 
>> src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java`.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   methodField can be initialized eagerly

Marked as reviewed by redestad (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19615#pullrequestreview-2125576655


Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v21]

2024-06-18 Thread Claes Redestad
On Tue, 18 Jun 2024 13:22:45 GMT, Adam Sotona  wrote:

>> java.base java.lang.invoke package heavily uses ASM to generate lambdas and 
>> method handles.
>> 
>> This patch converts ASM calls to Classfile API.
>> 
>> This PR is continuation of https://github.com/openjdk/jdk/pull/12945
>> 
>> Any comments and suggestions are welcome.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Merge pull request #8 from cl4es/serialization_hostile
>
>SerializationHostileMethod
>  - Reduce gratuitous code movement
>  - Inline Consumer into generateSer.. method, move seldom-used 
> serialization support constants to new holder
>  - SerializationHostileMethod

I've done a final pass over this. Code changes overall look good - great even - 
and the only caveat is that there's going to be some added startup overhead in 
some apps from integrating this. A minimal app with a lambda expression might 
see a 2-3ms hit, for example.

While there are ideas on how to trim down such overheads further I think this 
is a good time to draw a line in the sand and get this change integrated and 
exposed to a larger set of testing.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17108#pullrequestreview-2125569570


Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v21]

2024-06-18 Thread Claes Redestad
On Tue, 18 Jun 2024 13:22:45 GMT, Adam Sotona  wrote:

>> java.base java.lang.invoke package heavily uses ASM to generate lambdas and 
>> method handles.
>> 
>> This patch converts ASM calls to Classfile API.
>> 
>> This PR is continuation of https://github.com/openjdk/jdk/pull/12945
>> 
>> Any comments and suggestions are welcome.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Merge pull request #8 from cl4es/serialization_hostile
>
>SerializationHostileMethod
>  - Reduce gratuitous code movement
>  - Inline Consumer into generateSer.. method, move seldom-used 
> serialization support constants to new holder
>  - SerializationHostileMethod

src/java.base/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java 
line 108:

> 106: 
> 107: // condy to load implMethod from class data
> 108: implMethodCondy = DynamicConstantDesc.ofNamed(BSM_CLASS_DATA, 
> DEFAULT_NAME, CD_MethodHandle);

Pre-existing tiny wart, but this one seem to be used only exceptionally (see 
the comment/code around line 183) so it's probably better to inline the code at 
the usage site rather than have a constant.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1644473614


Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v18]

2024-06-18 Thread Claes Redestad
On Mon, 17 Jun 2024 11:21:37 GMT, Adam Sotona  wrote:

>> LMF will BMH and `ClassSpecializer`, but does not call into this code 
>> normally as the simple bound method handle needed by LMF is statically 
>> defined (`BoundMethodHandle.Species_L`). To be perfectly safe - and to keep 
>> lambda/indy/condy bootstrap overheads to a minumum - I'll continue 
>> recommending the desugaring of lambdas in java.lang.invoke
>
> It results in not very nice code, but makes sense.

I'll note that replacing a lambda with an anonymous class has a very marginal 
effect on startup when amortizing the cost of setting up the infrastructure for 
spinning classes. What can have a decent effect is if we can replace a few 
lambdas/anon classes with more imperative code that spins up/loads fewer 
classes, e.g., desugaring streams to for loops.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1644202229


Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v3]

2024-06-18 Thread Claes Redestad
> Extracting duplicate method references to static field reduce proxy class 
> spinning and loading. In this case 2 less classes loaded when using 
> `findAny()` on each type of stream.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Fixes

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19477/files
  - new: https://git.openjdk.org/jdk/pull/19477/files/a223e608..d83a1c96

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19477=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=19477=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19477.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477

PR: https://git.openjdk.org/jdk/pull/19477


Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps [v2]

2024-06-18 Thread Claes Redestad
> Extracting duplicate method references to static field reduce proxy class 
> spinning and loading. In this case 2 less classes loaded when using 
> `findAny()` on each type of stream.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  @ExE-Boss review

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19477/files
  - new: https://git.openjdk.org/jdk/pull/19477/files/cff6d034..a223e608

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19477=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=19477=00-01

  Stats: 40 lines in 1 file changed: 11 ins; 0 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/19477.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477

PR: https://git.openjdk.org/jdk/pull/19477


RFR: 8334437: De-duplicate ProxyMethod list creation

2024-06-18 Thread Claes Redestad
Simple refactoring to extract identical, simple lambda expressions. Improve 
code clarity and reduce classes generated.

-

Commit messages:
 - Simplify
 - Refactor ProxyMethod list creation

Changes: https://git.openjdk.org/jdk/pull/19760/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19760=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334437
  Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19760.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19760/head:pull/19760

PR: https://git.openjdk.org/jdk/pull/19760


Re: RFR: 8334328: Reduce object allocation for FloatToDecimal and DoubleToDecimal [v8]

2024-06-16 Thread Claes Redestad
On Sat, 15 Jun 2024 13:27:41 GMT, Shaojin Wen  wrote:

>> The current versions of FloatToDecimal and DoubleToDecimal allocate 
>> additional objects. Reducing these allocations can improve the performance 
>> of Float/Double.toString and AbstractStringBuilder's append(float/double).
>> 
>> This patch is just a code refactoring to reduce object allocation, but does 
>> not change the Float/Double to decimal algorithm.
>> 
>> The following code comments the allocated objects to be removed.
>> 
>> 
>> class FloatToDecimal {
>> public static String toString(float v) {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(float v, Appendable app)
>> throws IOException {
>> // allocate object FloatToDecimal
>> return new FloatToDecimal().appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(float v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> if (app instanceof StringBuilder builder) {
>> return builder.append(chars);
>> }
>> if (app instanceof StringBuffer buffer) {
>> return buffer.append(chars);
>> }
>> for (char c : chars) {
>> app.append(c);
>> }
>> return app;
>> // ...
>> }
>> }
>> }
>> 
>> class DoubleToDecimal {
>> public static String toString(double v) {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).toDecimalString(v);
>> }
>> 
>> public static Appendable appendTo(double v, Appendable app)
>> throws IOException {
>> // allocate object DoubleToDecimal
>> return new DoubleToDecimal(false).appendDecimalTo(v, app);
>> }
>> 
>> private Appendable appendDecimalTo(double v, Appendable app)
>> throws IOException {
>> switch (toDecimal(v, null)) {
>> case NON_SPECIAL:
>> // allocate object char[]
>> char[] chars = new char[index + 1];
>> for (int i = 0; i < chars.length; ++i) {
>> chars[i] = (char) bytes[i];
>> }
>> ...
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   code format

Just suggesting some improvements

src/java.base/share/classes/jdk/internal/math/DoubleToDecimal.java line 185:

> 183:  * Returns
> 184:  * Combine type and size, the first byte is size, the second 
> byte is type
> 185:  *

Suggestion:

 *  Returns size in the lower byte, type in the high byte, where type is

src/java.base/share/classes/jdk/internal/math/DoubleToDecimal.java line 190:

> 188:  * PLUS_INFiff v is POSITIVE_INFINITY
> 189:  * MINUS_INF   iff v is NEGATIVE_INFINITY
> 190:  * NAN iff v is NaN

Suggestion:

 * NAN iff v is NaN
 * otherwise NON_SPECIAL

src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 169:

> 167: }
> 168: 
> 169: /* Using the deprecated constructor enhances performance */

Enhances performance over what, `new String(str, 0, index, 
StandardCharsets.US_ASCII)`? Please specify.

src/java.base/share/classes/jdk/internal/math/ToDecimal.java line 171:

> 169: /* Using the deprecated constructor enhances performance */
> 170: @SuppressWarnings("deprecation")
> 171: static String charsToString(byte[] str, int index) {

This should be named something like `asciiBytesToString` now. Perhaps 
`ToDecimal.LATIN1` can be renamed `ASCII`, too.

test/micro/org/openjdk/bench/java/lang/StringBuilders.java line 244:

> 242: 
> 243: @Benchmark
> 244: public int appendWithFloat8Latin1() {

It would be interesting with a mixed micro to explore effects of increased 
number of implementors (looping over an array with `sbLatin1` and `sbUtf16`, 
and appending both floats and doubles in the same loop) 

`buf.delete(0, buf.length())` could be replaced with `buf.setLength(0)` in 
these. Might reduce a bit of overhead and focus micros .

-

Changes requested by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19730#pullrequestreview-2121584681
PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641984296
PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641984363
PR Review Comment: https://git.openjdk.org/jdk/pull/19730#discussion_r1641979604
PR Review Comment: 

Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v8]

2024-06-13 Thread Claes Redestad
On Thu, 13 Jun 2024 02:06:42 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - rename benchmark
>  - code format & use long address

Replacing `StringUTF16.putChar` with an `Unsafe.putByte` construct seems safe 
since the former already depends on bounds checks having been done elsewhere. 
Replacing `byte[]` stores with `Unsafe.putByte` requires a thorough examination 
as that would drop implicit bounds checks.  

On my M1 laptop I get similar numbers:

Name Cnt  Base   Error   Test   Error  Unit  
Change
StringBuilders.appendWithBool8Latin1  15 7,500 ± 0,072  6,112 ± 0,049 ns/op   
1,23x (p = 0,000*)
StringBuilders.appendWithBool8Utf16   15 9,650 ± 0,021  7,304 ± 0,047 ns/op   
1,32x (p = 0,000*)
StringBuilders.appendWithNull8Latin1  15 7,118 ± 0,033  5,158 ± 0,062 ns/op   
1,38x (p = 0,000*)
StringBuilders.appendWithNull8Utf16   15 9,336 ± 0,083  6,870 ± 0,069 ns/op   
1,36x (p = 0,000*)
  * = significant


On a linux-x64 workstation some of the micros regress, though:

Name Cnt   BaseErrorTest   Error  Unit  
Change
StringBuilders.appendWithBool8Latin1  15 10.781 ±  0.236  16.279 ± 1.087 ns/op  
 0.66x (p = 0.000*)
StringBuilders.appendWithBool8Utf16   15 22.942 ±  0.405  23.733 ± 0.792 ns/op  
 0.97x (p = 0.001*)
StringBuilders.appendWithNull8Latin1  15 24.313 ± 10.479  24.397 ± 7.483 ns/op  
 1.00x (p = 0.979 )
StringBuilders.appendWithNull8Utf16   15 38.704 ±  5.972  27.542 ± 5.620 ns/op  
 1.41x (p = 0.000*)
  * = significant


`-XX:+TraceMergeStores` seem to indicate the merging happens as it should in 
each case:


[TraceMergeStores]: Replace
 2493  StoreB  === 1387 5370 2980 2939  [[ 1962 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, 
idx=13; !jvms: StringLatin1::putCharsAt @ bci:50 (line 834) 
AbstractStringBuilder::appendNull @ bci:34 (line 642) 
AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 
(line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) 
StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub
 @ bci:17 (line 190)
 1962  StoreB  === 1387 2493 2494 2442  [[ 1388 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, 
idx=13; !jvms: StringLatin1::putCharsAt @ bci:62 (line 835) 
AbstractStringBuilder::appendNull @ bci:34 (line 642) 
AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 
(line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) 
StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub
 @ bci:17 (line 190)
 1388  StoreB  === 1387 1962 1963 1341  [[ 833 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, 
idx=13; !jvms: StringLatin1::putCharsAt @ bci:77 (line 836) 
AbstractStringBuilder::appendNull @ bci:34 (line 642) 
AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 
(line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) 
StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub
 @ bci:17 (line 190)
  833  StoreB  === 1387 1388 1389 1341  [[ 5311 425 1400 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=13; unsafe  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, 
idx=13; !jvms: StringLatin1::putCharsAt @ bci:92 (line 837) 
AbstractStringBuilder::appendNull @ bci:34 (line 642) 
AbstractStringBuilder::append @ bci:5 (line 587) StringBuilder::append @ bci:2 
(line 179) StringBuilders::appendWithNull8Latin1 @ bci:38 (line 267) 
StringBuilders_appendWithNull8Latin1_jmhTest::appendWithNull8Latin1_avgt_jmhStub
 @ bci:17 (line 190)
[TraceMergeStores]: with
 6207  ConI  === 0  [[ 6208 6210 ]]  #int:1819047278
 6210  StoreI  === 1387 5370 2980 6207  [[ ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13; 
mismatched  Memory: @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact[2] *, idx=13;

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2165167049


Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v18]

2024-06-13 Thread Claes Redestad
On Thu, 6 Jun 2024 11:41:05 GMT, Chen Liang  wrote:

>> Adam Sotona has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - reverted static initialization of ConstantPoolBuilder and CP entries
>>  - fixed naming conventions
>
> src/java.base/share/classes/java/lang/invoke/ClassSpecializer.java line 616:
> 
>> 614: final ClassDesc classDesc = ClassDesc.of(className0);
>> 615: final ClassDesc superClassDesc = 
>> classDesc(speciesData.deriveSuperClass());
>> 616: return ClassFile.of().build(classDesc, clb -> {
> 
> We need a confirmation here that these BMH species are only initialized after 
> LMF is ready. @cl4es Is a dependency on LMF from BMH safe?

LMF will BMH and `ClassSpecializer`, but does not call into this code normally 
as the simple bound method handle needed by LMF is statically defined 
(`BoundMethodHandle.Species_L`). To be perfectly safe - and to keep 
lambda/indy/condy bootstrap overheads to a minumum - I'll continue recommending 
the desugaring of lambdas in java.lang.invoke

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17108#discussion_r1637893254


Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v4]

2024-06-13 Thread Claes Redestad
On Wed, 12 Jun 2024 22:00:45 GMT, Chen Liang  wrote:

>> Please review this patch that fixes a critical issue that breaks some Proxy 
>> usages.
>> 
>> Since the problematic patch from before cannot be backed out, this patch 
>> aims to emulate the old behavior before. A diff between before the 
>> problematic patch and this patch is available at 
>> https://gist.github.com/7565b2091008f561eb0ada019bc5e517, generated by 
>> running `git diff 326dbb1b139dd1ec1b8605339b91697cdf49da9a -- 
>> src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java`.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   More rename and code style cleanup

I think this looks better as you're reverting the problematic changes while 
retaining most of the benefits of JDK-8332457.

src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java line 676:

> 674:  * @param method  The method for which to create a proxy
> 675:  */
> 676: private ProxyMethod(Method method, String methodFieldName) {

Could we pass the `FieldRefEntry` directly? I don't see when we'd create a 
`ProxyMethod` without later calling `methodField()`. I think this would also 
remove the need to make `ProxyMethod` non-static.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19615#pullrequestreview-2115128124
PR Review Comment: https://git.openjdk.org/jdk/pull/19615#discussion_r1637805452


Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]

2024-06-12 Thread Claes Redestad
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang  wrote:

>> Please review this patch that fixes a critical issue that breaks some Proxy 
>> usages.
>> 
>> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible 
>> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can 
>> appear in method descriptors in the same class. The proposed way to bypass 
>> is to store the MethodType as its descriptor string in the bootstrap method 
>> arguments, and use MethodType.fromMethodDescriptorString to restore the 
>> arguments, which does not have this restriction and does not eagerly load 
>> the parameter classes. This case isn't covered by existing tests, so a new 
>> test has been added.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Obtain classloader in security manager friendly code path

Agree with Roger that static imports should be used sparingly (possibly never 
with .*). 

If other reviewers are happy with the semantic shift in when/where class 
loading can/will happen and which class loader will be used then I won't 
object. I'm just not enough of an expert in this area to tell if there might be 
unintended side-effects we need to account for.

The added cost of spinning and calling these clinits means we might be better 
off reverting from a startup overhead perspective. JDK-8332457 was already a 
regression for some usage patterns.

-

PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2163123616


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-12 Thread Claes Redestad
On Wed, 12 Jun 2024 13:38:21 GMT, Shaojin Wen  wrote:

> In the AbstractStringBuilder#appendNull method, is it possible to not check 
> the bounds based on the information from ensureCapacityInternal?

Perhaps you'd need something like a `Preconditions.checkIndex` in or before the 
4-char `StringUTF16.putCharsAt` call to help eliminate excess bounds checks. 
Might work, but would be fragile and probably not easy to get right in all 
places.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2163093085


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-12 Thread Claes Redestad
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert

Rewriting the intrinsic to emit nodes that don't trip up the MergeStore pass 
seem like a reasonable thing to reach for, then. Perhaps some of these cases 
could be helped by a `@ForceInline` or two to get the inlining necessary to 
allow the JIT to eliminate all but the outer bounds checks, but that is 
probably too fragile in this area given how performance sensitive many of these 
methods are.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2163065760


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-12 Thread Claes Redestad
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert

Agreed we should look at obsoleting the `_putCharStringU` intrinsic. This is 
best done separately. If performance with the intrinsic disabled looks good 
here then I think this PR is good to go, even if temporarily subpar on the 
UTF16 tests. The microbenchmark will then serve as an extra verification step 
to show that disabling/removing the intrinsic is beneficial.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162962297


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-12 Thread Claes Redestad
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert

Good, disabling the intrinsic appears to enable the MergeStore optimization, 
and it looks like inlining happens here to get to 8-byte stores. What 
performance do you get now with that?

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162936067


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null [v4]

2024-06-12 Thread Claes Redestad
On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen  wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert

And what happens when you run that last thing with 
`-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_putCharStringU`?  I 
believe the intrinsic is basically a handcrafted merge optimization, and might 
very well be obsolete with the MergeStores optimization in place.

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162893168


Re: RFR: 8333893: Optimization for StringBuilder append boolean & null

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 12:12:58 GMT, Shaojin Wen  wrote:

> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
> primitive arrays by combining values ​​into larger stores.
> 
> This PR rewrites the code of appendNull and append(boolean) methods so that 
> these two methods can be optimized by C2.

Whether increments are handled or not, I think the `StringUTF16.putChar` 
intrinsic might be getting in the way here and prevent merging consecutive 
"char" stores into 4 or 8 byte stores. It would be interesting to get numbers 
for the `putChar` version with the intrinsic disabled 
(`-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_putCharStringU`).

-

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2159509806


Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang  wrote:

>> Please review this patch that fixes a critical issue that breaks some Proxy 
>> usages.
>> 
>> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible 
>> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can 
>> appear in method descriptors in the same class. The proposed way to bypass 
>> is to store the MethodType as its descriptor string in the bootstrap method 
>> arguments, and use MethodType.fromMethodDescriptorString to restore the 
>> arguments, which does not have this restriction and does not eagerly load 
>> the parameter classes. This case isn't covered by existing tests, so a new 
>> test has been added.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Obtain classloader in security manager friendly code path

I'm wary about the subtle semantic differences that might result from that and 
would be more comfortable reverting this behavior to what we did before 
JDK-8332457. Perhaps even backing that one out and re-examine the changes with 
less urgency.

-

PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2158280708


Re: RFR: 8333854: IllegalAccessError with proxies after JDK-8332457 [v2]

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 04:08:41 GMT, Chen Liang  wrote:

>> Please review this patch that fixes a critical issue that breaks some Proxy 
>> usages.
>> 
>> CONSTANT_Class and CONSTANT_MethodType must fail resolution for inaccessible 
>> package-private types per JVMS 5.4.3.1 and 5.4.3.5, yet such types can 
>> appear in method descriptors in the same class. The proposed way to bypass 
>> is to store the MethodType as its descriptor string in the bootstrap method 
>> arguments, and use MethodType.fromMethodDescriptorString to restore the 
>> arguments, which does not have this restriction and does not eagerly load 
>> the parameter classes. This case isn't covered by existing tests, so a new 
>> test has been added.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Obtain classloader in security manager friendly code path

Original code did all the classloading in the `` - should we do that 
here, too?

Can you explain the need to `getPlatformClassLoader` when 
`classEntry.getClassLoader == null`? Pre-JDK-8332457 did no such thing.

I'll park #19598 until this stabilizes.

-

PR Comment: https://git.openjdk.org/jdk/pull/19615#issuecomment-2158189721


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]

2024-06-10 Thread Claes Redestad
On Fri, 7 Jun 2024 18:58:36 GMT, Claes Redestad  wrote:

>> This PR refactors type matching in BootstrapMethodInvoker and adds a few 
>> types, seeking to improve bootstrap overheads of some ConstantBootstraps and 
>> in particular the ProxyGenerator condys generated for e.g. annotation 
>> proxies since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457)
>> 
>> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate 
>> a proxy but also call into one of the proxied methodt (`Object::hashCode`). 
>> 
>> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off 
>> startup benchmark sees significant improvement (-9% instructions, -6% 
>> cycles):
>> 
>> Name Cnt   Base  ErrorTest  
>> Error Unit  Change
>> Perfstartup-JMH   20154,000 ±8,165 148,000 ±   
>> 23,164ms/op   1,04x (p = 0,352 )
>>   :.cycles925335973,200 ± 47147600,262   842221278,800 ± 
>> 46836254,964   cycles   0,91x (p = 0,000*)
>>   :.instructions 2101588857,600 ± 81105850,361  1966307798,400 ± 
>> 22011043,908 instructions   0,94x (p = 0,000*)
>>   :.taskclock   291,500 ±   16,494 262,000 ±   
>> 15,328   ms   0,90x (p = 0,000*)
>>   * = significant
>> 
>> Number of classes loaded drops from 1096 to 1092
>> 
>> Running the micro regularly shows no significant difference:
>> 
>> Name  Cnt   Base   Error    Test   Error  Unit  
>> Change
>> ProxyGenBench.generateAndProxy100  10 26,827 ± 8,954  26,855 ± 7,531 ms/op   
>> 1,00x (p = 0,991 )
>>   * = significant
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adress review comments, add ConstantBootstraps#invoke to list of recognized 
> type signatures

Thanks, I'll see if I can get that working and if it has any effect as a 
follow-up. 

Testing caught a bug introduced by a copy-paste error in 532966a, re-running a 
few more tests now.

-

PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2157922124


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v3]

2024-06-10 Thread Claes Redestad
> This PR refactors type matching in BootstrapMethodInvoker and adds a few 
> types, seeking to improve bootstrap overheads of some ConstantBootstraps and 
> in particular the ProxyGenerator condys generated for e.g. annotation proxies 
> since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457)
> 
> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a 
> proxy but also call into one of the proxied methodt (`Object::hashCode`). 
> 
> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off 
> startup benchmark sees significant improvement (-9% instructions, -6% cycles):
> 
> Name Cnt   Base  ErrorTest  
> Error Unit  Change
> Perfstartup-JMH   20154,000 ±8,165 148,000 ±   
> 23,164ms/op   1,04x (p = 0,352 )
>   :.cycles925335973,200 ± 47147600,262   842221278,800 ± 
> 46836254,964   cycles   0,91x (p = 0,000*)
>   :.instructions 2101588857,600 ± 81105850,361  1966307798,400 ± 
> 22011043,908 instructions   0,94x (p = 0,000*)
>   :.taskclock   291,500 ±   16,494 262,000 ±   
> 15,328   ms   0,90x (p = 0,000*)
>   * = significant
> 
> Number of classes loaded drops from 1096 to 1092
> 
> Running the micro regularly shows no significant difference:
> 
> Name  Cnt   Base   ErrorTest   Error  Unit  
> Change
> ProxyGenBench.generateAndProxy100  10 26,827 ± 8,954  26,855 ± 7,531 ms/op   
> 1,00x (p = 0,991 )
>   * = significant

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Copy-paste error; CONDY_INVOKE returns Object. Fixes TestDynamicConstant.java

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19598/files
  - new: https://git.openjdk.org/jdk/pull/19598/files/532966a4..2d64ae1c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19598=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=19598=01-02

  Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/19598.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598

PR: https://git.openjdk.org/jdk/pull/19598


Integrated: 8333824: Unused ClassValue in VarHandles

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 08:30:59 GMT, Claes Redestad  wrote:

> Trivially removing the a leftover, unused `ClassValue` from 
> `java.lang.invoke.VarHandle`

This pull request has now been integrated.

Changeset: 7b43a8cd
Author:    Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/7b43a8cd7c663facbe490f889838d7ead0eba0f9
Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod

8333824: Unused ClassValue in VarHandles

Reviewed-by: mcimadamore

-

PR: https://git.openjdk.org/jdk/pull/19620


Re: RFR: 8333824: Unused ClassValue in VarHandles

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 08:57:23 GMT, Maurizio Cimadamore  
wrote:

> Good catch!

Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/19620#issuecomment-2157910277


RFR: 8333824: Unused ClassValue in VarHandles

2024-06-10 Thread Claes Redestad
Trivially removing the a leftover, unused `ClassValue` from 
`java.lang.invoke.VarHandle`

-

Commit messages:
 - 8333824: Unused ClassValue in VarHandles

Changes: https://git.openjdk.org/jdk/pull/19620/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19620=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333824
  Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19620.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19620/head:pull/19620

PR: https://git.openjdk.org/jdk/pull/19620


Re: RFR: 8333833: Remove the use of ByteArrayLittleEndian from UUID::toString [v7]

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 07:32:26 GMT, Shaojin Wen  wrote:

>> After PR #16245, C2 optimizes stores into primitive arrays by combining 
>> values ​​into larger stores. In the UUID.toString method, 
>> ByteArrayLittleEndian can be removed, making the code more elegant and 
>> faster.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update src/java.base/share/classes/jdk/internal/util/HexDigits.java
>   
>   Co-authored-by: Claes Redestad 

Great, go ahead and /integrate again and I'll sponsor.

-

PR Comment: https://git.openjdk.org/jdk/pull/19610#issuecomment-2157627942


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]

2024-06-10 Thread Claes Redestad
On Fri, 7 Jun 2024 18:58:36 GMT, Claes Redestad  wrote:

>> This PR refactors type matching in BootstrapMethodInvoker and adds a few 
>> types, seeking to improve bootstrap overheads of some ConstantBootstraps and 
>> in particular the ProxyGenerator condys generated for e.g. annotation 
>> proxies since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457)
>> 
>> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate 
>> a proxy but also call into one of the proxied methodt (`Object::hashCode`). 
>> 
>> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off 
>> startup benchmark sees significant improvement (-9% instructions, -6% 
>> cycles):
>> 
>> Name Cnt   Base  ErrorTest  
>> Error Unit  Change
>> Perfstartup-JMH   20154,000 ±8,165 148,000 ±   
>> 23,164ms/op   1,04x (p = 0,352 )
>>   :.cycles925335973,200 ± 47147600,262   842221278,800 ± 
>> 46836254,964   cycles   0,91x (p = 0,000*)
>>   :.instructions 2101588857,600 ± 81105850,361  1966307798,400 ± 
>> 22011043,908 instructions   0,94x (p = 0,000*)
>>   :.taskclock   291,500 ±   16,494 262,000 ±   
>> 15,328   ms   0,90x (p = 0,000*)
>>   * = significant
>> 
>> Number of classes loaded drops from 1096 to 1092
>> 
>> Running the micro regularly shows no significant difference:
>> 
>> Name  Cnt   Base   Error    Test   Error  Unit  
>> Change
>> ProxyGenBench.generateAndProxy100  10 26,827 ± 8,954  26,855 ± 7,531 ms/op   
>> 1,00x (p = 0,991 )
>>   * = significant
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adress review comments, add ConstantBootstraps#invoke to list of recognized 
> type signatures

Sure. If you can show how you'd do that I'd be happy to test it. Maybe there 
are excess native linkers I'm not seeing from using `invokeExact`, but AFAICT 
there are no linkers being spun when the type match is made exact via static 
casts (which is what these exact matches in BMI achieves). So I remain a bit 
confused about what we would be able to improve.

Thanks for reviewing!

-

PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2157595699


Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v7]

2024-06-10 Thread Claes Redestad
On Mon, 10 Jun 2024 07:32:26 GMT, Shaojin Wen  wrote:

>> After PR #16245, C2 optimizes stores into primitive arrays by combining 
>> values ​​into larger stores. In the UUID.toString method, 
>> ByteArrayLittleEndian can be removed, making the code more elegant and 
>> faster.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update src/java.base/share/classes/jdk/internal/util/HexDigits.java
>   
>   Co-authored-by: Claes Redestad 

Thanks. I hate to nitpick, but is it OK if I rename the RFE as "Remove the use 
of ByteArrayLittleEndian from UUID::toString" (the PR need to follow suit). I 
think the current name might be read as doing something completely different.

-

PR Comment: https://git.openjdk.org/jdk/pull/19610#issuecomment-2157580938


Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v6]

2024-06-10 Thread Claes Redestad
On Sun, 9 Jun 2024 22:45:26 GMT, Shaojin Wen  wrote:

>> After PR #16245, C2 optimizes stores into primitive arrays by combining 
>> values ​​into larger stores. In the UUID.toString method, 
>> ByteArrayLittleEndian can be removed, making the code more elegant and 
>> faster.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   better parameter names

src/java.base/share/classes/jdk/internal/util/HexDigits.java line 118:

> 116: /**
> 117:  * Insert the unsigned 2-byte integer into the buffer as 4 
> hexadecimal digit ASCII bytes,
> 118:  * only least significant 16 bits of {@code i} are used.

Suggestion:

 * only least significant 16 bits of {@code value} are used.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19610#discussion_r1632711983


Re: RFR: 8333833: UUID toString removes the use of ByteArrayLittleEndian [v5]

2024-06-09 Thread Claes Redestad
On Sun, 9 Jun 2024 07:22:39 GMT, Shaojin Wen  wrote:

>> After PR #16245, C2 optimizes stores into primitive arrays by combining 
>> values ​​into larger stores. In the UUID.toString method, 
>> ByteArrayLittleEndian can be removed, making the code more elegant and 
>> faster.
>
> Shaojin Wen has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - rename putHex4 to put4
>  - update copy right year

Glad to see #16245 in action, enabling simpler code with equal or better 
performance.

src/java.base/share/classes/jdk/internal/util/HexDigits.java line 123:

> 121:  * @param i to convert
> 122:  */
> 123: public static void put4(byte[] buffer, int off, int i) {

Perhaps `index` and `value` are better names than `off` and `i`, respectively.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19610#pullrequestreview-2106355754
PR Review Comment: https://git.openjdk.org/jdk/pull/19610#discussion_r1632380045


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]

2024-06-08 Thread Claes Redestad
On Fri, 7 Jun 2024 19:22:43 GMT, Chen Liang  wrote:

>> It's not the intent of this PR to exhaustively enumerate all methods in 
>> `ConstantBootstraps`, primarily those shown to be bootstrap sensitive in 
>> some app. I've so far never seen a use of `primitiveClass` (and I admit 
>> being ignorant as to why this even exists as a BSM) so I don't have any 
>> reason to believe special-casing it will carry its own weight.
>
> For its existence, I think it was the same as null, that they weren't 
> representable by cp entries for bootstrap method args. They are now backed by 
> PrimitiveClassDescImpl, a type of condy, and ClassFile API currently writes 
> loadConstant(primitiveCd) as a condy.

I am sure there's a perfectly good explanation why we've done it like that 
rather than translating things like javac would (AFAICT a load from 
`Integer/Short/Byte/...TYPE`).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19598#discussion_r1632041009


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator

2024-06-07 Thread Claes Redestad
On Fri, 7 Jun 2024 13:46:22 GMT, Jorn Vernee  wrote:

> As an aside: I suppose we pre-generate all the linkers that get spun up for 
> these `invokeExact`calls already. But, we might still get a little bit of 
> mileage out of switching to `invokeBasic`, which bypasses the linkers 
> altogether. With the caveat that we have to be very careful that the call 
> site type is correct.

I'm not sure I see what linkers you're alluding to here in profiles (and I 
don't think we have code pre-generating all of them), and I don't see how to 
use `invokeBasic` here instead. Care to elaborate?

-

PR Comment: https://git.openjdk.org/jdk/pull/19598#issuecomment-2155357245


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator [v2]

2024-06-07 Thread Claes Redestad
> This PR refactors type matching in BootstrapMethodInvoker and adds a few 
> types, seeking to improve bootstrap overheads of some ConstantBootstraps and 
> in particular the ProxyGenerator condys generated for e.g. annotation proxies 
> since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457)
> 
> I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a 
> proxy but also call into one of the proxied methodt (`Object::hashCode`). 
> 
> Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off 
> startup benchmark sees significant improvement (-9% instructions, -6% cycles):
> 
> Name Cnt   Base  ErrorTest  
> Error Unit  Change
> Perfstartup-JMH   20154,000 ±8,165 148,000 ±   
> 23,164ms/op   1,04x (p = 0,352 )
>   :.cycles925335973,200 ± 47147600,262   842221278,800 ± 
> 46836254,964   cycles   0,91x (p = 0,000*)
>   :.instructions 2101588857,600 ± 81105850,361  1966307798,400 ± 
> 22011043,908 instructions   0,94x (p = 0,000*)
>   :.taskclock   291,500 ±   16,494 262,000 ±   
> 15,328   ms   0,90x (p = 0,000*)
>   * = significant
> 
> Number of classes loaded drops from 1096 to 1092
> 
> Running the micro regularly shows no significant difference:
> 
> Name  Cnt   Base   ErrorTest   Error  Unit  
> Change
> ProxyGenBench.generateAndProxy100  10 26,827 ± 8,954  26,855 ± 7,531 ms/op   
> 1,00x (p = 0,991 )
>   * = significant

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Adress review comments, add ConstantBootstraps#invoke to list of recognized 
type signatures

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19598/files
  - new: https://git.openjdk.org/jdk/pull/19598/files/21f945b5..532966a4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19598=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=19598=00-01

  Stats: 29 lines in 1 file changed: 17 ins; 0 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/19598.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598

PR: https://git.openjdk.org/jdk/pull/19598


Re: RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator

2024-06-07 Thread Claes Redestad
On Fri, 7 Jun 2024 18:38:52 GMT, ExE Boss  wrote:

>> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java 
>> line 289:
>> 
>>> 287:  * bootstrap method.
>>> 288:  */
>>> 289: private static final MethodType CONDY_GET_STATIC_FINAL_MT = 
>>> MethodType.methodType(Object.class,
>> 
>> This type should just be called `CONDY_NO_ARG` as it's the most common form 
>> of condy that doesn't take any arg (`nullConstant` `primitiveClass` etc.)
>
> Note that [`ConstantBootstraps​::primitiveClass​(Lookup, String, Class)`] has 
> a static return type of `Class⁠<⁠?⁠>`, so the following is also needed:
> 
> /**
>  * Exact type used of the {@link ConstantBootstraps#primitiveClass(Lookup, 
> String, Class)}
>  * bootstrap method.
>  */
> private static final MethodType CONDY_CLASS_NO_ARG_MT
>   = CONDY_GET_STATIC_FINAL_MT.changeReturnType(Class.class);
> 
> 
> [`ConstantBootstraps​::primitiveClass​(Lookup, String, Class)`]: 
> https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/invoke/ConstantBootstraps.html#primitiveClass(java.lang.invoke.MethodHandles.Lookup,java.lang.String,java.lang.Class)

It's not the intent of this PR to exhaustively enumerate all methods in 
`ConstantBootstraps`, primarily those shown to be bootstrap sensitive in some 
app. I've so far never seen a use of `primitiveClass` (and I admit being 
ignorant as to why this even exists as a BSM) so I don't have any reason to 
believe special-casing it will carry its own weight.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19598#discussion_r1631578674


Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v3]

2024-06-07 Thread Claes Redestad
On Fri, 7 Jun 2024 13:08:24 GMT, Claes Redestad  wrote:

>> Trivially move a few private static finals to their respective source type 
>> to avoid eagerly loading a few small classes.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update src/java.base/share/classes/java/util/Spliterators.java
>   
>   Co-authored-by: Chen Liang 

Thanks! Yes, the best holder for a singleton is typically the class itself 

-

PR Comment: https://git.openjdk.org/jdk/pull/19591#issuecomment-2155003099


Integrated: 8333774: Avoid eagerly loading various EmptySpliterator classes

2024-06-07 Thread Claes Redestad
On Fri, 7 Jun 2024 08:21:58 GMT, Claes Redestad  wrote:

> Trivially move a few private static finals to their respective source type to 
> avoid eagerly loading a few small classes.

This pull request has now been integrated.

Changeset: d744059b
Author:    Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/d744059b5b3e944bee53536de6f404666e45e8e5
Stats: 33 lines in 1 file changed: 12 ins; 12 del; 9 mod

8333774: Avoid eagerly loading various EmptySpliterator classes

Reviewed-by: liach, pminborg

-

PR: https://git.openjdk.org/jdk/pull/19591


Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base [v6]

2024-06-07 Thread Claes Redestad
On Fri, 7 Jun 2024 13:56:24 GMT, Chen Liang  wrote:

>> In java.base, especially in bytecode generators, we have many different 
>> methods converting known valid Class and MethodType into ClassDesc and 
>> MethodTypeDesc. These conversions should be consolidated into the same 
>> utility method for the ease of future maintenance.
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   ExE-Boss review, go through validated path in Class

Good incremental improvements.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19585#pullrequestreview-2104694382


Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v3]

2024-06-07 Thread Claes Redestad
> Trivially move a few private static finals to their respective source type to 
> avoid eagerly loading a few small classes.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Update src/java.base/share/classes/java/util/Spliterators.java
  
  Co-authored-by: Chen Liang 

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19591/files
  - new: https://git.openjdk.org/jdk/pull/19591/files/9d19bbe1..10611afd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19591=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=19591=01-02

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19591.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591

PR: https://git.openjdk.org/jdk/pull/19591


Re: RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes [v2]

2024-06-07 Thread Claes Redestad
> Trivially move a few private static finals to their respective source type to 
> avoid eagerly loading a few small classes.

Claes Redestad has updated the pull request incrementally with two additional 
commits since the last revision:

 - revert typo
 - Copyright

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19591/files
  - new: https://git.openjdk.org/jdk/pull/19591/files/5eb2ddd8..9d19bbe1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=19591=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=19591=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19591.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591

PR: https://git.openjdk.org/jdk/pull/19591


RFR: 8333793: Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator

2024-06-07 Thread Claes Redestad
This PR refactors type matching in BootstrapMethodInvoker and adds a few types, 
seeking to improve bootstrap overheads of some ConstantBootstraps and in 
particular the ProxyGenerator condys generated for e.g. annotation proxies 
since [JDK-8332457](https://bugs.openjdk.org/browse/JDK-8332457)

I've adjusted the micro-benchmark added by JDK-8332457 to not only generate a 
proxy but also call into one of the proxied methodt (`Object::hashCode`). 

Running org.openjdk.bench.java.lang.reflect.ProxyGenBench as a one-off startup 
benchmark sees significant improvement (-9% instructions, -6% cycles):

Name Cnt   Base  ErrorTest  
Error Unit  Change
Perfstartup-JMH   20154,000 ±8,165 148,000 ±   
23,164ms/op   1,04x (p = 0,352 )
  :.cycles925335973,200 ± 47147600,262   842221278,800 ± 
46836254,964   cycles   0,91x (p = 0,000*)
  :.instructions 2101588857,600 ± 81105850,361  1966307798,400 ± 
22011043,908 instructions   0,94x (p = 0,000*)
  :.taskclock   291,500 ±   16,494 262,000 ±   
15,328   ms   0,90x (p = 0,000*)
  * = significant

Number of classes loaded drops from 1096 to 1092

Running the micro regularly shows no significant difference:

Name  Cnt   Base   ErrorTest   Error  Unit  
Change
ProxyGenBench.generateAndProxy100  10 26,827 ± 8,954  26,855 ± 7,531 ms/op   
1,00x (p = 0,991 )
  * = significant

-

Commit messages:
 - Rename microbenchmark
 - Ensure proxying for hashCode etc is 'implemented'
 - Ensure proxying for hashCode etc is 'implemented'
 - Change ProxyGenBench to provoke proxy condy
 - Improve BootstrapMethodInvoker for ConstantBootstraps and ProxyGenerator

Changes: https://git.openjdk.org/jdk/pull/19598/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19598=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333793
  Stats: 89 lines in 2 files changed: 35 ins; 24 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/19598.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19598/head:pull/19598

PR: https://git.openjdk.org/jdk/pull/19598


Re: RFR: 8332249: Micro-optimize Method.hashCode [v2]

2024-06-07 Thread Claes Redestad
On Wed, 5 Jun 2024 13:45:46 GMT, Per Minborg  wrote:

>> As a note, If we would have access to the contemplated `StableValue` and 
>> `hash` was declared even as an _instance variable_ `StableValue` in 
>> a regular class, then it would be trusted and would be eligible for constant 
>> folding due to the VM will have special rules for  fields of StableValue 
>> type.
>
> Ahh. There was a benchmark in the initial message. Sorry.

We think this is caused by the membars being conservatively emitted at method 
exit, and more precis membar control should help ARM. Filed: 
https://bugs.openjdk.org/browse/JDK-8333791

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19433#discussion_r1631071572


Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v19]

2024-06-07 Thread Claes Redestad
On Wed, 5 Jun 2024 12:39:13 GMT, Chen Liang  wrote:

>> Adam Sotona has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   An assortment of potential improvements
>>   
>>   Co-authored-by: Claes Redestad 
>
> src/java.base/share/classes/java/lang/constant/ConstantDescs.java line 356:
> 
>> 354: ClassDesc[] fullParamTypes = new ClassDesc[paramTypes.length + 
>> prefixLen];
>> 355: System.arraycopy(INDY_BOOTSTRAP_ARGS, 0, fullParamTypes, 0, 
>> prefixLen);
>> 356: System.arraycopy(paramTypes, 0, fullParamTypes, prefixLen, 
>> paramTypes.length);
> 
> I'm thinking about creating a basic MethodTypeDesc like `INDY_BOOTSTRAP_TYPE 
> = MethodTypeDesc.ofValidated(CD_Object, INDY_BOOTSTRAP_ARGS)`, and we derive 
> bsm with 
> 
> INDY_BOOTSTRAP_TYPE
> .insertParameterTypes(INDY_BOOTSTRAP_TYPE.parameterCount(), paramTypes)
> .changeReturnType(returnType)
> 
> which creates two MTD wrappers but they share the same parameter array

That'd arguably be a bit cleaner. Might allocate an extra MTD (unless/until EA 
kicks in), but that might not matter really.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1630868455


RFR: 8333774: Avoid eagerly loading various EmptySpliterator classes

2024-06-07 Thread Claes Redestad
Trivially move a few private static finals to their respective source type to 
avoid eagerly loading a few small classes.

-

Commit messages:
 - 8333774: Avoid eagerly loading various EmptySpliterator classes

Changes: https://git.openjdk.org/jdk/pull/19591/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19591=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333774
  Stats: 31 lines in 1 file changed: 12 ins; 11 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/19591.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19591/head:pull/19591

PR: https://git.openjdk.org/jdk/pull/19591


Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base [v3]

2024-06-07 Thread Claes Redestad
On Thu, 6 Jun 2024 19:12:35 GMT, Chen Liang  wrote:

>> src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 469:
>> 
>>> 467: // o instanceof Wrapped(float)
>>> 468: cb.aload(SELECTOR_OBJ);
>>> 469: 
>>> cb.instanceOf(classDesc(Wrapper.forBasicType(classLabel)
>> 
>> I have a patch somewhere to cache the wrapper class desc in 
>> `sun.invoke.util.Wrapper`, both as a micro-optimization and to help 
>> disambigutate the unfortunately named (my bad) `Wrapper::classDescriptor` 
>> method. Maybe we can roll that into this..?
>
> Feel free, or you can get your patch merged first and then I base off yours 
> if yours is ready.

Created a PR you could fold into this, if you'd like: 
https://github.com/liachmodded/jdk/pull/1

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630816783


Re: RFR: 8333749: Consolidate ConstantDesc conversion in java.base

2024-06-06 Thread Claes Redestad
On Thu, 6 Jun 2024 18:35:01 GMT, Chen Liang  wrote:

> In java.base, especially in bytecode generators, we have many different 
> methods converting known valid Class and MethodType into ClassDesc and 
> MethodTypeDesc. These conversions should be consolidated into the same 
> utility method for the ease of future maintenance.

Looks good to me. Probably should get someone to OK changes in foreign/abi 
(@JornVernee perhaps?).

There are some pre-existing places where we call 
`ReferenceClassDescImpl.ofValidated` directly that could probably be switched 
over to `ConstantUtils.classDesc` for slightly nicer code. Or - if it matters - 
add a `referenceClassDesc` which avoids the `type.isPrimitive` test.

src/java.base/share/classes/java/lang/invoke/MethodHandleProxies.java line 256:

> 254: // the field name holding the method handle for this method
> 255: String fieldName = "m" + count++;
> 256: var mt = methodDesc(m.getReturnType(), 
> JLRA.getExecutableSharedParameterTypes(m));

Suggestion:

var md = methodDesc(m.getReturnType(), 
JLRA.getExecutableSharedParameterTypes(m));

src/java.base/share/classes/java/lang/runtime/SwitchBootstraps.java line 469:

> 467: // o instanceof Wrapped(float)
> 468: cb.aload(SELECTOR_OBJ);
> 469: 
> cb.instanceOf(classDesc(Wrapper.forBasicType(classLabel)

I have a patch somewhere to cache the wrapper class desc in 
`sun.invoke.util.Wrapper`, both as a micro-optimization and to help 
disambigutate the unfortunately named (my bad) `Wrapper::classDescriptor` 
method. Maybe we can roll that into this..?

src/java.base/share/classes/jdk/internal/foreign/abi/BindingSpecializer.java 
line 471:

> 469: case Allocate allocate   -> 
> emitAllocBuffer(allocate);
> 470: case BoxAddress boxAddress   -> 
> emitBoxAddress(boxAddress);
> 471: case SegmentBase _   -> emitSegmentBase();

Seem a bit too far detached from the changes at hand for a drive-by code 
cleanup?

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19585#pullrequestreview-2102953208
PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630049028
PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630061889
PR Review Comment: https://git.openjdk.org/jdk/pull/19585#discussion_r1630074187


Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v19]

2024-06-05 Thread Claes Redestad
On Wed, 5 Jun 2024 12:00:25 GMT, Adam Sotona  wrote:

>> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use 
>> classfile API for reflection proxy-generation. Actual implementation of 
>> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap 
>> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and 
>> each proxy class is transformed from the template.
>> 
>> This patch is intended to examine plain proxy generation impact on 
>> performance and JDK bootstrap (vs proxy transformation from template).
>> 
>> The generated proxy is migrated from static initialization to CONDY 
>> bootstrap.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   An assortment of potential improvements
>   
>   Co-authored-by: Claes Redestad 

Looks good. Generation of proxy classes gets a nice boost this way. The condy 
bsm calls that may happen later takes a small hit which we can improve in a 
follow-up.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19410#pullrequestreview-2098990024


Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v14]

2024-06-03 Thread Claes Redestad
On Mon, 3 Jun 2024 11:09:31 GMT, Adam Sotona  wrote:

>> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use 
>> classfile API for reflection proxy-generation. Actual implementation of 
>> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap 
>> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and 
>> each proxy class is transformed from the template.
>> 
>> This patch is intended to examine plain proxy generation impact on 
>> performance and JDK bootstrap (vs proxy transformation from template).
>> 
>> The generated proxy is migrated from static initialization to CONDY 
>> bootstrap.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   ProxyGenBench simplification

test/micro/org/openjdk/bench/java/lang/reflect/Proxy/ProxyGenBench.java line 23:

> 21:  * questions.
> 22:  */
> 23: package org.openjdk.bench.java.lang.reflect.Proxy;

Package name needs to be lowercase. Not sure why the folder name is uppercase 
Proxy, but the two pre-existing benchmarks both have lower case package 
declarations. Uppercase letters in package names may subtly break a few tools

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1624270828


Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v13]

2024-06-03 Thread Claes Redestad
On Mon, 3 Jun 2024 10:11:34 GMT, Adam Sotona  wrote:

>> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use 
>> classfile API for reflection proxy-generation. Actual implementation of 
>> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap 
>> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and 
>> each proxy class is transformed from the template.
>> 
>> This patch is intended to examine plain proxy generation impact on 
>> performance and JDK bootstrap (vs proxy transformation from template).
>> 
>> The generated proxy is migrated from static initialization to CONDY 
>> bootstrap.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   added ProxyGenBench JMH micro benchmark

test/micro/org/openjdk/bench/java/lang/reflect/Proxy/ProxyGenBench.java line 66:

> 64: ClassDesc tempDesc = 
> ClassDesc.ofDescriptor(Interfaze.class.descriptorString());
> 65: loader = new ClsLoader();
> 66: clsMap = new HashMap<>(100);

Suggestion:

clsMap = HashMap.newHashMap(100);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19410#discussion_r1624192732


Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps

2024-05-30 Thread Claes Redestad
On Thu, 30 May 2024 12:50:36 GMT, Claes Redestad  wrote:

> Extracting duplicate method references to static field reduce proxy class 
> spinning and loading. In this case 2 less classes loaded when using 
> `findAny()` on each type of stream.

Vicente filed a bug on javac to investigate this: 
https://bugs.openjdk.org/browse/JDK-8333278

I wouldn't mind using condy for constant aka non-capturing lambdas. I recall we 
had a prototype for that years ago, but switching over was shelved for some 
reason.

-

PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2139703977


Re: RFR: 8333265: De-duplicate method references in java.util.stream.FindOps

2024-05-30 Thread Claes Redestad
On Thu, 30 May 2024 13:04:46 GMT, Chen Liang  wrote:

> Should we extract them to private static utility methods for potential 
> laziness support in the future?

Ideally we shouldn't have to do any of this. I thought such method references 
were already de-duplicated in javac - at least within the same compilation 
units - but I saw this in a startup profile and was surprised myself that the 
demonstrated manual de-duplication has an observable effect.

-

PR Comment: https://git.openjdk.org/jdk/pull/19477#issuecomment-2139576911


RFR: 8333265: De-duplicate method references in java.util.stream.FindOps

2024-05-30 Thread Claes Redestad
Extracting duplicate method references to static field reduce proxy class 
spinning and loading. In this case 2 less classes loaded when using `findAny()` 
on each type of stream.

-

Commit messages:
 - De-duplicate identical lambdas in FindOps

Changes: https://git.openjdk.org/jdk/pull/19477/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=19477=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333265
  Stats: 35 lines in 2 files changed: 17 ins; 7 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/19477.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19477/head:pull/19477

PR: https://git.openjdk.org/jdk/pull/19477


Re: RFR: 8333103: Re-examine the console provider loading

2024-05-30 Thread Claes Redestad
On Thu, 30 May 2024 10:02:25 GMT, Alan Bateman  wrote:

> Would it be possible to provide more context/background here? This is not 
> code that is used during startup. Is there benchmark data to share for first 
> use of java.io.IO ?

I think this was brought to the fore by https://openjdk.org/jeps/477 where 
running the example implicit main:

void main() {
println("Hello, World!");
}

 .. brings in this code. This PR is one of several to try and smoothen a few 
rough edges that makes the startup of that quite a bit heavier than the 
baseline non-implicit Hello World.

-

PR Comment: https://git.openjdk.org/jdk/pull/19467#issuecomment-2139218517


Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

2024-05-30 Thread Claes Redestad
On Wed, 29 May 2024 09:18:51 GMT, Pavel Rappo  wrote:

>> The non-constant test was added because that very bailout caused a crash. 
>> The other test is actually less interesting since it'll likely be covered 
>> indirectly by regular use. But as we are hiding these away this gets ever 
>> more obscure and perhaps the test could be dropped entirely.
>
> @cl4es, do you want me to delete that test file altogether?

I thought you verified that the non-constant type test still provoke a crash 
(on x86) if you back out the code changes in 
https://github.com/openjdk/jdk/commit/969f6a37e4649079c7acea1952f5537fd9ba2f0a 
? If so that test is still somewhat useful to guard against future coding 
mistakes by verifying that the bail out doesn't mess things up. The constant 
type tests have less utility, perhaps. I'd keep it as is unless there's a 
strong desire to reduce test runtime (these should be pretty quick).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1620259115


Re: RFR: 8333103: Re-examine the console provider loading

2024-05-30 Thread Claes Redestad
On Wed, 29 May 2024 22:25:09 GMT, Naoto Sato  wrote:

>> Yes, `break` guarantees that the search completes one way or another once 
>> the module name has been matched. This is not how it used to be done.
>
> Right. Since `findAny` is after the module name matching, there is at most 1 
> match. In the case we didn't find any, the final `orElse(null)` eventually 
> returns null. So the refactored code is semantically the same.

It's only semantically the same if we assume a module can only provide a single 
`JdkConsoleProvider`, no? The `break;` disallows multiple providers (for 
disjoint sets of charsets) in a single module.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19467#discussion_r1620164006


Re: RFR: 8333103: Re-examine the console provider loading

2024-05-29 Thread Claes Redestad
On Wed, 29 May 2024 19:51:36 GMT, Naoto Sato  wrote:

> There is an initialization code in `Console` class that searches for the 
> Console implementations. Refactoring the init code not to use lambda/stream 
> would reduce the (initial) number of loaded classes by about 100 for 
> java.base implementations. This would become relevant when the java.io.IO 
> (JEP 477) uses Console as the underlying framework.

Marked as reviewed by redestad (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19467#pullrequestreview-2086537783


Re: RFR: 8332457: Examine startup overheads from JDK-8294961 [v12]

2024-05-29 Thread Claes Redestad
On Wed, 29 May 2024 07:17:38 GMT, Adam Sotona  wrote:

>> [JDK-8294961](https://bugs.openjdk.org/browse/JDK-8294961) changed to use 
>> classfile API for reflection proxy-generation. Actual implementation of 
>> `ProxyGenerator` is focused on performance, however it causes JDK bootstrap 
>> regressions. `ProxyGenerator.TEMPLATE` class model is statically created and 
>> each proxy class is transformed from the template.
>> 
>> This patch is intended to examine plain proxy generation impact on 
>> performance and JDK bootstrap (vs proxy transformation from template).
>> 
>> The generated proxy is migrated from static initialization to CONDY 
>> bootstrap.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 13 additional 
> commits since the last revision:
> 
>  - obsolete import
>  - Merge branch 'master' into JDK-8332457-proxy-startup
>  - missing bracket
>  - Update src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java
>
>Co-authored-by: liach <7806504+li...@users.noreply.github.com>
>  - removed obsolete entry
>  - MTD_ cleanup
>  - fixed javadoc
>  - CONDY implementation of ProxyGenerator
>  - simplification of the proxy class loading
>  - more improvements
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/62348071...942342d5

The condy bootstrapping adds a little noise when running this through an 
affected startup benchmark. Overall the net impact is small or even negative. 
This is a nice cleanup, though, but perhaps we ought to keep JDK-8332457 open 
(or file a new bug to keep investigating overheads).

-

PR Comment: https://git.openjdk.org/jdk/pull/19410#issuecomment-2137310415


Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

2024-05-29 Thread Claes Redestad
On Mon, 27 May 2024 20:55:29 GMT, Pavel Rappo  wrote:

>> Please review this PR, which supersedes a now withdrawn 
>> https://github.com/openjdk/jdk/pull/14831.
>> 
>> This PR replaces `ArraysSupport.vectorizedHashCode` with a set of more 
>> user-friendly methods. Here's a summary:
>> 
>> - Made the operand constants (i.e. `T_BOOLEAN` and friends) and the 
>> `vectorizedHashCode` method private
>> 
>> - Made the `vectorizedHashCode` method private, but didn't rename it. 
>> Renaming would dramatically increase this PR review cost, because that 
>> method's name is used by a lot of VM code. On a bright side, since the 
>> method is now private, it's no longer callable by clients of 
>> `ArraysSupport`, thus a problem of an inaccurate name is less severe.
>> 
>> - Made the `ArraysSupport.utf16HashCode` method private
>> 
>> - Moved tiny cases (i.e. 0, 1, 2) to `ArraysSupport`
>
> Pavel Rappo has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix incorrect utf16 hashCode adaptation

Marked as reviewed by redestad (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19414#pullrequestreview-2084714609


Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

2024-05-29 Thread Claes Redestad
On Wed, 29 May 2024 03:16:02 GMT, Chen Liang  wrote:

>> In fact, if I change it to `2`, the following tests will fail:
>> 
>>   - `jdk/jdk/classfile/Utf8EntryTest.java`
>>   - `jdk/java/util/zip/ZipCoding.java`
>>   - `jdk/java/text/Format/MessageFormat/MessageRegression.java`
>
> Indeed, the actual length passed at call site is `value.length >> 1` instead 
> of `value.length`; this adjusted char-length carries on to 
> `vectorizedHashCode` call.

Ah, sneaky. Might affect scores for the zero and one-char cases since the shift 
now happens unconditionally, but probably doesn't matter in practice.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1618456668


Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

2024-05-28 Thread Claes Redestad
On Tue, 28 May 2024 19:13:30 GMT, Jorn Vernee  wrote:

>> Pavel Rappo has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Fix incorrect utf16 hashCode adaptation
>
> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 275:
> 
>> 273: return switch (length) {
>> 274: case 0 -> initialValue;
>> 275: case 1 -> 31 * initialValue + (a[fromIndex] & 0xff);
> 
> For clarity, if you think it helps:
> Suggestion:
> 
> case 1 -> 31 * initialValue + Byte.toUnsignedInt(a[fromIndex]);

I don't care as long as microbenchmarks don't get a hiccup.

> src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 301:
> 
>> 299: return switch (length) {
>> 300: case 0 -> initialValue;
>> 301: case 1 -> 31 * initialValue + JLA.getUTF16Char(a, 
>> fromIndex);
> 
> There seems to be a mismatch here with the original code in StringUTF16, 
> where the length that is tested for is `2` instead of `1`.

Yes, should be `2` (`a` is semantically a `char[]`). This typo likely pass 
functional testing since `1` can never happen in practice, and the default case 
should work for any value. There might be a String microbenchmark out there 
that might be slightly unhappy, though.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617867797
PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617865658


Re: RFR: 8332826: Make hashCode methods in ArraysSupport friendlier [v2]

2024-05-28 Thread Claes Redestad
On Tue, 28 May 2024 19:19:51 GMT, Jorn Vernee  wrote:

>> Pavel Rappo has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Fix incorrect utf16 hashCode adaptation
>
> test/hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java line 88:
> 
>> 86: private static int testIntrinsic(byte[] bytes, int type)
>> 87: throws InvocationTargetException, IllegalAccessException {
>> 88: return (int) vectorizedHashCode.invoke(null, bytes, 0, 256, 1, 
>> type);
> 
> Better to just call `hashCodeOfUnsigned` here I think.
> 
> The test for the non-constant type could be dropped. That is no longer a part 
> of the 'API' of `ArraySupport`. It looks like the intrinsic bails out when 
> the basic type is not constant any ways: 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L6401-L6404

The non-constant test was added because that very bailout caused a crash. The 
other test is actually less interesting since it'll likely be covered 
indirectly by regular use. But as we are hiding these away this gets ever more 
obscure and perhaps the test could be dropped entirely.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19414#discussion_r1617848032


  1   2   3   4   5   6   7   8   >