date:20210402

Garbage Free Check

2021-04-02 Thread Ralph Goers

Log4j 2 supports the notion of a PreciseClock - one that can be initialized to 
something more precise than a millisecond. At the same time it also supports 
running with no heap allocations in certain circumstances. I am in the process 
of moving our master branch to require Java 11 as the minimum. In doing so I am 
encountering unit test errors while verifying that logging is garbage free. 
They all occur allocating an Instant.

The code we have simply does

public void init(MutableInstant mutableInstant) {
Instant instant = java.time.Clock.systemUTC().instant();
mutableInstant.initFromEpochSecond(instant.getEpochSecond(), 
instant.getNano());
}
In our previous tests we had thought the allocation was being eliminated due to 
escape analysis since the data is being extracted from the Instant and not 
passed along. However, after upgrading the Google test library and the JDK 
version it appears that is not the case.
Ideally we would really like something like

public void init(MutableInstant mutableInstant) {
java.time.Clock.systemUTC().initInstant(mutableInstant);
}

where Mutable instant would implement an interface that has the two set 
methods.The method would execute the same logic that is in the instant() method 
but instead of creating a new Instant it would call the set methods for the 
provided object.

This would allow us to either have the MutableInstants in ThreadLocals or some 
other mechanism to ensure they are thread safe and have no heap allocations. As 
it stands now I see no way to gain access to the higher precision clock without 
memory allocation.

Do you know of another way to do this? Am I missing something?

Ralph

Integrated: JDK-8264572: ForkJoinPool.getCommonPoolParallelism() reports always 1

2021-04-02 Thread Doug Lea

On Fri, 2 Apr 2021 10:52:45 GMT, Doug Lea  wrote:

> This reinserts setting common pool parallelism inadvertently clobbered in 
> JDK-8259800

This pull request has now been integrated.

Changeset: cec66cf8
Author:Doug Lea 
URL:   https://git.openjdk.java.net/jdk/commit/cec66cf8
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8264572: ForkJoinPool.getCommonPoolParallelism() reports always 1

Reviewed-by: alanb

-

PR: https://git.openjdk.java.net/jdk/pull/3324

Re: RFR: 8264512: jdk/test/jdk/java/util/prefs/ExportNode.java relies on default platform encoding

2021-04-02 Thread Naoto Sato

On Fri, 2 Apr 2021 21:45:58 GMT, Brian Burkhalter  wrote:

> This test emits to a `java.io.ByteArrayOutputStream` the contents of a 
> `java.utils.prefs.Preferences` node. The `UTF-8` character encoding is used 
> [1]. The `ByteArrayOutputStream` is then converted to a `String` using 
> `toString()` which uses the platform's default character set [2]. The 
> resulting `String` is then checked for the node names that it should and 
> should not contain.
> 
> This change proposes to use `ByteArrayOutputStream.toString(String)` to 
> obtain the string [3] to maintain consistency of the encoding. It also adds 
> throwing a `RuntimeException` if the node string is not as expected. It is 
> unclear why this test was not already throwing such an exception.
> 
> [1] 
> https://docs.oracle.com/en/java/javase/16/docs/api/java.prefs/java/util/prefs/Preferences.html#exportNode(java.io.OutputStream)
> [2] 
> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/io/ByteArrayOutputStream.html#toString()
> [3] 
> https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/io/ByteArrayOutputStream.html#toString(java.lang.String)

Looks good.

-

Marked as reviewed by naoto (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3332

RFR: 8264512: jdk/test/jdk/java/util/prefs/ExportNode.java relies on default platform encoding

2021-04-02 Thread Brian Burkhalter

This test emits to a `java.io.ByteArrayOutputStream` the contents of a 
`java.utils.prefs.Preferences` node. The `UTF-8` character encoding is used 
[1]. The `ByteArrayOutputStream` is then converted to a `String` using 
`toString()` which uses the platform's default character set [2]. The resulting 
`String` is then checked for the node names that it should and should not 
contain.

This change proposes to use `ByteArrayOutputStream.toString(String)` to obtain 
the string [3] to maintain consistency of the encoding. It also adds throwing a 
`RuntimeException` if the node string is not as expected. It is unclear why 
this test was not already throwing such an exception.

[1] 
https://docs.oracle.com/en/java/javase/16/docs/api/java.prefs/java/util/prefs/Preferences.html#exportNode(java.io.OutputStream)
[2] 
https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/io/ByteArrayOutputStream.html#toString()
[3] 
https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/io/ByteArrayOutputStream.html#toString(java.lang.String)

-

Commit messages:
 - 8264512: jdk/test/jdk/java/util/prefs/ExportNode.java relies on default 
platform encoding

Changes: https://git.openjdk.java.net/jdk/pull/3332/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=3332=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264512
  Stats: 39 lines in 1 file changed: 21 ins; 0 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3332.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3332/head:pull/3332

PR: https://git.openjdk.java.net/jdk/pull/3332

Re: RFR: 8037397: Lost nested character class after intersection && [v3]

2021-04-02 Thread Roger Riggs

On Fri, 2 Apr 2021 15:48:50 GMT, Ian Graves  wrote:

>> Bug fix with the intersection `&&` operator in regex patterns. In 
>> JDK-8037397, some character classes on the right hand side of the operator 
>> are dropped in cases where nested `[..]` classes are used with non "nested" 
>> ones.
>
> Ian Graves has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adding a test.

Update copyright before push.

-

Marked as reviewed by rriggs (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3291

Re: RFR: 8037397: Lost nested character class after intersection && [v4]

2021-04-02 Thread Ian Graves

> Bug fix with the intersection `&&` operator in regex patterns. In 
> JDK-8037397, some character classes on the right hand side of the operator 
> are dropped in cases where nested `[..]` classes are used with non "nested" 
> ones.

Ian Graves has updated the pull request incrementally with one additional 
commit since the last revision:

  Bumping copyright date.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3291/files
  - new: https://git.openjdk.java.net/jdk/pull/3291/files/9a46443d..de0fa217

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3291=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3291=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3291/head:pull/3291

PR: https://git.openjdk.java.net/jdk/pull/3291

Re: RFR: 8264148: Update spec for exceptions retrofitted for exception chaining

2021-04-02 Thread Serguei Spitsyn

On Wed, 24 Mar 2021 23:17:46 GMT, Joe Darcy  wrote:

> 8264148: Update spec for exceptions retrofitted for exception chaining

Joe,
The Serviceability part looks good.
Thanks,
Serguei

-

PR: https://git.openjdk.java.net/jdk/pull/3182

Integrated: 8264544: Case-insensitive comparison issue with supplementary characters.

2021-04-02 Thread Naoto Sato

On Thu, 1 Apr 2021 03:24:04 GMT, Naoto Sato  wrote:

> Please review the fix to the subject issue. Thanks to the contribution by 
> Chris Johnson.

This pull request has now been integrated.

Changeset: 6c145c47
Author:Naoto Sato 
URL:   https://git.openjdk.java.net/jdk/commit/6c145c47
Stats: 11 lines in 3 files changed: 3 ins; 0 del; 8 mod

8264544: Case-insensitive comparison issue with supplementary characters.

Co-authored-by: Chris Johnson 
Reviewed-by: joehw, iris, alanb

-

PR: https://git.openjdk.java.net/jdk/pull/3300

Re: RFR: 8037397: Lost nested character class after intersection && [v3]

2021-04-02 Thread Ian Graves

On Thu, 1 Apr 2021 00:33:26 GMT, Florent Guillaume 
 wrote:

>> Ian Graves has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Adding a test.
>
> src/java.base/share/classes/java/util/regex/Pattern.java line 2666:
> 
>> 2664: } else { // abc&
>> 2665: unread();
>> 2666: if(right == null) {
> 
> Missing space after `if`. There will be a unit test as well?
> 
> (Please tell me if these comments aren't suitable or relevant, as I'm not 
> involved in the project.)

Yep, they are suitable thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/3291

Re: RFR: 8037397: Lost nested character class after intersection && [v3]

2021-04-02 Thread Ian Graves

> Bug fix with the intersection `&&` operator in regex patterns. In 
> JDK-8037397, some character classes on the right hand side of the operator 
> are dropped in cases where nested `[..]` classes are used with non "nested" 
> ones.

Ian Graves has updated the pull request incrementally with one additional 
commit since the last revision:

  Adding a test.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3291/files
  - new: https://git.openjdk.java.net/jdk/pull/3291/files/2a783a1e..9a46443d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3291=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3291=01-02

  Stats: 35 lines in 1 file changed: 34 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3291/head:pull/3291

PR: https://git.openjdk.java.net/jdk/pull/3291

Re: RFR: 8037397: Lost nested character class after intersection && [v3]

2021-04-02 Thread Ian Graves

On Thu, 1 Apr 2021 14:21:20 GMT, Roger Riggs  wrote:

>> Ian Graves has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Adding a test.
>
> Please add a test.

Tagging with CSR because this will impact a long-standing (buggy) behavior.

-

PR: https://git.openjdk.java.net/jdk/pull/3291

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Vladimir Ivanov

On Fri, 2 Apr 2021 14:41:06 GMT, Jorn Vernee  wrote:

>> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java 
>> line 874:
>> 
>>> 872: case NEW_ARRAY:
>>> 873: Class rtype = 
>>> name.function.methodType().returnType();
>>> 874: if (isStaticallyNameable(rtype)) {
>> 
>> Why do you remote `isStaticallyNameable(rtype)` check? 
>> 
>> It is there for a reason: only a very limited set of classes can be directly 
>> referenced from LF bytecode.
>
> I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in 
> the latest commit. I didn't add the check since I was not aware of the 
> restriction (looks like some of the tests failed as well).
> 
> Good to know, will add a check.

Also, please, add a test case for such scenario. It should be trivial: just use 
a class defined by the test (loaded by `AppClassLoader`).

>> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
>> 
>>> 1895: // field of the receiver handle.
>>> 1896: // This is left as a followup enhancement, as it needs to be 
>>> investigated if this causes
>>> 1897: // profile pollution.
>> 
>> Profile pollution shouldn't be a problem here: both types (element type + 
>> array type) will be constants attached to the "receiver" BMH instance and 
>> during inlining will be fetched from there.  
>> 
>> I'm fine with handling it as a separate RFE.
>
> That's what I thought as well (but not 100% sure). I can partially roll back 
> the last commit so the code still uses an injected array constructor handle, 
> and then it should be easy to add caching in the cases where the component 
> type is a reference type.

Whatever you prefer. I'm fine with it either way.

-

PR: https://git.openjdk.java.net/jdk/pull/3306

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee

On Fri, 2 Apr 2021 13:56:31 GMT, Vladimir Ivanov  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Address review comments
>>   - Use cached version of store func getter
>>   - Use ARRAY_STORE intrinsic for array stores
>>   - Generate direct call to Array.newInstance instead of using an array 
>> constructor handle
>>   - Intrinsify call to Array.newInstance if the component type is constant
>
> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java 
> line 874:
> 
>> 872: case NEW_ARRAY:
>> 873: Class rtype = 
>> name.function.methodType().returnType();
>> 874: if (isStaticallyNameable(rtype)) {
> 
> Why do you remote `isStaticallyNameable(rtype)` check? 
> 
> It is there for a reason: only a very limited set of classes can be directly 
> referenced from LF bytecode.

I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in 
the latest commit. I didn't add the check since I was not aware of the 
restriction.

Good to know, will add a check.

> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
> 
>> 1895: // field of the receiver handle.
>> 1896: // This is left as a followup enhancement, as it needs to be 
>> investigated if this causes
>> 1897: // profile pollution.
> 
> Profile pollution shouldn't be a problem here: both types (element type + 
> array type) will be constants attached to the "receiver" BMH instance and 
> during inlining will be fetched from there.  
> 
> I'm fine with handling it as a separate RFE.

That's what I thought as well (but not 100% sure). I can partially roll back 
the last commit so the code still uses an injected array constructor handle, 
and then it should be easy to add caching in the cases where the component type 
is a reference type.

-

PR: https://git.openjdk.java.net/jdk/pull/3306

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Vladimir Ivanov

On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited amount of cases: 
>> Object[] with max 10 elements only, only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the ineffecient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> BenchmarkMode  CntScoreError  
>> Units
>> TypedAsCollector.testIntCollect  avgt   30  189.156 �  1.796  
>> ns/op
>> TypedAsCollector.testIntCollectHighArity avgt   30  660.549 � 10.159  
>> ns/op
>> TypedAsCollector.testObjectCollect   avgt   307.092 �  0.042  
>> ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 �  0.546  
>> ns/op
>> TypedAsCollector.testStringCollect   avgt   30   28.511 �  0.243  
>> ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 �  0.635  
>> ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> BenchmarkMode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect  avgt   30  6.569 � 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity avgt   30  8.923 � 0.066  ns/op
>> TypedAsCollector.testObjectCollect   avgt   30  6.813 � 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 � 0.066  ns/op
>> TypedAsCollector.testStringCollect   avgt   30  6.737 � 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 � 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   - Address review comments
>   - Use cached version of store func getter
>   - Use ARRAY_STORE intrinsic for array stores
>   - Generate direct call to Array.newInstance instead of using an array 
> constructor handle
>   - Intrinsify call to Array.newInstance if the component type is constant

src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 
874:

> 872: case NEW_ARRAY:
> 873: Class rtype = 
> name.function.methodType().returnType();
> 874: if (isStaticallyNameable(rtype)) {

Why do you remote `isStaticallyNameable(rtype)` check? 

It is there for a reason: only a very limited set of classes can be directly 
referenced from LF bytecode.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:

> 1895: // field of the receiver handle.
> 1896: // This is left as a followup enhancement, as it needs to be 
> investigated if this causes
> 1897: // profile pollution.

Profile pollution

Re: RFR: JDK-8264572 reinsert common pool parallelism assignment

2021-04-02 Thread Doug Lea

On Fri, 2 Apr 2021 13:20:44 GMT, Alan Bateman  wrote:

>> This reinserts setting common pool parallelism inadvertently clobbered in 
>> JDK-8259800
>
> This looks okay to me, I just wonder if we should have a test to the value of 
> getCommonPoolParallelism.

We once had a test, but now that Runtime.availableProcessors (the only means of 
checking) can return varying values, we can't include in tck  tests. The error 
here wasn't caught  because we didn't run the usual performance tests (that 
catch these problems immediately) because I was focused only on JDK-8259800. 
Sorry.

-

PR: https://git.openjdk.java.net/jdk/pull/3324

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee

On Thu, 1 Apr 2021 19:19:10 GMT, Paul Sandoz  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Address review comments
>>   - Use cached version of store func getter
>>   - Use ARRAY_STORE intrinsic for array stores
>>   - Generate direct call to Array.newInstance instead of using an array 
>> constructor handle
>>   - Intrinsify call to Array.newInstance if the component type is constant
>
> That's an elegant solution.
> 
> At first i thought it might unduly perturb lambda form generation and 
> caching. but you slotted a different lambda form implementation underneath 
> the varargs implementation.

I've address review comments, plus some other things:

- I realized that I was calling the uncached version of the store function 
factory. Fixed that.
- I also realized that there's already an `ARRAY_STORE` intrinsic, which I'm 
now using to avoid generating a call.
- I also realized that since we only have 1 array creation handle per lambda 
form, we can instead generate a direct call to `Array::newInstance` instead of 
going through the array constructor handle (which also avoids having to use a 
BoundMethodHandle).
- Finally, I added an instrinsic, under the old `NEW_ARRAY` name, that 
intrinsifies a call to `Array::newInstance` if the component type argument is 
constant (which it is in this case).

As a result, the lambda form is now fully intrinsified (no more calls in the 
generated bytecode) e.g.:

static java.lang.Object collector001__L(java.lang.Object, java.lang.Object, 
java.lang.Object, java.lang.Object);
Code:
   0: iconst_3
   1: anewarray #12 // class java/lang/String
   4: astore4
   6: aload 4
   8: checkcast #14 // class "[Ljava/lang/String;"
  11: dup
  12: astore4
  14: iconst_0
  15: aload_1
  16: checkcast #12 // class java/lang/String
  19: aastore
  20: aload 4
  22: iconst_1
  23: aload_2
  24: checkcast #12 // class java/lang/String
  27: aastore
  28: aload 4
  30: iconst_2
  31: aload_3
  32: checkcast #12 // class java/lang/String
  35: aastore
  36: aload 4
  38: areturn

Thanks,
Jorn

-

PR: https://git.openjdk.java.net/jdk/pull/3306

Re: RFR: JDK-8264572 reinsert common pool parallelism assignment

2021-04-02 Thread Alan Bateman

On Fri, 2 Apr 2021 10:52:45 GMT, Doug Lea  wrote:

> This reinserts setting common pool parallelism inadvertently clobbered in 
> JDK-8259800

This looks okay to me, I just wonder if we should have a test to the value of 
getCommonPoolParallelism.

-

Marked as reviewed by alanb (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3324

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee

> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code is creating a collector handle by combining a set of hard coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But, through investigation it turns out that the 
> intrinsification is only being applied in a very limited amount of cases: 
> Object[] with max 10 elements only, only for the intermediate collector 
> handles. Every other collector shape incurs overhead because it essentially 
> ends up calling the ineffecient fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> BenchmarkMode  CntScoreError  
> Units
> TypedAsCollector.testIntCollect  avgt   30  189.156 �  1.796  
> ns/op
> TypedAsCollector.testIntCollectHighArity avgt   30  660.549 � 10.159  
> ns/op
> TypedAsCollector.testObjectCollect   avgt   307.092 �  0.042  
> ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 �  0.546  
> ns/op
> TypedAsCollector.testStringCollect   avgt   30   28.511 �  0.243  
> ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 �  0.635  
> ns/op
> (as you can see, just the Object[] with arity less than 10 case is fast here)
> After:
> BenchmarkMode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect  avgt   30  6.569 � 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity avgt   30  8.923 � 0.066  ns/op
> TypedAsCollector.testObjectCollect   avgt   30  6.813 � 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 � 0.066  ns/op
> TypedAsCollector.testStringCollect   avgt   30  6.737 � 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 � 0.052  ns/op
> 
> Thanks,
> Jorn

Jorn Vernee has updated the pull request incrementally with one additional 
commit since the last revision:

  - Address review comments
  - Use cached version of store func getter
  - Use ARRAY_STORE intrinsic for array stores
  - Generate direct call to Array.newInstance instead of using an array 
constructor handle
  - Intrinsify call to Array.newInstance if the component type is constant

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/46bc5d19..313ffaaf

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3306=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3306=00-01

  Stats: 57 lines in 2 files changed: 27 ins; 19 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306

Re: RFR: 8203359: Container level resources events [v7]

2021-04-02 Thread Jaroslav Bachorik

> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Remove trailing spaces

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/78dc85d3..f4372e23

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3126=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3126=05-06

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126

RFR: JDK-8264572 reinsert common pool parallelism assignment

2021-04-02 Thread Doug Lea

This reinserts setting common pool parallelism inadvertently clobbered in 
JDK-8259800

-

Commit messages:
 - Reinsert omitted assignment
 - Reinsert omitted assignment

Changes: https://git.openjdk.java.net/jdk/pull/3324/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=3324=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264572
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3324.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3324/head:pull/3324

PR: https://git.openjdk.java.net/jdk/pull/3324

Re: RFR: 8203359: Container level resources events [v6]

2021-04-02 Thread Jaroslav Bachorik

> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Doh

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/2514b1a7..78dc85d3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3126=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3126=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126

Re: RFR: 8203359: Container level resources events [v5]

2021-04-02 Thread Jaroslav Bachorik

> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Report container type and register events conditionally

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/79c91f57..2514b1a7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3126=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3126=03-04

  Stats: 41 lines in 2 files changed: 24 ins; 8 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126

Re: RFR: 8203359: Container level resources events

2021-04-02 Thread Jaroslav Bachorik

On Thu, 1 Apr 2021 15:55:59 GMT, Jaroslav Bachorik  
wrote:

>>> Does each getter call result in parsing /proc, or do things aggregated over 
>>> several calls or hooks?
>> 
>> From the looks of it the event emitting code uses `Metrics.java` interface 
>> for retrieving the info. Each call to a method exposed by Metrics result in 
>> file IO on some cgroup (v1 or v2) interface file(s) in `/sys/fs/...`. I 
>> don't see any aggregation being done.
>> 
>> On the hotspot side, we implemented some caching for frequent calls 
>> (JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side 
>> since there wasn't any need (so far). If calls are becoming frequent with 
>> this it should be reconsidered.
>> 
>> So +1 on getting some data on what the perf penalty of this is.
>
> Thanks to all for chiming in!
> 
> I have added the tests to 
> `test/hotspot/jtreg/containers/docker/TestJFREvents.java` where there already 
> were some templates for the container event data.
> 
> As for the performance - as expected, extracting the data from `/proc` is not 
> exactly cheap. On my test c5.4xlarge instance I am getting an average 
> wall-clock time to generate the usage/throttling events (one instance of 
> each) of ~15ms.
> I would argue that 15ms per 30s (the default emission period for those 
> events) might be acceptable to start with. 
> 
> Caching of cgroups parsed data would help if the emission period is shorter 
> than the cache TTL. This is exacerbated by the fact that (almost) each 
> container event type requires data from a different cgroups control file - 
> hence the data will not be shared between the event type instances even if 
> cached. Realistically, caching benefits would become visible only for 
> sub-second emission periods.
> 
> If the caching is still required I would suggest having a follow up ticket 
> just for that - it will require setting up some benchmarks to justify the 
> changes that would need to be done in the metrics implementation.

I tried to measure the startup regression and here are my observations:
* Startup is not affected unless the application is started with JFR
* The extra events and hooks take ~5ms on my work machine
* It is possible not to register those events and hooks in a non-container env 
- then the overhead is 20-50us which it takes to figure out whether running in 
container

In order to minimize the effect this change will have on the startup I would 
suggest using conditional registration unless I hear strong objections to that.

-

PR: https://git.openjdk.java.net/jdk/pull/3126

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v3]

2021-04-02 Thread Andrew Haley

On Fri, 2 Apr 2021 03:10:57 GMT, Dong Bo  wrote:

>> In JDK-8248188, IntrinsicCandidate and API is added for Base64 decoding.
>> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3, a basic 
>> idea can be found at 
>> http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>> 
>> Patch passed jtreg tier1-3 tests with linux-aarch64-server-fastdebug build.
>> Tests in `test/jdk/java/util/Base64/` and 
>> `compiler/intrinsics/base64/TestBase64.java` runned specially for the 
>> correctness of the implementation.
>> 
>> There can be illegal characters at the start of the input if the data is 
>> MIME encoded.
>> It would be no benefits to use SIMD for this case, so the stub use no-simd 
>> instructions for MIME encoded data now.
>> 
>> A JMH micro, Base64Decode.java, is added for performance test.
>> With different input length (upper-bounded by parameter `maxNumBytes` in the 
>> JMH micro),
>> we witness ~2.5x improvements with long inputs and no regression with short 
>> inputs for raw base64 decodeing, minor improvements (~10.95%) for MIME on 
>> Kunpeng916.
>> 
>> The Base64Decode.java JMH micro-benchmark results:
>> 
>> Benchmark  (lineSize)  (maxNumBytes)  Mode  Cnt  
>>  Score   Error  Units
>> 
>> # Kunpeng916, intrinsic
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.614 ± 0.609  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 58.199 ± 1.650  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 69.400 ± 0.931  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5  
>> 96.818 ± 1.687  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 122.856 ± 9.217  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 130.935 ± 1.667  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 143.627 ± 1.751  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 152.311 ± 1.178  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 342.631 ± 0.584  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt5 
>> 573.635 ± 1.050  ns/op
>> Base64Decode.testBase64Decode   4  2  avgt5
>> 9534.136 ±45.172  ns/op
>> Base64Decode.testBase64Decode   4  5  avgt5   
>> 22718.726 ±   192.070  ns/op
>> Base64Decode.testBase64MIMEDecode   4  1  avgt   10  
>> 63.558 ±0.336  ns/op
>> Base64Decode.testBase64MIMEDecode   4  3  avgt   10  
>> 82.504 ±0.848  ns/op
>> Base64Decode.testBase64MIMEDecode   4  7  avgt   10 
>> 120.591 ±0.608  ns/op
>> Base64Decode.testBase64MIMEDecode   4 32  avgt   10 
>> 324.314 ±6.236  ns/op
>> Base64Decode.testBase64MIMEDecode   4 64  avgt   10 
>> 532.678 ±4.670  ns/op
>> Base64Decode.testBase64MIMEDecode   4 80  avgt   10 
>> 678.126 ±4.324  ns/op
>> Base64Decode.testBase64MIMEDecode   4 96  avgt   10 
>> 771.603 ±6.393  ns/op
>> Base64Decode.testBase64MIMEDecode   4112  avgt   10 
>> 889.608 ±   0.759  ns/op
>> Base64Decode.testBase64MIMEDecode   4512  avgt   10
>> 3663.557 ±3.422  ns/op
>> Base64Decode.testBase64MIMEDecode   4   1000  avgt   10
>> 7017.784 ±9.128  ns/op
>> Base64Decode.testBase64MIMEDecode   4  2  avgt   10  
>> 128670.660 ± 7951.521  ns/op
>> Base64Decode.testBase64MIMEDecode   4  5  avgt   10  
>> 317113.667 ±  161.758  ns/op
>> 
>> # Kunpeng916, default
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.455 ±   0.571  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 57.937 ±   0.505  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 73.823 ±   1.452  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5 
>> 106.484 ±   1.243  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 141.004 ±   1.188  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 156.284 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 174.137 ±   0.177  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 188.445 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 610.847 ±   1.559  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-02 Thread Andrew Haley

On Fri, 2 Apr 2021 07:05:26 GMT, Dong Bo  wrote:

> PING... Any suggestions on the updated commit?

Once you reply to the comments, sure.

-

PR: https://git.openjdk.java.net/jdk/pull/3228

Integrated: 8264109: Add vectorized implementation for VectorMask.andNot()

2021-04-02 Thread Xiaohong Gong

On Fri, 26 Mar 2021 01:50:33 GMT, Xiaohong Gong  wrote:

> Currently "VectorMask.andNot()" is not vectorized:
> public VectorMask andNot(VectorMask m) {
> // FIXME: Generate good code here.
> return bOp(m, (i, a, b) -> a && !b);
> }
> This can be implemented with` "and(m.not())" `directly. Since 
> `"VectorMask.and()/not()" `have been vectorized in hotspot, `"andNot"`
> can be vectorized as well by calling them.
> 
> The performance gain is >100% for such a simple JMH:
>   @Benchmark
>   public Object andNot(Blackhole bh) {
> boolean[] mask = fm.apply(SPECIES.length());
> boolean[] r = fmt.apply(SPECIES.length());
> VectorMask rm = VectorMask.fromArray(SPECIES, r, 0);
> 
> for (int ic = 0; ic < INVOC_COUNT; ic++) {
>   for (int i = 0; i < mask.length; i += SPECIES.length()) {
> VectorMask vmask = VectorMask.fromArray(SPECIES, mask, i);
>   rm = rm.andNot(vmask);
> }
> }
> return rm;
>   }

This pull request has now been integrated.

Changeset: 7d0a0bad
Author:Xiaohong Gong 
Committer: Ningsheng Jian 
URL:   https://git.openjdk.java.net/jdk/commit/7d0a0bad
Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod

8264109: Add vectorized implementation for VectorMask.andNot()

Reviewed-by: psandoz, njian

-

PR: https://git.openjdk.java.net/jdk/pull/3211

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-04-02 Thread Ningsheng Jian

On Thu, 1 Apr 2021 03:39:43 GMT, Xiaohong Gong  wrote:

>> Currently "VectorMask.andNot()" is not vectorized:
>> public VectorMask andNot(VectorMask m) {
>> // FIXME: Generate good code here.
>> return bOp(m, (i, a, b) -> a && !b);
>> }
>> This can be implemented with` "and(m.not())" `directly. Since 
>> `"VectorMask.and()/not()" `have been vectorized in hotspot, `"andNot"`
>> can be vectorized as well by calling them.
>> 
>> The performance gain is >100% for such a simple JMH:
>>   @Benchmark
>>   public Object andNot(Blackhole bh) {
>> boolean[] mask = fm.apply(SPECIES.length());
>> boolean[] r = fmt.apply(SPECIES.length());
>> VectorMask rm = VectorMask.fromArray(SPECIES, r, 0);
>> 
>> for (int ic = 0; ic < INVOC_COUNT; ic++) {
>>   for (int i = 0; i < mask.length; i += SPECIES.length()) {
>> VectorMask vmask = VectorMask.fromArray(SPECIES, mask, i);
>>   rm = rm.andNot(vmask);
>> }
>> }
>> return rm;
>>   }
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Move the changing to AbstractMask.andNot and revert changes in VectorMask
>   
>   Change-Id: Ie3ec8f53cb24526c8e1ccc68038355d024910818
>   CustomizedGitHooks: yes

LGTM

-

Marked as reviewed by njian (Committer).

PR: https://git.openjdk.java.net/jdk/pull/3211

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-04-02 Thread Xiaohong Gong

On Fri, 2 Apr 2021 09:59:31 GMT, Ningsheng Jian  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Move the changing to AbstractMask.andNot and revert changes in VectorMask
>>   
>>   Change-Id: Ie3ec8f53cb24526c8e1ccc68038355d024910818
>>   CustomizedGitHooks: yes
>
> LGTM

Thanks for the review @nsjian !

-

PR: https://git.openjdk.java.net/jdk/pull/3211

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]

2021-04-02 Thread Lin Zang

On Wed, 24 Mar 2021 10:25:44 GMT, Lance Andersen  wrote:

>> Hi Lance，
>> Thanks a lot for your review. I will update the PR ASAP. 
>> May I ask your help to also review the CSR？ Thanks!
>> 
>> BRs,
>> Lin
>
> Hi Lin,
> 
> On Mar 24, 2021, at 2:51 AM, Lin Zang ***@***.**@***.***>> wrote:
> 
> 
> 
> Hi Lance，
> Thanks a lot for your review. I will update the PR ASAP.
> May I ask your help to also review the CSR？
> 
> I believe we need to flush out some of the issues I raised that were not test 
> related as they will result in updates to the javadoc which will require an 
> update to the CSR.
> 
> 
> 
> Thanks!
> 
> BRs,
> Lin
> 
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on 
> GitHub, or 
> unsubscribe.
> 
> ***@***.***
> 
> 
> 
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> ***@***.**@***.***>

Dear @LanceAndersen,
Here I added a class `GZIPHeaderData` which is defined to hold all gzip header 
members, and make it an argument of the GZIPOutputStream constructor.  Would 
you like to help review and check whether this change looks reasonable?

BTW, this pr still lack of the negative test cases, I plan to add it when the 
constructor API is fixed.

Thanks!
Lin

-

PR: https://git.openjdk.java.net/jdk/pull/3072

Re: RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v4]

2021-04-02 Thread Lin Zang

> 4890732: GZIPOutputStream doesn't support optional GZIP fields

Lin Zang has updated the pull request incrementally with one additional commit 
since the last revision:

  add class GZIPHeaderData, refine testcases

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3072/files
  - new: https://git.openjdk.java.net/jdk/pull/3072/files/9b7dabfe..03b3e966

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=3072=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=3072=02-03

  Stats: 633 lines in 3 files changed: 421 ins; 183 del; 29 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3072/head:pull/3072

PR: https://git.openjdk.java.net/jdk/pull/3072

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-02 Thread Andrew Haley

On Mon, 29 Mar 2021 03:28:54 GMT, Dong Bo  wrote:

> > Please consider losing the non-SIMD case. It doesn't result in any 
> > significant gain.
> 
> The non-SIMD case is useful for MIME decoding performance.

Your test results suggest that it isn't useful for that, surely?

-

PR: https://git.openjdk.java.net/jdk/pull/3228

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v3]

2021-04-02 Thread Andrew Haley

On Fri, 2 Apr 2021 03:10:57 GMT, Dong Bo  wrote:

>> In JDK-8248188, IntrinsicCandidate and API is added for Base64 decoding.
>> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3, a basic 
>> idea can be found at 
>> http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>> 
>> Patch passed jtreg tier1-3 tests with linux-aarch64-server-fastdebug build.
>> Tests in `test/jdk/java/util/Base64/` and 
>> `compiler/intrinsics/base64/TestBase64.java` runned specially for the 
>> correctness of the implementation.
>> 
>> There can be illegal characters at the start of the input if the data is 
>> MIME encoded.
>> It would be no benefits to use SIMD for this case, so the stub use no-simd 
>> instructions for MIME encoded data now.
>> 
>> A JMH micro, Base64Decode.java, is added for performance test.
>> With different input length (upper-bounded by parameter `maxNumBytes` in the 
>> JMH micro),
>> we witness ~2.5x improvements with long inputs and no regression with short 
>> inputs for raw base64 decodeing, minor improvements (~10.95%) for MIME on 
>> Kunpeng916.
>> 
>> The Base64Decode.java JMH micro-benchmark results:
>> 
>> Benchmark  (lineSize)  (maxNumBytes)  Mode  Cnt  
>>  Score   Error  Units
>> 
>> # Kunpeng916, intrinsic
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.614 ± 0.609  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 58.199 ± 1.650  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 69.400 ± 0.931  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5  
>> 96.818 ± 1.687  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 122.856 ± 9.217  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 130.935 ± 1.667  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 143.627 ± 1.751  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 152.311 ± 1.178  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 342.631 ± 0.584  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt5 
>> 573.635 ± 1.050  ns/op
>> Base64Decode.testBase64Decode   4  2  avgt5
>> 9534.136 ±45.172  ns/op
>> Base64Decode.testBase64Decode   4  5  avgt5   
>> 22718.726 ±   192.070  ns/op
>> Base64Decode.testBase64MIMEDecode   4  1  avgt   10  
>> 63.558 ±0.336  ns/op
>> Base64Decode.testBase64MIMEDecode   4  3  avgt   10  
>> 82.504 ±0.848  ns/op
>> Base64Decode.testBase64MIMEDecode   4  7  avgt   10 
>> 120.591 ±0.608  ns/op
>> Base64Decode.testBase64MIMEDecode   4 32  avgt   10 
>> 324.314 ±6.236  ns/op
>> Base64Decode.testBase64MIMEDecode   4 64  avgt   10 
>> 532.678 ±4.670  ns/op
>> Base64Decode.testBase64MIMEDecode   4 80  avgt   10 
>> 678.126 ±4.324  ns/op
>> Base64Decode.testBase64MIMEDecode   4 96  avgt   10 
>> 771.603 ±6.393  ns/op
>> Base64Decode.testBase64MIMEDecode   4112  avgt   10 
>> 889.608 ±   0.759  ns/op
>> Base64Decode.testBase64MIMEDecode   4512  avgt   10
>> 3663.557 ±3.422  ns/op
>> Base64Decode.testBase64MIMEDecode   4   1000  avgt   10
>> 7017.784 ±9.128  ns/op
>> Base64Decode.testBase64MIMEDecode   4  2  avgt   10  
>> 128670.660 ± 7951.521  ns/op
>> Base64Decode.testBase64MIMEDecode   4  5  avgt   10  
>> 317113.667 ±  161.758  ns/op
>> 
>> # Kunpeng916, default
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.455 ±   0.571  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 57.937 ±   0.505  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 73.823 ±   1.452  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5 
>> 106.484 ±   1.243  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 141.004 ±   1.188  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 156.284 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 174.137 ±   0.177  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 188.445 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 610.847 ±   1.559  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v3]

2021-04-02 Thread Andrew Haley

On Fri, 2 Apr 2021 03:10:57 GMT, Dong Bo  wrote:

>> In JDK-8248188, IntrinsicCandidate and API is added for Base64 decoding.
>> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3, a basic 
>> idea can be found at 
>> http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>> 
>> Patch passed jtreg tier1-3 tests with linux-aarch64-server-fastdebug build.
>> Tests in `test/jdk/java/util/Base64/` and 
>> `compiler/intrinsics/base64/TestBase64.java` runned specially for the 
>> correctness of the implementation.
>> 
>> There can be illegal characters at the start of the input if the data is 
>> MIME encoded.
>> It would be no benefits to use SIMD for this case, so the stub use no-simd 
>> instructions for MIME encoded data now.
>> 
>> A JMH micro, Base64Decode.java, is added for performance test.
>> With different input length (upper-bounded by parameter `maxNumBytes` in the 
>> JMH micro),
>> we witness ~2.5x improvements with long inputs and no regression with short 
>> inputs for raw base64 decodeing, minor improvements (~10.95%) for MIME on 
>> Kunpeng916.
>> 
>> The Base64Decode.java JMH micro-benchmark results:
>> 
>> Benchmark  (lineSize)  (maxNumBytes)  Mode  Cnt  
>>  Score   Error  Units
>> 
>> # Kunpeng916, intrinsic
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.614 ± 0.609  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 58.199 ± 1.650  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 69.400 ± 0.931  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5  
>> 96.818 ± 1.687  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 122.856 ± 9.217  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 130.935 ± 1.667  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 143.627 ± 1.751  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 152.311 ± 1.178  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 342.631 ± 0.584  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt5 
>> 573.635 ± 1.050  ns/op
>> Base64Decode.testBase64Decode   4  2  avgt5
>> 9534.136 ±45.172  ns/op
>> Base64Decode.testBase64Decode   4  5  avgt5   
>> 22718.726 ±   192.070  ns/op
>> Base64Decode.testBase64MIMEDecode   4  1  avgt   10  
>> 63.558 ±0.336  ns/op
>> Base64Decode.testBase64MIMEDecode   4  3  avgt   10  
>> 82.504 ±0.848  ns/op
>> Base64Decode.testBase64MIMEDecode   4  7  avgt   10 
>> 120.591 ±0.608  ns/op
>> Base64Decode.testBase64MIMEDecode   4 32  avgt   10 
>> 324.314 ±6.236  ns/op
>> Base64Decode.testBase64MIMEDecode   4 64  avgt   10 
>> 532.678 ±4.670  ns/op
>> Base64Decode.testBase64MIMEDecode   4 80  avgt   10 
>> 678.126 ±4.324  ns/op
>> Base64Decode.testBase64MIMEDecode   4 96  avgt   10 
>> 771.603 ±6.393  ns/op
>> Base64Decode.testBase64MIMEDecode   4112  avgt   10 
>> 889.608 ±   0.759  ns/op
>> Base64Decode.testBase64MIMEDecode   4512  avgt   10
>> 3663.557 ±3.422  ns/op
>> Base64Decode.testBase64MIMEDecode   4   1000  avgt   10
>> 7017.784 ±9.128  ns/op
>> Base64Decode.testBase64MIMEDecode   4  2  avgt   10  
>> 128670.660 ± 7951.521  ns/op
>> Base64Decode.testBase64MIMEDecode   4  5  avgt   10  
>> 317113.667 ±  161.758  ns/op
>> 
>> # Kunpeng916, default
>> Base64Decode.testBase64Decode   4  1  avgt5  
>> 48.455 ±   0.571  ns/op
>> Base64Decode.testBase64Decode   4  3  avgt5  
>> 57.937 ±   0.505  ns/op
>> Base64Decode.testBase64Decode   4  7  avgt5  
>> 73.823 ±   1.452  ns/op
>> Base64Decode.testBase64Decode   4 32  avgt5 
>> 106.484 ±   1.243  ns/op
>> Base64Decode.testBase64Decode   4 64  avgt5 
>> 141.004 ±   1.188  ns/op
>> Base64Decode.testBase64Decode   4 80  avgt5 
>> 156.284 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4 96  avgt5 
>> 174.137 ±   0.177  ns/op
>> Base64Decode.testBase64Decode   4112  avgt5 
>> 188.445 ±   0.572  ns/op
>> Base64Decode.testBase64Decode   4512  avgt5 
>> 610.847 ±   1.559  ns/op
>> Base64Decode.testBase64Decode   4   1000  avgt

Re: RFR: 8264544: Case-insensitive comparison issue with supplementary characters.

2021-04-02 Thread Alan Bateman

On Thu, 1 Apr 2021 03:24:04 GMT, Naoto Sato  wrote:

> Please review the fix to the subject issue. Thanks to the contribution by 
> Chris Johnson.

Marked as reviewed by alanb (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/3300

Re: RFR: 8166727: javac crashed: [jimage.dll+0x1942] ImageStrings::find+0x28

2021-04-02 Thread Alan Bateman

On Thu, 1 Apr 2021 18:48:15 GMT, Jim Laskey  wrote:

>> src/java.base/share/native/libjimage/imageFile.cpp line 219:
>> 
>>> 217: // WARNING: Should never close the jimage file.
>>> 218: //  Threads may still be running at shutdown.
>>> 219: #if 0
>> 
>> Are you keeping the code in order to re-visit it again? Just wondering why 
>> it's not deleted.
>
> Leaving the code as an example of what is required to close in case the topic 
> gets revisited in the future and I get hit by a bus or dementia.

Okay although I assume someone will spot this and be tempted to remove it.

-

PR: https://git.openjdk.java.net/jdk/pull/3304

Re: RFR: 8166727: javac crashed: [jimage.dll+0x1942] ImageStrings::find+0x28

2021-04-02 Thread Alan Bateman

On Thu, 1 Apr 2021 11:48:21 GMT, Jim Laskey  wrote:

> We should never close the jimage since java threads can still be running 
> after a hard exit().

Marked as reviewed by alanb (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/3304

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-02 Thread Dong Bo

On Tue, 30 Mar 2021 03:24:16 GMT, Dong Bo  wrote:

>>> I think I can rewrite this part as loops.
>>> With an intial implemention, we can have almost half of the code size 
>>> reduced (1312B -> 748B). Sounds OK to you?
>> 
>> Sounds great, but I'm still somewhat concerned that the non-SIMD case only 
>> offers 3-12% performance gain. Make it just 748 bytes, and therefore not 
>> icache-hostile, then perhaps the balance of risk and reward is justified.
>
>> > With an intial implemention, we can have almost half of the code size 
>> > reduced (1312B -> 748B). Sounds OK to you?
>> 
>> Sounds great, but I'm still somewhat concerned that the non-SIMD case only 
>> offers 3-12% performance gain. Make it just 748 bytes, and therefore not 
>> icache-hostile, then perhaps the balance of risk and reward is justified.
> 
> Hi, @theRealAph @nick-arm 
> 
> The code is updated. The error handling in SIMD case was rewriten as loops.
> 
> Also combined the two non-SIMD code blocks into one.
> Due to we have only one non-SIMD loop now, it is moved into 
> `generate_base64_decodeBlock`.
> The size of the stub is 692 bytes, the non-SIMD loop takes about 92 bytes if 
> my calculation is right.
> 
> Verified with tests `test/jdk/java/util/Base64/` and 
> `compiler/intrinsics/base64/TestBase64.java`.
> Compared with previous implementation, the performance changes are negligible.
> 
> Other comments are addressed too. Thanks.

PING... Any suggestions on the updated commit?

-

PR: https://git.openjdk.java.net/jdk/pull/3228

37 matches

Mail list logo