Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0

2022-01-05 Thread Iris Clark
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato  wrote:

> Please review the changes for upgrading the Unicode support in the JDK, from 
> version 13 to version 14. Corresponding CSR has also been drafted.

Marked as reviewed by iris (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/6974


Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0

2022-01-05 Thread Joe Wang
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato  wrote:

> Please review the changes for upgrading the Unicode support in the JDK, from 
> version 13 to version 14. Corresponding CSR has also been drafted.

I like how they changed dizzy face to face with crossed-out eyes. Pistol to 
water pistol, that's even better, just to avoid any confusion  ;-)

-

PR: https://git.openjdk.java.net/jdk/pull/6974


Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0

2022-01-05 Thread Joe Wang
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato  wrote:

> Please review the changes for upgrading the Unicode support in the JDK, from 
> version 13 to version 14. Corresponding CSR has also been drafted.

Marked as reviewed by joehw (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/6974


Integrated: Merge jdk18

2022-01-05 Thread Jesper Wilhelmsson
On Thu, 6 Jan 2022 00:42:14 GMT, Jesper Wilhelmsson  
wrote:

> Forwardport JDK 18 -> JDK 19

This pull request has now been integrated.

Changeset: 844dfb3a
Author:Jesper Wilhelmsson 
URL:   
https://git.openjdk.java.net/jdk/commit/844dfb3ab6a1d8b68ccdcc73726ee0f73cfcb3c8
Stats: 750 lines in 28 files changed: 687 ins; 8 del; 55 mod

Merge

-

PR: https://git.openjdk.java.net/jdk/pull/6975


RFR: Merge jdk18

2022-01-05 Thread Jesper Wilhelmsson
Forwardport JDK 18 -> JDK 19

-

Commit messages:
 - Merge remote-tracking branch 'jdk18/master' into Merge_jdk18
 - 8279529: ProblemList 
java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
 - 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes 
with VoiceOver on macOS
 - 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on 
macosx-aarch64
 - 8278897: Alignment of heap segments is not enforced correctly
 - 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
 - 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in 
assembler

The webrevs contain the adjustments done while merging with regards to each 
parent branch:
 - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6975&range=00.0
 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6975&range=00.1

Changes: https://git.openjdk.java.net/jdk/pull/6975/files
  Stats: 750 lines in 28 files changed: 687 ins; 8 del; 55 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6975.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6975/head:pull/6975

PR: https://git.openjdk.java.net/jdk/pull/6975


RFR: 8268081: Support for Unicode 14

2022-01-05 Thread Naoto Sato
Please review the changes for upgrading the Unicode support in the JDK, from 
version 13 to version 14. Corresponding CSR has also been drafted.
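To check which Unicode data a given JDK build carries, a small probe is enough. It assumes that U+1FAE0 (MELTING FACE) is one of the code points new in Unicode 14.0, so it should only report as defined on a JDK that ships the 14.0 data. The class and constant names are made up for illustration.

```java
// Hypothetical probe: reports whether the running JDK's Character data knows
// about U+1FAE0 (MELTING FACE), a code point assumed here to be new in Unicode 14.0.
public class UnicodeProbe {
    static final int MELTING_FACE = 0x1FAE0;

    static boolean isDefinedHere() {
        // Character.isDefined is false for unassigned (general category Cn) code points.
        return Character.isDefined(MELTING_FACE);
    }

    static String nameHere() {
        // Character.getName returns null when the code point is unassigned.
        return Character.getName(MELTING_FACE);
    }

    public static void main(String[] args) {
        System.out.println("U+1FAE0 defined: " + isDefinedHere());
        System.out.println("U+1FAE0 name:    " + nameHere());
    }
}
```

On a JDK with Unicode 13 data the probe should print `false` and `null`. With Unicode 14 data it should print `true` and the character's name.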

-

Commit messages:
 - Amend unicode.md and icu.md files
 - Minor fixup
 - Merge branch 'master' into unicode
 - Copyright year to 2022
 - ICU4J 70.1
 - 18 -> 19
 - Merge branch 'master' into unicode
 - Unicode 14.0.0 (final)
 - UCD ver. 14.0 (beta) / Unicode Text Segmentation rev. 38 (draft)
 - ICU4J 69.1

Changes: https://git.openjdk.java.net/jdk/pull/6974/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6974&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8268081
  Stats: 3443 lines in 41 files changed: 2353 ins; 101 del; 989 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6974.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6974/head:pull/6974

PR: https://git.openjdk.java.net/jdk/pull/6974


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]

2022-01-05 Thread Maurizio Cimadamore
On Wed, 5 Jan 2022 18:24:39 GMT, Paul Sandoz  wrote:

>> I'll change use of `owner`. It's not really possible to write 
>> checkValidStateSlow in terms of checkValidState, because the latter does a 
>> plain read of the state, whereas the former does a volatile read. Reusing 
>> one from the other would result in two reads (a plain and a volatile).
>
> Ok. My thought was that since this is slow two reads do not matter, but I did 
> not reason fully about the concurrent implications (if the fast alive check 
> returns false, the slow alive check can still return true so that seems good, 
> if the fast check returns true I would presume the slow alive check would also 
> be true, given the way state changes monotonically?)

If we're OK with a redundant plain read, then I don't think there are issues. 
You just do two reads, and the latter (the volatile one) is the one that 
counts. I don't think we can rely much on dependencies between what the plain 
read and what the volatile read will see. The state is updated in both 
directions (for shared segments), e.g. we can go from ALIVE to CLOSING and then 
back to ALIVE. Or we could go from ALIVE to CLOSING to CLOSED.
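To make the non-monotonic transitions concrete, here is a minimal sketch of a shared-scope close protocol that either commits (ALIVE, CLOSING, CLOSED) or rolls back (ALIVE, CLOSING, ALIVE). The class name, the `tryClose` method and the `noConcurrentAccess` callback are hypothetical stand-ins, not the JDK implementation; only the state names mirror the message above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the JDK source: a scope state that can go
// ALIVE -> CLOSING -> CLOSED, or ALIVE -> CLOSING -> ALIVE when closing is rolled back.
public class SharedStateSketch {
    static final int ALIVE = 0;
    static final int CLOSING = -1;
    static final int CLOSED = -2;

    private int state = ALIVE;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(SharedStateSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // noConcurrentAccess stands in for whatever mechanism decides that no other
    // thread is in the middle of an access (an assumption of this sketch).
    boolean tryClose(BooleanSupplier noConcurrentAccess) {
        if (!STATE.compareAndSet(this, ALIVE, CLOSING)) {
            return false; // someone else is closing, or it is already closed
        }
        boolean ok = noConcurrentAccess.getAsBoolean();
        STATE.setVolatile(this, ok ? CLOSED : ALIVE); // commit, or roll back to ALIVE
        return ok;
    }

    int currentState() {
        return (int) STATE.getVolatile(this);
    }

    public static void main(String[] args) {
        SharedStateSketch s = new SharedStateSketch();
        System.out.println(s.tryClose(() -> false) + " " + s.currentState()); // false 0
        System.out.println(s.tryClose(() -> true) + " " + s.currentState());  // true -2
    }
}
```

Because a reader may observe CLOSING on one read and ALIVE on the next, a plain read followed by a volatile read gives no guarantee beyond what the volatile read alone provides.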

That said, I guess my main reservation about writing one routine on top of the 
other is that we really want checkValidState to be used only in critical hot 
paths. It has non-volatile semantics and exception handling that only really 
make sense when combined with ScopedMemoryAccess; for this reason, using it as 
an internal building primitive didn't seem like a great idea to me.
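The split being defended here can be sketched as two checks over the same field: a hot-path one that does a plain read and throws a cheap internal error, and a general one that does a volatile read and throws an ordinary exception. The composed variant shows what the review suggestion would cost: two reads plus error translation. Everything is hypothetical; `ScopeAccessError` reuses the name from the patch description, but this class is a stand-in, not the JDK type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch, not the JDK source: plain-read hot-path check vs.
// volatile-read general check over the same state field.
public class LivenessCheckSketch {
    static final int ALIVE = 0;
    static final int CLOSED = -2;

    // Stand-in for an internal error type meant to be caught and translated by a
    // specific caller; suppression and stack trace are disabled to keep it cheap.
    static final class ScopeAccessError extends Error {
        ScopeAccessError(String msg) {
            super(msg, null, false, false);
        }
    }

    private int state = ALIVE;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(LivenessCheckSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hot path: plain read of the field, no ordering guarantees.
    final void checkValidState() {
        if (state != ALIVE) {
            throw new ScopeAccessError("Already closed");
        }
    }

    // General path: volatile read, ordinary exception for callers.
    final void checkValidStateSlow() {
        if ((int) STATE.getVolatile(this) != ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    // Composed variant: still correct, but state is read twice (plain, then
    // volatile) and the internal error has to be translated.
    final void checkValidStateSlowComposed() {
        try {
            checkValidState();
        } catch (ScopeAccessError e) {
            throw new IllegalStateException(e.getMessage());
        }
        checkValidStateSlow();
    }

    void close() {
        STATE.setVolatile(this, CLOSED);
    }
}
```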

-

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]

2022-01-05 Thread Paul Sandoz
On Wed, 5 Jan 2022 18:08:01 GMT, Maurizio Cimadamore  
wrote:

>> This patch fixes a performance issue when dereferencing memory segments
>> backed by different kinds of scopes. When looking at inline traces, I found
>> that one of our benchmarks, namely `LoopOverPollutedSegments`, was already
>> hitting the ceiling of the bimorphic inline cache, specifically when
>> checking liveness of the segment scope in the memory access hot path
>> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
>> backed by confined and global scopes. I then added segments backed by a
>> shared scope (in the initialization "polluting" loop), and the benchmark
>> numbers started to look as follows:
>>
>> Benchmark                                              Mode  Cnt  Score   Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op
>>
>> That is, since we now have *three* different kinds of scopes (confined,
>> shared and global), the call to the liveness check can no longer be
>> inlined. One solution could be, as we do for the *base* accessor, to add a
>> scope accessor to all memory segment implementation classes. But doing so
>> only works for heap segments (for which the scope accessor just returns the
>> global scope constants). For native segments, we're still megamorphic (as a
>> native segment can be backed by all kinds of scopes).
>>
>> In the end, it turned out to be much simpler to just make the liveness
>> check monomorphic, since there's so much sharing between the code paths
>> already. With that change, the numbers for the tweaked benchmark go back to
>> normal:
>>
>> Benchmark                                              Mode  Cnt  Score   Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>>
>> Note that this patch also tidies up the usage of `checkValidState` vs.
>> `checkValidStateSlow` a bit. The former should only really be used in the
>> hot path, while the latter is a more general routine which should be used
>> in non-performance-critical code. Making `checkValidState` monomorphic
>> caused the `ScopeAccessError` to be generated in more places, so I needed
>> to either update the usage to use the safer `checkValidStateSlow` (where
>> possible) or (in `Buffer` and `ConfinedScope`) just add extra wrapping.
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Use owner field instead of accessor in checkValidStateSlow

Marked as reviewed by psandoz (Reviewer).
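The idea of making the liveness check monomorphic can be sketched as follows. The check is a single `final` method on a common base class, so a call site that sees confined, shared and global scopes still has exactly one target to inline, instead of overflowing a bimorphic inline cache. Class names are hypothetical and the sketch omits everything else a real scope does; the owner test mirrors the `owner` check in the `ResourceScopeImpl.checkValidState` snippet quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the JDK source: one final liveness check shared by
// every scope kind, so the call site in the loop has a single target.
public class MonomorphicCheckSketch {
    static final int ALIVE = 0;
    static final int CLOSED = -2;

    abstract static class Scope {
        final Thread owner; // null when any thread may access the scope
        int state = ALIVE;  // plain field, read on the hot path

        Scope(Thread owner) {
            this.owner = owner;
        }

        // final: statically bound, so inlining does not depend on how many
        // Scope subclasses reach this call site.
        final void checkValidState() {
            if (owner != null && owner != Thread.currentThread()) {
                throw new IllegalStateException("Attempted access outside owning thread");
            }
            if (state != ALIVE) {
                throw new IllegalStateException("Already closed");
            }
        }

        void close() {
            state = CLOSED;
        }
    }

    static final class ConfinedScope extends Scope {
        ConfinedScope() { super(Thread.currentThread()); }
    }

    static final class SharedScope extends Scope {
        SharedScope() { super(null); }
    }

    static final class GlobalScope extends Scope {
        GlobalScope() { super(null); }
    }

    // "Polluted" loop: three receiver classes, one check implementation.
    static int countValid(List<? extends Scope> scopes) {
        int valid = 0;
        for (Scope s : scopes) {
            try {
                s.checkValidState();
                valid++;
            } catch (IllegalStateException e) {
                // closed, or accessed from the wrong thread
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<Scope> scopes = new ArrayList<>();
        scopes.add(new ConfinedScope());
        scopes.add(new SharedScope());
        scopes.add(new GlobalScope());
        SharedScope closed = new SharedScope();
        closed.close();
        scopes.add(closed);
        System.out.println(countValid(scopes)); // 3
    }
}
```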

-

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]

2022-01-05 Thread Paul Sandoz
On Wed, 5 Jan 2022 17:57:44 GMT, Maurizio Cimadamore  
wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java
>>  line 190:
>> 
>>> 188: @ForceInline
>>> 189: public final void checkValidState() {
>>> 190: if (owner != null && owner != Thread.currentThread()) {
>> 
>> For consistency we could change `checkValidStateSlow` to refer directly 
>> to `owner`.
>> 
>> It would be satisfying, but I don't know if it's possible, to compose 
>> `checkValidStateSlow` from `checkValidState`, e.g.
>> 
>> public final void checkValidStateSlow() {
>>     checkValidState();
>>     if (!isAlive()) { ... }
>> }
>
> I'll change use of `owner`. It's not really possible to write 
> checkValidStateSlow in terms of checkValidState, because the latter does a 
> plain read of the state, whereas the former does a volatile read. Reusing one 
> from the other would result in two reads (a plain and a volatile).

Ok. My thought was that since this is slow two reads do not matter, but I did 
not reason fully about the concurrent implications (if the fast alive check 
returns false, the slow alive check can still return true so that seems good, 
if the fast check returns true I would presume the slow alive check would also be 
true, given the way state changes monotonically?)

-

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]

2022-01-05 Thread Vladimir Kozlov
On Wed, 5 Jan 2022 08:59:00 GMT, Jatin Bhateja  wrote:

>> This patch extends the existing macro logic inference algorithm to handle
>> masked logic operations.
>> 
>> Existing algorithm:
>> 
>> 1. Identify logic cone roots.
>> 2. Pack parent and child logic nodes into a MacroLogic node during a
>>    bottom-up traversal, provided the input constraint is met, i.e. the
>>    maximum number of inputs a macro logic node can have.
>> 3. Perform symbolic evaluation of the logic expression tree by assigning
>>    each input a value corresponding to a truth table column.
>> 4. The inputs, together with the encoded function, represent a macro logic
>>    node that mimics a truth table.
>> 
>> Modification:
>> Extended the packing algorithm to operate on both predicated and
>> non-predicated logic nodes. The following rules define the criteria under
>> which nodes get packed into a macro logic node:
>> 
>> 1. Parent and both child nodes are all unmasked, or all masked with the
>>    same predicate.
>> 2. A masked parent can be packed with its left child if it is predicated
>>    and both have the same predicate.
>> 3. A masked parent can be packed with its right child if it is
>>    un-predicated or has a matching predication condition.
>> 4. An unmasked parent can be packed with an unmasked child.
>> 
>> The new jtreg test case added with the patch exhaustively covers all the
>> different combinations of predication of parent and child nodes.
>> 
>> Following are the performance numbers for the JMH benchmark included with
>> the patch.
>> 
>> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S
>> Icelake Server)
>> 
>> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
>> -- | -- | -- | -- | --
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
>> o.o.b.jdk.incubator.vector.MaskedLogicOpt
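The symbolic-evaluation step (3) in the quoted description can be sketched like this: each input is replaced by the 8-bit pattern of its truth-table column, the expression is evaluated once with ordinary bitwise operators, and the low 8 bits of the result are the function's truth table. The sketch assumes the conventional column patterns 0xF0/0xCC/0xAA used by the x86 `VPTERNLOG` immediate. The interface and method names are hypothetical, and this is not C2's implementation.

```java
// Hypothetical sketch of truth-table encoding for a three-input logic function.
public class TruthTableSketch {
    @FunctionalInterface
    interface TernaryLogic {
        int apply(int a, int b, int c);
    }

    // Conventional column patterns for three inputs (as in the x86 VPTERNLOG imm8).
    static final int A = 0xF0; // 1111 0000
    static final int B = 0xCC; // 1100 1100
    static final int C = 0xAA; // 1010 1010

    // Evaluate the whole expression once on the column patterns.
    static int truthTable(TernaryLogic f) {
        return f.apply(A, B, C) & 0xFF;
    }

    // Look up one row: with the columns above, row index = a*4 + b*2 + c.
    static int evalRow(int table, int a, int b, int c) {
        int row = (a << 2) | (b << 1) | c;
        return (table >>> row) & 1;
    }

    public static void main(String[] args) {
        int table = truthTable((a, b, c) -> (a & b) | c);
        System.out.printf("0x%02X%n", table);        // 0xEA
        System.out.println(evalRow(table, 1, 1, 0)); // 1
        System.out.println(evalRow(table, 1, 0, 0)); // 0
    }
}
```

Because the result is masked after evaluation, operators such as `~` can be used in the expression as well; packing several logic nodes into one macro logic node simply means evaluating the combined expression once.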

Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]

2022-01-05 Thread Maurizio Cimadamore
> This patch fixes a performance issue when dereferencing memory segments
> backed by different kinds of scopes. When looking at inline traces, I found
> that one of our benchmarks, namely `LoopOverPollutedSegments`, was already
> hitting the ceiling of the bimorphic inline cache, specifically when
> checking liveness of the segment scope in the memory access hot path
> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
> backed by confined and global scopes. I then added segments backed by a
> shared scope (in the initialization "polluting" loop), and the benchmark
> numbers started to look as follows:
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op
>
> That is, since we now have *three* different kinds of scopes (confined,
> shared and global), the call to the liveness check can no longer be
> inlined. One solution could be, as we do for the *base* accessor, to add a
> scope accessor to all memory segment implementation classes. But doing so
> only works for heap segments (for which the scope accessor just returns the
> global scope constants). For native segments, we're still megamorphic (as a
> native segment can be backed by all kinds of scopes).
>
> In the end, it turned out to be much simpler to just make the liveness
> check monomorphic, since there's so much sharing between the code paths
> already. With that change, the numbers for the tweaked benchmark go back to
> normal:
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>
> Note that this patch also tidies up the usage of `checkValidState` vs.
> `checkValidStateSlow` a bit. The former should only really be used in the
> hot path, while the latter is a more general routine which should be used
> in non-performance-critical code. Making `checkValidState` monomorphic
> caused the `ScopeAccessError` to be generated in more places, so I needed
> to either update the usage to use the safer `checkValidStateSlow` (where
> possible) or (in `Buffer` and `ConfinedScope`) just add extra wrapping.

Maurizio Cimadamore has updated the pull request incrementally with one 
additional commit since the last revision:

  Use owner field instead of accessor in checkValidStateSlow

-

Changes:
  - all: https://git.openjdk.java.net/jdk18/pull/82/files
  - new: https://git.openjdk.java.net/jdk18/pull/82/files/c6082953..04a1e9f2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/82.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution

2022-01-05 Thread Maurizio Cimadamore
On Wed, 5 Jan 2022 17:23:40 GMT, Paul Sandoz  wrote:

>> This patch fixes a performance issue when dereferencing memory segments
>> backed by different kinds of scopes. When looking at inline traces, I found
>> that one of our benchmarks, namely `LoopOverPollutedSegments`, was already
>> hitting the ceiling of the bimorphic inline cache, specifically when
>> checking liveness of the segment scope in the memory access hot path
>> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
>> backed by confined and global scopes. I then added segments backed by a
>> shared scope (in the initialization "polluting" loop), and the benchmark
>> numbers started to look as follows:
>>
>> Benchmark                                              Mode  Cnt  Score   Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op
>>
>> That is, since we now have *three* different kinds of scopes (confined,
>> shared and global), the call to the liveness check can no longer be
>> inlined. One solution could be, as we do for the *base* accessor, to add a
>> scope accessor to all memory segment implementation classes. But doing so
>> only works for heap segments (for which the scope accessor just returns the
>> global scope constants). For native segments, we're still megamorphic (as a
>> native segment can be backed by all kinds of scopes).
>>
>> In the end, it turned out to be much simpler to just make the liveness
>> check monomorphic, since there's so much sharing between the code paths
>> already. With that change, the numbers for the tweaked benchmark go back to
>> normal:
>>
>> Benchmark                                              Mode  Cnt  Score   Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>>
>> Note that this patch also tidies up the usage of `checkValidState` vs.
>> `checkValidStateSlow` a bit. The former should only really be used in the
>> hot path, while the latter is a more general routine which should be used
>> in non-performance-critical code. Making `checkValidState` monomorphic
>> caused the `ScopeAccessError` to be generated in more places, so I needed
>> to either update the usage to use the safer `checkValidStateSlow` (where
>> possible) or (in `Buffer` and `ConfinedScope`) just add extra wrapping.
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java
>  line 190:
> 
>> 188: @ForceInline
>> 189: public final void checkValidState() {
>> 190: if (owner != null && owner != Thread.currentThread()) {
> 
> For consistency we could change `checkValidStateSlow` to refer directly 
> to `owner`.
> 
> It would be satisfying, but I don't know if it's possible, to compose 
> `checkValidStateSlow` from `checkValidState`, e.g.
> 
> public final void checkValidStateSlow() {
>     checkValidState();
>     if (!isAlive()) { ... }
> }

I'll change use of `owner`. It's not really possible to write 
checkValidStateSlow in terms of checkValidState, because the latter does a 
plain read of the state, whereas the former does a volatile read. Reusing one 
from the other would result in two reads (a plain and a volatile).

-

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: [jdk18] RFR: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64

2022-01-05 Thread Jamil Nimeh
On Wed, 5 Jan 2022 17:22:54 GMT, Daniel D. Daugherty  wrote:

> A couple of trivial ProblemListings:
> 
> JDK-8279529 ProblemList 
> java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
> JDK-8279532 ProblemList 
> sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java

Looks good to me.

-

Marked as reviewed by jnimeh (Reviewer).

PR: https://git.openjdk.java.net/jdk18/pull/83


[jdk18] RFR: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64

2022-01-05 Thread Daniel D . Daugherty
A couple of trivial ProblemListings:

JDK-8279529 ProblemList 
java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
JDK-8279532 ProblemList 
sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java

-

Commit messages:
 - 8279532: ProblemList 
sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java
 - 8279529: ProblemList 
java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64

Changes: https://git.openjdk.java.net/jdk18/pull/83/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=83&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8279529
  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/83.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/83/head:pull/83

PR: https://git.openjdk.java.net/jdk18/pull/83


Idea: Collection Literals

2022-01-05 Thread Izz Rainy
Hello! Hopefully this is the right place for this suggestion - I originally
posted this on the discuss list, but was told this list would be more
appropriate.

Recently, I've read some of the discussion linked in JEP 269 (Convenience
Factory Methods for Collections), including why Collection Literals were
shelved when brought up in Project Coin and Lambda. I believe that things
may be different enough now (with dynamic constants possible, local
variable inference and lambdas present, value types closer to completion,
and Project Amber bringing "smaller" features under consideration) to bring
them up yet again.

I've tried writing an informal proposal for them that tries to answer the
main questions brought up in those original discussions:
https://gist.github.com/l-Luna/08b6574d0c840de93634cf8d1e43c494

I understand that it's still unlikely to be added in the near future, but I
thought it wouldn't hurt to bring it up again.
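For context, the convenience factory methods that JEP 269 did add (JDK 9+) look like this; a literal syntax would be measured against them. The snippet only uses existing `java.util` API and makes no claim about the syntax in the linked proposal; the class and method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Baseline that exists today (JEP 269, JDK 9+): factory methods returning
// unmodifiable collections. Not the proposal's syntax.
public class CollectionFactoriesBaseline {
    static List<Integer> numbers() {
        return List.of(1, 2, 3);
    }

    static Map<String, Integer> ages() {
        // Map.of has fixed-arity overloads for up to 10 pairs; Map.ofEntries has no such limit.
        return Map.ofEntries(Map.entry("alice", 30), Map.entry("bob", 25));
    }

    public static void main(String[] args) {
        var set = Set.of("x", "y"); // combines with local variable type inference
        System.out.println(numbers().size() + " " + ages().get("bob") + " " + set.contains("x"));
        try {
            numbers().add(4);
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable");
        }
        try {
            Map.of("k", 1, "k", 2);
        } catch (IllegalArgumentException e) {
            System.out.println("duplicate keys rejected");
        }
    }
}
```

Running `main` should print `3 25 true`, then `unmodifiable`, then `duplicate keys rejected`.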


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution

2022-01-05 Thread Paul Sandoz
On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore  
wrote:

> This patch fixes a performance issue when dereferencing memory segments
> backed by different kinds of scopes. When looking at inline traces, I found
> that one of our benchmarks, namely `LoopOverPollutedSegments`, was already
> hitting the ceiling of the bimorphic inline cache, specifically when
> checking liveness of the segment scope in the memory access hot path
> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
> backed by confined and global scopes. I then added segments backed by a
> shared scope (in the initialization "polluting" loop), and the benchmark
> numbers started to look as follows:
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op
>
> That is, since we now have *three* different kinds of scopes (confined,
> shared and global), the call to the liveness check can no longer be
> inlined. One solution could be, as we do for the *base* accessor, to add a
> scope accessor to all memory segment implementation classes. But doing so
> only works for heap segments (for which the scope accessor just returns the
> global scope constants). For native segments, we're still megamorphic (as a
> native segment can be backed by all kinds of scopes).
>
> In the end, it turned out to be much simpler to just make the liveness
> check monomorphic, since there's so much sharing between the code paths
> already. With that change, the numbers for the tweaked benchmark go back to
> normal:
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>
> Note that this patch also tidies up the usage of `checkValidState` vs.
> `checkValidStateSlow` a bit. The former should only really be used in the
> hot path, while the latter is a more general routine which should be used
> in non-performance-critical code. Making `checkValidState` monomorphic
> caused the `ScopeAccessError` to be generated in more places, so I needed
> to either update the usage to use the safer `checkValidStateSlow` (where
> possible) or (in `Buffer` and `ConfinedScope`) just add extra wrapping.

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java
 line 190:

> 188: @ForceInline
> 189: public final void checkValidState() {
> 190: if (owner != null && owner != Thread.currentThread()) {

For consistency we could change `checkValidStateSlow` to refer directly to 
`owner`.

It would be satisfying, but I don't know if it's possible, to compose 
`checkValidStateSlow` from `checkValidState`, e.g.

public final void checkValidStateSlow() {
    checkValidState();
    if (!isAlive()) { ... }
}

-

PR: https://git.openjdk.java.net/jdk18/pull/82


Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution

2022-01-05 Thread Maurizio Cimadamore
On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore  
wrote:

> This patch fixes a performance issue when dereferencing memory segments
> backed by different kinds of scopes. When looking at inlining traces, I found
> that one of our benchmarks, namely `LoopOverPollutedSegments`, was already
> hitting the ceiling of the bimorphic inline cache, specifically when checking
> the liveness of the segment scope in the memory access hot path
> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
> backed by confined and global scopes. I then added segments backed by a
> shared scope (in the initialization "polluting" loop), and the benchmark
> numbers started to look as follows:
> 
> 
> Benchmark                                             Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH       avgt   30  7.004 ± 0.089  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance avgt   30  7.159 ± 0.016  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH         avgt   30  7.017 ± 0.110  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance   avgt   30  7.175 ± 0.048  ms/op
> LoopOverPollutedSegments.heap_unsafe                  avgt   30  0.243 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_VH            avgt   30  7.366 ± 0.036  ms/op
> LoopOverPollutedSegments.native_segment_instance      avgt   30  7.305 ± 0.098  ms/op
> LoopOverPollutedSegments.native_unsafe                avgt   30  0.238 ± 0.002  ms/op
> 
> 
> That is, since we now have *three* different kinds of scopes (confined,
> shared and global), the call to the liveness check can no longer be inlined.
> One solution could be, as we do for the *base* accessor, to add a scope
> accessor to all memory segment implementation classes. But doing so only
> works well for heap segments (for which the scope accessor just returns the
> global scope constant). For native segments, we're still megamorphic (as a
> native segment can be backed by all kinds of scopes).
> 
> In the end, it turned out to be much simpler to just make the liveness check
> monomorphic, since there's so much sharing between the code paths already.
> With that change, the numbers for the tweaked benchmark go back to normal:
> 
> 
> Benchmark                                             Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH       avgt   30  0.241 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance avgt   30  0.244 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH         avgt   30  0.242 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance   avgt   30  0.248 ± 0.001  ms/op
> LoopOverPollutedSegments.heap_unsafe                  avgt   30  0.247 ± 0.013  ms/op
> LoopOverPollutedSegments.native_segment_VH            avgt   30  0.245 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_instance      avgt   30  0.245 ± 0.001  ms/op
> LoopOverPollutedSegments.native_unsafe                avgt   30  0.247 ± 0.005  ms/op
> 
> 
> Note that this patch tidies up the usage of `checkValidState` vs.
> `checkValidStateSlow` a bit. The former should really only be used in the hot
> path, while the latter is a more general routine which should be used in
> non-performance-critical code. Making `checkValidState` monomorphic caused
> the `ScopeAccessError` to be generated in more places, so I needed to either
> update the call sites to use the safer `checkValidStateSlow` (where possible)
> or (in `Buffer` and `ConfinedScope`) just add extra wrapping.

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 367:

> 365: 
> 366: void checkValidState() {
> 367: scope.checkValidStateSlow();

Not to be confused with `ResourceScope::checkValidState` - this method is only
really used by other non-performance-critical code around
`AbstractMemorySegmentImpl`.

-

PR: https://git.openjdk.java.net/jdk18/pull/82


[jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution

2022-01-05 Thread Maurizio Cimadamore
This patch fixes a performance issue when dereferencing memory segments backed
by different kinds of scopes. When looking at inlining traces, I found that one
of our benchmarks, namely `LoopOverPollutedSegments`, was already hitting the
ceiling of the bimorphic inline cache, specifically when checking the liveness
of the segment scope in the memory access hot path
(`ResourceScopeImpl::checkValidState`). The benchmark only used segments backed
by confined and global scopes. I then added segments backed by a shared scope
(in the initialization "polluting" loop), and the benchmark numbers started to
look as follows:


Benchmark                                             Mode  Cnt  Score   Error  Units
LoopOverPollutedSegments.heap_segment_floats_VH       avgt   30  7.004 ± 0.089  ms/op
LoopOverPollutedSegments.heap_segment_floats_instance avgt   30  7.159 ± 0.016  ms/op
LoopOverPollutedSegments.heap_segment_ints_VH         avgt   30  7.017 ± 0.110  ms/op
LoopOverPollutedSegments.heap_segment_ints_instance   avgt   30  7.175 ± 0.048  ms/op
LoopOverPollutedSegments.heap_unsafe                  avgt   30  0.243 ± 0.004  ms/op
LoopOverPollutedSegments.native_segment_VH            avgt   30  7.366 ± 0.036  ms/op
LoopOverPollutedSegments.native_segment_instance      avgt   30  7.305 ± 0.098  ms/op
LoopOverPollutedSegments.native_unsafe                avgt   30  0.238 ± 0.002  ms/op


That is, since we now have *three* different kinds of scopes (confined, shared
and global), the call to the liveness check can no longer be inlined. One
solution could be, as we do for the *base* accessor, to add a scope accessor to
all memory segment implementation classes. But doing so only works well for heap
segments (for which the scope accessor just returns the global scope constant).
For native segments, we're still megamorphic (as a native segment can be backed
by all kinds of scopes).

In the end, it turned out to be much simpler to just make the liveness check
monomorphic, since there's so much sharing between the code paths already. With
that change, the numbers for the tweaked benchmark go back to normal:


Benchmark                                             Mode  Cnt  Score   Error  Units
LoopOverPollutedSegments.heap_segment_floats_VH       avgt   30  0.241 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_floats_instance avgt   30  0.244 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_ints_VH         avgt   30  0.242 ± 0.003  ms/op
LoopOverPollutedSegments.heap_segment_ints_instance   avgt   30  0.248 ± 0.001  ms/op
LoopOverPollutedSegments.heap_unsafe                  avgt   30  0.247 ± 0.013  ms/op
LoopOverPollutedSegments.native_segment_VH            avgt   30  0.245 ± 0.004  ms/op
LoopOverPollutedSegments.native_segment_instance      avgt   30  0.245 ± 0.001  ms/op
LoopOverPollutedSegments.native_unsafe                avgt   30  0.247 ± 0.005  ms/op
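
To make the monomorphic approach concrete, here is a minimal sketch (not the JDK code) in which one final class models all three kinds of scope: a non-null `owner` marks a confined scope, a null owner a shared or global one, and a `closeable` flag separates shared from global. The class name `MonomorphicScope`, its factory methods and the plain `volatile` liveness flag are assumptions made for illustration; the real shared-scope close protocol is more involved.

```java
/**
 * Hypothetical sketch: a single final class models confined, shared and global
 * scopes, so the liveness check always has exactly one receiver type.
 */
final class MonomorphicScope {
    private final Thread owner;            // non-null only for confined scopes
    private final boolean closeable;       // false only for the global scope
    private volatile boolean alive = true; // simplification of the real close protocol

    private MonomorphicScope(Thread owner, boolean closeable) {
        this.owner = owner;
        this.closeable = closeable;
    }

    static MonomorphicScope confined() { return new MonomorphicScope(Thread.currentThread(), true); }
    static MonomorphicScope shared()   { return new MonomorphicScope(null, true); }
    static MonomorphicScope global()   { return new MonomorphicScope(null, false); }

    /** Hot-path check: the same code runs for every kind of scope. */
    void checkValidState() {
        if (owner != null && owner != Thread.currentThread()) {
            throw new IllegalStateException("Attempted access outside owning thread");
        }
        if (!alive) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        if (!closeable) {
            throw new UnsupportedOperationException("The global scope cannot be closed");
        }
        checkValidState();
        alive = false;
    }

    public static void main(String[] args) {
        // One call site, three kinds of scope, but only one receiver class.
        for (MonomorphicScope s : new MonomorphicScope[] { confined(), shared(), global() }) {
            s.checkValidState();
        }
        MonomorphicScope scope = shared();
        scope.close();
        try {
            scope.checkValidState();
        } catch (IllegalStateException e) {
            System.out.println("closed scope rejected: " + e.getMessage());
        }
    }
}
```

Since `checkValidState` belongs to a final class, every call site sees a single receiver type whichever kind of scope is passed in; whether the JIT then inlines it still depends on its usual heuristics.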


Note that this patch tidies up the usage of `checkValidState` vs.
`checkValidStateSlow` a bit. The former should really only be used in the hot
path, while the latter is a more general routine which should be used in
non-performance-critical code. Making `checkValidState` monomorphic caused the
`ScopeAccessError` to be generated in more places, so I needed to either update
the call sites to use the safer `checkValidStateSlow` (where possible) or (in
`Buffer` and `ConfinedScope`) just add extra wrapping.
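
A minimal sketch of the hot-path/slow-path split described above, assuming that the hot-path check throws a preallocated, stackless error and that the slow path wraps it into an `IllegalStateException`. The nested `LightweightAccessError` is hypothetical and only stands in for the `ScopeAccessError` mentioned in the description; the class and method bodies are illustrative, not the patch's code.

```java
/**
 * Hypothetical sketch of the fast/slow split: the hot-path check throws a
 * preallocated, stackless error; the slow path turns it into a regular
 * IllegalStateException for non-performance-critical callers.
 */
final class ScopeChecks {
    /** Stand-in for the lightweight error thrown on the hot path (name is hypothetical). */
    static final class LightweightAccessError extends Error {
        static final LightweightAccessError INSTANCE = new LightweightAccessError();

        private LightweightAccessError() {
            super("scope not valid", null, false, false); // no suppression, no stack trace
        }
    }

    private volatile boolean alive = true;

    /** Hot path: cheap check, nothing is allocated on failure. */
    void checkValidState() {
        if (!alive) {
            throw LightweightAccessError.INSTANCE;
        }
    }

    /** Slow path: reuses the fast check, but reports a proper exception. */
    void checkValidStateSlow() {
        try {
            checkValidState();
        } catch (LightweightAccessError e) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        alive = false;
    }

    public static void main(String[] args) {
        ScopeChecks scope = new ScopeChecks();
        scope.checkValidStateSlow(); // open scope: no exception
        scope.close();
        try {
            scope.checkValidStateSlow();
        } catch (IllegalStateException e) {
            System.out.println("slow path: " + e.getMessage());
        }
    }
}
```

Callers outside the hot path only ever see `IllegalStateException`, which is why code that previously relied on the slow behaviour either switches to `checkValidStateSlow` or adds its own wrapping.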

-

Commit messages:
 - Initial push

Changes: https://git.openjdk.java.net/jdk18/pull/82/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8279527
  Stats: 108 lines in 8 files changed: 41 ins; 49 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/82.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82

PR: https://git.openjdk.java.net/jdk18/pull/82


Integrated: 8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms

2022-01-05 Thread Aleksey Shipilev
On Tue, 4 Jan 2022 16:46:09 GMT, Aleksey Shipilev  wrote:

> The real problem is Y2038 
> ([JDK-8279444](https://bugs.openjdk.java.net/browse/JDK-8279444)), which does 
> not look solvable at this time. So for test cleanliness, we might just 
> disable this test on 32-bit platforms.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug, affected test still passes
>  - [x]  Linux x86_32 fastdebug, affected test is now skipped
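
For context, Y2038 refers to the overflow of a signed 32-bit seconds-since-epoch counter (such as a 32-bit `time_t`). The following sketch only shows where that limit falls and what a wrapped counter decodes to; how the jar test runs into it is covered by JDK-8279444, not by this code. The class name `Y2038` is hypothetical.

```java
import java.time.Instant;

/** Shows where a signed 32-bit seconds counter (as in a 32-bit time_t) runs out. */
final class Y2038 {
    public static void main(String[] args) {
        long lastSecond = Integer.MAX_VALUE;     // 2^31 - 1 seconds after the epoch
        int wrapped = (int) (lastSecond + 1);    // one more second overflows to Integer.MIN_VALUE
        System.out.println(Instant.ofEpochSecond(lastSecond)); // 2038-01-19T03:14:07Z
        System.out.println(Instant.ofEpochSecond(wrapped));    // 1901-12-13T20:45:52Z
    }
}
```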

This pull request has now been integrated.

Changeset: a741b927
Author: Aleksey Shipilev
URL:    https://git.openjdk.java.net/jdk/commit/a741b927a3cdc8e339ae557c77886ea850aa06b6
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms

Reviewed-by: alanb, bpb

-

PR: https://git.openjdk.java.net/jdk/pull/6957


Re: RFR: 8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms

2022-01-05 Thread Aleksey Shipilev
On Tue, 4 Jan 2022 16:46:09 GMT, Aleksey Shipilev  wrote:

> The real problem is Y2038 
> ([JDK-8279444](https://bugs.openjdk.java.net/browse/JDK-8279444)), which does 
> not look solvable at this time. So for test cleanliness, we might just 
> disable this test on 32-bit platforms.
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug, affected test still passes
>  - [x]  Linux x86_32 fastdebug, affected test is now skipped

Thanks for reviews!

-

PR: https://git.openjdk.java.net/jdk/pull/6957


[jdk18] Integrated: 8278897: Alignment of heap segments is not enforced correctly

2022-01-05 Thread Maurizio Cimadamore
On Thu, 16 Dec 2021 12:31:01 GMT, Maurizio Cimadamore wrote:

> This PR fixes an issue with alignment constraints not being enforced
> correctly in dereference/copy operations on on-heap segments. The alignment
> of on-heap segments cannot be computed exactly, as the alignment of array
> elements is, ultimately, a VM implementation detail. Because of this,
> alignment checks on heap segments can pass or fail depending on the platform
> being used.
> 
> For more details about the problem and the solution please refer to:
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-November/015852.html
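
A small illustration of why heap-segment alignment is platform dependent: the address of an array element is the array's start address plus a VM-chosen header (base) offset plus the element offset, so the same element offset can be aligned under one VM layout and misaligned under another. The base offsets 16 and 12 below are hypothetical values, not claims about any particular VM, and the sketch assumes the array object itself starts at an 8-byte-aligned address.

```java
/**
 * Hypothetical illustration: whether an array element sits at an aligned
 * address depends on the array's base offset, which the VM chooses.
 */
final class HeapAlignment {
    /** True if baseOffset + offset is a multiple of alignment (array start assumed 8-byte aligned). */
    static boolean isAligned(long baseOffset, long offset, long alignment) {
        return ((baseOffset + offset) % alignment) == 0;
    }

    public static void main(String[] args) {
        long offset = 8;    // byte offset of the access within the array data
        long alignment = 8; // e.g., the alignment required for a long access
        // Two hypothetical base offsets a VM might use for the same array type.
        for (long base : new long[] {16, 12}) {
            System.out.println("base " + base + ": aligned = " + isAligned(base, offset, alignment));
        }
    }
}
```

With base 16 the access lands on byte 24 (aligned); with base 12 it lands on byte 20 (misaligned), even though the Java-level offset is identical.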

This pull request has now been integrated.

Changeset: 9d43d25d
Author: Maurizio Cimadamore
URL:    https://git.openjdk.java.net/jdk18/commit/9d43d25da8bcfff425a795dcc230914a384a5c82
Stats: 600 lines in 20 files changed: 566 ins; 0 del; 34 mod

8278897: Alignment of heap segments is not enforced correctly

Reviewed-by: jvernee

-

PR: https://git.openjdk.java.net/jdk18/pull/37


Re: RFR: 8272746: ZipFile can't open big file (NegativeArraySizeException) [v2]

2022-01-05 Thread Masanori Yano
On Thu, 23 Dec 2021 16:42:50 GMT, Alan Bateman  wrote:

>> Masanori Yano has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   8272746: ZipFile can't open big file (NegativeArraySizeException)
>
> src/java.base/share/classes/java/util/zip/ZipFile.java line 1501:
> 
>> 1499: // read in the CEN and END
>> 1500: if (end.cenlen + ENDHDR >= Integer.MAX_VALUE) {
>> 1501: zerror("invalid END header (too large central directory size)");
> 
> This check looks correct. It might be a bit clearer to say that "central 
> directory size too large" rather than "too large central directory size".
> 
> The bug report says that JDK 8 and the native zip handle these zip files, 
> were you able to check that?

@AlanBateman Could you please review the above comments.
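
A sketch of the overflow this check guards against, assuming (as the `// read in the CEN and END` comment suggests) that the CEN and END records are read into a single `byte[]` of `end.cenlen + ENDHDR` bytes. `ENDHDR` is the 22-byte END header size; the class and method names, and the `IllegalArgumentException` (the real code calls `zerror`), are stand-ins for illustration.

```java
/**
 * Hypothetical sketch of why the CEN size must be checked before the
 * CEN + END buffer is allocated.
 */
final class CenSizeCheck {
    static final int ENDHDR = 22; // size of the ZIP END header in bytes

    /** Returns the buffer length for CEN + END, rejecting sizes that cannot fit in an array. */
    static int cenBufferLength(long cenlen) {
        if (cenlen + ENDHDR >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid END header (central directory size too large)");
        }
        return (int) (cenlen + ENDHDR);
    }

    public static void main(String[] args) {
        long cenlen = 0x8000_0000L;           // hypothetical 2 GiB central directory
        int unchecked = (int) (cenlen + ENDHDR); // narrowing cast wraps around
        System.out.println("unchecked length: " + unchecked); // -2147483626: new byte[...] would throw NegativeArraySizeException
        System.out.println("small CEN buffer: " + cenBufferLength(1_000)); // 1022
        try {
            cenBufferLength(cenlen);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```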

-

PR: https://git.openjdk.java.net/jdk/pull/6927


Re: RFR: 8276694: Pattern trailing unescaped backslash causes internal error

2022-01-05 Thread Masanori Yano
On Mon, 20 Dec 2021 09:57:14 GMT, Masanori Yano  wrote:

> Could you please review the fix for 8276694?
> 
> A message specific to this exception should be printed instead of an
> internal error. This fix adds a new check to output an appropriate exception
> message when the regular expression ends with an unescaped backslash. It also
> checks the position of the cursor to rule out other syntax errors in the
> middle of the regular expression.

Could someone please review this pull request?
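
As an illustration of the condition being detected: a regex ends with an unescaped backslash when its trailing run of backslashes has odd length. The helper below is a hypothetical sketch, not the patch's code, and it deliberately ignores `\Q...\E` quoting. `Pattern.compile` rejects such a pattern with a `PatternSyntaxException`; the description text it carries depends on the JDK version.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper: does a regex end with a backslash that escapes nothing? */
final class TrailingBackslash {
    static boolean endsWithUnescapedBackslash(String regex) {
        int count = 0;
        for (int i = regex.length() - 1; i >= 0 && regex.charAt(i) == '\\'; i--) {
            count++;
        }
        // An odd-length run means the last backslash has nothing left to escape.
        // Simplification: \Q...\E quoting is not taken into account.
        return count % 2 == 1;
    }

    public static void main(String[] args) {
        String[] regexes = { "abc\\", "abc\\\\", "a\\d" }; // abc\ , abc\\ , a\d
        for (String r : regexes) {
            System.out.println(r + " -> " + endsWithUnescapedBackslash(r));
        }
        try {
            Pattern.compile("abc\\");
        } catch (PatternSyntaxException e) {
            // The description text differs between JDK versions.
            System.out.println("rejected: " + e.getDescription());
        }
    }
}
```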

-

PR: https://git.openjdk.java.net/jdk/pull/6891


Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v3]

2022-01-05 Thread Jatin Bhateja
On Tue, 4 Jan 2022 15:11:47 GMT, Jatin Bhateja  wrote:

>> The patch extends the existing macro logic inferencing algorithm to handle
>> masked logic operations.
>> 
>> Existing algorithm:
>> 
>> 1. Identify logic cone roots.
>> 2. Pack parent and child logic nodes into a MacroLogic node during a
>> bottom-up traversal if the input constraint is met, i.e. the maximum number
>> of inputs that a macro logic node can have.
>> 3. Perform symbolic evaluation of the logic expression tree by assigning to
>> each input the value corresponding to a truth table column.
>> 4. The inputs, together with the encoded function, represent a macro logic
>> node that mimics a truth table.
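
To illustrate steps 3 and 4, here is a sketch of symbolic evaluation over three inputs: each input is assigned one column of an 8-row truth table, the expression is evaluated bitwise on those columns, and the resulting byte encodes the function. The column constants 0xF0/0xCC/0xAA follow the x86 `VPTERNLOG` immediate convention; that C2 uses exactly this ordering internally is an assumption, and the names below are hypothetical.

```java
/**
 * Sketch of symbolic evaluation: evaluating an expression on truth-table
 * columns yields the 8-bit encoding of the three-input function.
 */
final class MacroLogicTruthTable {
    // Columns for inputs a, b, c: in row i, a = bit 2, b = bit 1, c = bit 0 of i.
    static final int A = 0xF0, B = 0xCC, C = 0xAA;

    @FunctionalInterface
    interface LogicExpr {
        int eval(int a, int b, int c);
    }

    static int encode(LogicExpr expr) {
        return expr.eval(A, B, C) & 0xFF; // keep the 8 rows; drops high bits set by ~
    }

    public static void main(String[] args) {
        System.out.printf("(a & b) ^ c -> 0x%02X%n", encode((a, b, c) -> (a & b) ^ c)); // 0x6A
        System.out.printf("a | b | c   -> 0x%02X%n", encode((a, b, c) -> a | b | c));   // 0xFE
        System.out.printf("~a & b      -> 0x%02X%n", encode((a, b, c) -> ~a & b));      // 0x0C
    }
}
```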
>> 
>> Modification:
>> Extended the packing algorithm to operate on both predicated and
>> non-predicated logic nodes. The following rules define the criteria under
>> which nodes get packed into a macro logic node:
>> 
>> 1. The parent and both child nodes are all unmasked, or all masked with the
>> same predicate.
>> 2. A masked parent can be packed with its left child if the child is
>> predicated and both have the same predicate.
>> 3. A masked parent can be packed with its right child if the child is
>> un-predicated or has a matching predication condition.
>> 4. An unmasked parent can be packed with an unmasked child.
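
The four packing rules can be restated as a single predicate. This is a hypothetical sketch that encodes the rules exactly as listed (a mask is modelled as a nullable predicate name, `null` meaning unmasked, using a Java 16+ record); it is not C2's data structure. The case of an unmasked parent with a masked child, which no rule covers, is treated as not packable.

```java
import java.util.Objects;

/** Hypothetical restatement of the listed packing rules as one predicate. */
final class MaskedPackingRules {
    /** A logic node with an optional mask; mask == null means unmasked. */
    record LogicNode(String op, String mask) {
        boolean masked() {
            return mask != null;
        }
    }

    /** Can {@code parent} be packed together with {@code child} on the given side? */
    static boolean canPack(LogicNode parent, LogicNode child, boolean isLeftChild) {
        if (!parent.masked()) {
            return !child.masked();                 // rule 4: unmasked parent, unmasked child only
        }
        boolean sameMask = Objects.equals(parent.mask(), child.mask());
        if (isLeftChild) {
            return sameMask;                        // rule 2: left child needs the same predicate
        }
        return !child.masked() || sameMask;         // rule 3: right child unmasked or same predicate
    }

    public static void main(String[] args) {
        LogicNode parent = new LogicNode("AndV", "m1");
        System.out.println(canPack(parent, new LogicNode("OrV", "m1"), true));   // true
        System.out.println(canPack(parent, new LogicNode("OrV", null), true));   // false
        System.out.println(canPack(parent, new LogicNode("XorV", null), false)); // true
        System.out.println(canPack(parent, new LogicNode("XorV", "m2"), false)); // false
    }
}
```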
>> 
>> The new jtreg test case added with the patch exhaustively covers all the
>> different combinations of predication of parent and child nodes.
>> 
>> Following are the performance numbers for the JMH benchmarks included with
>> the patch.
>> 
>> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S
>> Icelake Server)
>> 
>> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
>> -- | -- | -- | -- | --
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
>> o.o.b.jdk.incubator.vector.MaskedLogicOpt

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]

2022-01-05 Thread Jatin Bhateja
> The patch extends the existing macro logic inferencing algorithm to handle
> masked logic operations.
> 
> Existing algorithm:
> 
> 1. Identify logic cone roots.
> 2. Pack parent and child logic nodes into a MacroLogic node during a
> bottom-up traversal if the input constraint is met, i.e. the maximum number
> of inputs that a macro logic node can have.
> 3. Perform symbolic evaluation of the logic expression tree by assigning to
> each input the value corresponding to a truth table column.
> 4. The inputs, together with the encoded function, represent a macro logic
> node that mimics a truth table.
> 
> Modification:
> Extended the packing algorithm to operate on both predicated and
> non-predicated logic nodes. The following rules define the criteria under
> which nodes get packed into a macro logic node:
> 
> 1. The parent and both child nodes are all unmasked, or all masked with the
> same predicate.
> 2. A masked parent can be packed with its left child if the child is
> predicated and both have the same predicate.
> 3. A masked parent can be packed with its right child if the child is
> un-predicated or has a matching predication condition.
> 4. An unmasked parent can be packed with an unmasked child.
> 
> The new jtreg test case added with the patch exhaustively covers all the
> different combinations of predication of parent and child nodes.
> 
> Following are the performance numbers for the JMH benchmarks included with
> the patch.
> 
> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S
> Icelake Server)
> 
> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
> -- | -- | -- | -- | --
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 
> | 

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v2]

2022-01-05 Thread Jatin Bhateja
On Tue, 4 Jan 2022 02:25:36 GMT, Vladimir Kozlov  wrote:

> I think the whole "Bitwise operation packing optimization" code should be
> moved out of `compile.cpp`. Maybe to `vectornode.cpp`, where the
> `MacroLogicVNode` code is located.
> 
Hi @vnkozlov,
Yes, we could also extend the AndV/OrV/XorV/AndVMask/OrVMask/XorVMask
idealizations to perform macro logic folding; the current changes keep the
implementation clean and limited to one optimization stage.

> Copyright year should be updated to 2022 in all changed files.

-

PR: https://git.openjdk.java.net/jdk/pull/6893