Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato wrote: > Please review the changes for upgrading the Unicode support in the JDK, from > version 13 to version 14. Corresponding CSR has also been drafted. Marked as reviewed by iris (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/6974
Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato wrote: > Please review the changes for upgrading the Unicode support in the JDK, from > version 13 to version 14. Corresponding CSR has also been drafted. I like how they changed dizzy face to face with crossed-out eyes. Pistol to water pistol, that's even better, just to avoid any confusion ;-) - PR: https://git.openjdk.java.net/jdk/pull/6974
Re: RFR: 8268081: Upgrade Unicode Data Files to 14.0.0
On Wed, 5 Jan 2022 22:42:38 GMT, Naoto Sato wrote: > Please review the changes for upgrading the Unicode support in the JDK, from > version 13 to version 14. Corresponding CSR has also been drafted. Marked as reviewed by joehw (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/6974
Integrated: Merge jdk18
On Thu, 6 Jan 2022 00:42:14 GMT, Jesper Wilhelmsson wrote:

> Forwardport JDK 18 -> JDK 19

This pull request has now been integrated.

Changeset: 844dfb3a
Author: Jesper Wilhelmsson
URL: https://git.openjdk.java.net/jdk/commit/844dfb3ab6a1d8b68ccdcc73726ee0f73cfcb3c8
Stats: 750 lines in 28 files changed: 687 ins; 8 del; 55 mod

Merge

PR: https://git.openjdk.java.net/jdk/pull/6975
RFR: Merge jdk18
Forwardport JDK 18 -> JDK 19

Commit messages:
 - Merge remote-tracking branch 'jdk18/master' into Merge_jdk18
 - 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
 - 8278612: [macos] test/jdk/java/awt/dnd/RemoveDropTargetCrashTest crashes with VoiceOver on macOS
 - 8279525: ProblemList java/awt/GraphicsDevice/CheckDisplayModes.java on macosx-aarch64
 - 8278897: Alignment of heap segments is not enforced correctly
 - 8279222: Incorrect legacyMap.get in java.security.Provider after JDK-8276660
 - 8278948: compiler/vectorapi/reshape/TestVectorCastAVX1.java crashes in assembler

The webrevs contain the adjustments done while merging with regards to each parent branch:
 - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=6975&range=00.0
 - jdk18: https://webrevs.openjdk.java.net/?repo=jdk&pr=6975&range=00.1

Changes: https://git.openjdk.java.net/jdk/pull/6975/files
Stats: 750 lines in 28 files changed: 687 ins; 8 del; 55 mod
Patch: https://git.openjdk.java.net/jdk/pull/6975.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6975/head:pull/6975

PR: https://git.openjdk.java.net/jdk/pull/6975
RFR: 8268081: Support for Unicode 14
Please review the changes for upgrading the Unicode support in the JDK, from version 13 to version 14. Corresponding CSR has also been drafted.

Commit messages:
 - Amend unicode.md and icu.md files
 - Minor fixup
 - Merge branch 'master' into unicode
 - Copyright year to 2022
 - ICU4J 70.1
 - 18 -> 19
 - Merge branch 'master' into unicode
 - Unicode 14.0.0 (final)
 - UCD ver. 14.0 (beta) / Unicode Text Segmentation rev. 38 (draft)
 - ICU4J 69.1

Changes: https://git.openjdk.java.net/jdk/pull/6974/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6974&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8268081
Stats: 3443 lines in 41 files changed: 2353 ins; 101 del; 989 mod
Patch: https://git.openjdk.java.net/jdk/pull/6974.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6974/head:pull/6974

PR: https://git.openjdk.java.net/jdk/pull/6974
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]
On Wed, 5 Jan 2022 18:24:39 GMT, Paul Sandoz wrote:

>> I'll change use of `owner`. It's not really possible to write checkValidStateSlow in terms of checkValidState, because the latter does a plain read of the state, whereas the former does a volatile read. Reusing one from the other would result in two reads (a plain and a volatile).
>
> Ok. My thought was that since this is the slow path, two reads do not matter, but I did not reason fully about the concurrent implications (if the fast alive check returns false, the slow alive check can still return true, so that seems good; if the fast check returns true, I presumed the slow alive check would also be true, given the way state changes monotonically?)

If we're OK with a redundant plain read, then I don't think there are issues. You just do two reads, and the latter (the volatile one) is the one that counts. I don't think we can rely much on dependencies between what the plain read and what the volatile read will see. The state is updated in both directions (for shared segments): e.g. we can go from ALIVE to CLOSING and then back to ALIVE, or we could go from ALIVE to CLOSING to CLOSED.

That said, my main reservation about writing one routine on top of the other is that we really want checkValidState to be used only in critical hot paths. It has non-volatile semantics and exception handling that only really make sense when combined with ScopedMemoryAccess; for this reason, using it as an internal building block didn't seem like a great idea to me.

PR: https://git.openjdk.java.net/jdk18/pull/82
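The exchange above turns on the difference between a plain read of the scope state (hot path) and a volatile read (slow path). The following is a minimal, hypothetical sketch of that split using a `VarHandle`. The class `ScopeStateSketch`, its state constants (modeled on the ALIVE/CLOSING/CLOSED states named in the thread), and its `IllegalStateException` messages are assumptions for illustration; this is not the JDK's `ResourceScopeImpl`, which per the thread signals failure on the hot path with its own `ScopeAccessError`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch of a scope with a fast (plain-read) and a slow
// (volatile-read) liveness check. Not the JDK's ResourceScopeImpl.
final class ScopeStateSketch {
    static final int ALIVE = 0;     // assumed encoding: state >= ALIVE means usable
    static final int CLOSING = -1;  // shared scopes may go ALIVE -> CLOSING -> ALIVE
    static final int CLOSED = -2;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(ScopeStateSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Thread owner; // null for shared/global scopes
    private int state = ALIVE;  // read plainly or volatilely through STATE

    ScopeStateSketch(Thread owner) {
        this.owner = owner;
    }

    private void checkOwner() {
        if (owner != null && owner != Thread.currentThread()) {
            throw new IllegalStateException("Attempted access outside owning thread");
        }
    }

    // Hot path: plain read, which the JIT may hoist out of loops.
    void checkValidState() {
        checkOwner();
        if ((int) STATE.get(this) < ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    // Slow path: a single volatile read decides liveness. Calling
    // checkValidState() first would only add a redundant plain read.
    void checkValidStateSlow() {
        checkOwner();
        if ((int) STATE.getVolatile(this) < ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        checkOwner();
        STATE.setVolatile(this, CLOSED);
    }

    public static void main(String[] args) {
        ScopeStateSketch scope = new ScopeStateSketch(Thread.currentThread());
        scope.checkValidState(); // passes: owner thread, state ALIVE
        scope.close();
        try {
            scope.checkValidStateSlow();
        } catch (IllegalStateException e) {
            System.out.println("slow check: " + e.getMessage());
        }
    }
}
```

Composing `checkValidStateSlow()` from `checkValidState()` would still be correct in this sketch, since the volatile read is the one that counts; the two are kept separate, as the thread suggests, so the hot-path primitive is not reused as a general building block.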
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]
On Wed, 5 Jan 2022 18:08:01 GMT, Maurizio Cimadamore wrote:

>> This patch fixes a performance issue when dereferencing memory segments backed by different kinds of scopes. [...]
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use owner field instead of accessor in checkValidStateSlow

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk18/pull/82
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]
On Wed, 5 Jan 2022 17:57:44 GMT, Maurizio Cimadamore wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 190:
>>
>>> 188:     @ForceInline
>>> 189:     public final void checkValidState() {
>>> 190:         if (owner != null && owner != Thread.currentThread()) {
>>
>> For consistency we could change `checkValidStateSlow` to refer directly to `owner`.
>>
>> It would be satisfying, but I don't know if it's possible, to compose `checkValidStateSlow` from `checkValidState`, e.g.
>>
>>     public final void checkValidStateSlow() {
>>         checkValidState();
>>         if (!isAlive()) { ... }
>>     }
>
> I'll change use of `owner`. It's not really possible to write checkValidStateSlow in terms of checkValidState, because the latter does a plain read of the state, whereas the former does a volatile read. Reusing one from the other would result in two reads (a plain and a volatile).

Ok. My thought was that since this is the slow path, two reads do not matter, but I did not reason fully about the concurrent implications (if the fast alive check returns false, the slow alive check can still return true, so that seems good; if the fast check returns true, I presumed the slow alive check would also be true, given the way state changes monotonically?)

PR: https://git.openjdk.java.net/jdk18/pull/82
Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]
On Wed, 5 Jan 2022 08:59:00 GMT, Jatin Bhateja wrote:

>> The patch extends the existing macro logic inference algorithm to handle masked logic operations.
>>
>> Existing algorithm:
>>
>> 1. Identify logic cone roots.
>> 2. Pack parent and child logic nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met, i.e. the maximum number of inputs that a macro logic node can have.
>> 3. Perform symbolic evaluation of the logic expression tree by assigning to each input the value corresponding to a truth table column.
>> 4. The inputs, together with the encoded function, represent a macro logic node that mimics a truth table.
>>
>> Modification:
>> Extended the packing algorithm to operate on both predicated and non-predicated logic nodes. The following rules define the criteria under which nodes get packed into a macro logic node:
>>
>> 1. Parent and both child nodes are all unmasked, or all masked with the same predicates.
>> 2. A masked parent can be packed with its left child if the left child is predicated and both have the same predicates.
>> 3. A masked parent can be packed with its right child if the right child is unpredicated or has a matching predication condition.
>> 4. An unmasked parent can be packed with an unmasked child.
>>
>> The new jtreg test case added with the patch exhaustively covers all the different combinations of predication of parent and child nodes.
>>
>> Following are the performance numbers for the JMH benchmarks included with the patch.
>>
>> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>>
>> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
>> -- | -- | -- | -- | --
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
>> o.o.b.jdk.incubator.vector.MaskedLogicOpt
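Step 3 of the existing algorithm (symbolic evaluation against truth-table columns) and the four packing rules of the modification can be illustrated with a small Java sketch. It assumes the encoding commonly used for the AVX-512 `vpternlogd`/`vpternlogq` 8-bit immediate, where inputs A, B and C are assigned the column patterns 0xF0, 0xCC and 0xAA; evaluating the expression bitwise on those patterns yields the function's truth table. The class, the method names, and the string mask ids are hypothetical, and `canPackWithChild` is only a literal transcription of the rules as stated above, not HotSpot's C2 code.

```java
import java.util.Objects;

// Hypothetical illustration; HotSpot's real implementation is C++ in C2.
final class MacroLogicSketch {
    // A three-input bitwise function, e.g. (a, b, c) -> (a & b) | c.
    @FunctionalInterface
    interface TernaryOp {
        int apply(int a, int b, int c);
    }

    // Truth-table column patterns assigned to inputs A, B and C.
    static final int COL_A = 0xF0;
    static final int COL_B = 0xCC;
    static final int COL_C = 0xAA;

    // Symbolic evaluation: bit i of the result holds f(A, B, C) for the
    // input row i = 4*A + 2*B + C, so the low byte is the 8-bit encoding.
    static int encode(TernaryOp f) {
        return f.apply(COL_A, COL_B, COL_C) & 0xFF;
    }

    // Packing rules transcribed literally. A null mask means "unmasked";
    // masks are compared by a hypothetical predicate id.
    static boolean canPackWithChild(String parentMask, String childMask, boolean isLeftChild) {
        if (parentMask == null) {
            return childMask == null;                              // rules 1 and 4
        }
        if (isLeftChild) {
            return Objects.equals(parentMask, childMask);          // rule 2: same predicate required
        }
        return childMask == null || parentMask.equals(childMask);  // rule 3
    }

    public static void main(String[] args) {
        System.out.printf("(a & b) | c -> 0x%02X%n", encode((a, b, c) -> (a & b) | c));
        System.out.printf("a ^ b ^ c   -> 0x%02X%n", encode((a, b, c) -> a ^ b ^ c));
        System.out.println(canPackWithChild("k1", null, false)); // masked parent, unmasked right child
        System.out.println(canPackWithChild("k1", null, true));  // masked parent, unmasked left child
    }
}
```

For example, `(a & b) | c` encodes to 0xEA and `a ^ b ^ c` to 0x96, the values one would pass as the ternary-logic immediate under the assumed encoding.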
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution [v2]
> This patch fixes a performance issue when dereferencing memory segments backed by different kinds of scopes. When looking at inline traces, I found that one of our benchmarks, namely `LoopOverPollutedSegments`, was already hitting the ceiling of the bimorphic inline cache, specifically when checking liveness of the segment scope in the memory access hot path (`ResourceScopeImpl::checkValidState`). The benchmark only used segments backed by confined and global scopes. I then added (in the initialization "polluting" loop) segments backed by a shared scope, and the benchmark numbers started to look as follows:
>
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op
>
>
> That is, since we now have *three* different kinds of scopes (confined, shared and global), the call to the liveness check can no longer be inlined. One solution could be, as we do for the *base* accessor, to add a scope accessor to all memory segment implementation classes. But doing so only works well for heap segments (for which the scope accessor just returns the global scope constant). For native segments, we're still megamorphic (as a native segment can be backed by all kinds of scopes).
>
> In the end, it turned out to be much simpler to just make the liveness check monomorphic, since there's so much sharing between the code paths already. With that change, the numbers for the tweaked benchmark go back to normal:
>
>
> Benchmark                                              Mode  Cnt  Score   Error  Units
> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>
>
> Note that this patch tidies up the usage of `checkValidState` vs. `checkValidStateSlow` a bit. The former should only really be used in the hot path, while the latter is a more general routine that should be used in non-performance-critical code. Making `checkValidState` monomorphic caused the `ScopeAccessError` to be generated in more places, so I needed to either update the usage to use the safer `checkValidStateSlow` (where possible) or (in `Buffer` and `ConfinedScope`) just add extra wrapping.

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Use owner field instead of accessor in checkValidStateSlow

Changes:
 - all: https://git.openjdk.java.net/jdk18/pull/82/files
 - new: https://git.openjdk.java.net/jdk18/pull/82/files/c6082953..04a1e9f2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk18/pull/82.diff
Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82

PR: https://git.openjdk.java.net/jdk18/pull/82
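The fix described in this PR, making the liveness check monomorphic, can be contrasted with a polymorphic design in a small, self-contained Java sketch. Everything here (class names, the `state` encoding, the exception messages, the "before" shape) is a hypothetical assumption for illustration and is not the code in `ResourceScopeImpl`; the only line taken from the thread is the owner test `owner != null && owner != Thread.currentThread()`.

```java
// Hypothetical contrast between a polymorphic and a monomorphic liveness
// check. Illustrative only; not the JDK's implementation.
final class LivenessSketch {

    // Polymorphic shape: each scope kind overrides the check, so a call site
    // that sees confined, shared and global scopes has three receiver types.
    abstract static class PolyScope {
        abstract void checkValidState();
    }

    static final class PolyConfined extends PolyScope {
        private final Thread owner = Thread.currentThread();
        boolean closed;

        @Override
        void checkValidState() {
            if (owner != Thread.currentThread()) {
                throw new IllegalStateException("wrong thread");
            }
            if (closed) {
                throw new IllegalStateException("closed");
            }
        }
    }

    static final class PolyShared extends PolyScope {
        volatile boolean closed;

        @Override
        void checkValidState() {
            if (closed) {
                throw new IllegalStateException("closed");
            }
        }
    }

    static final class PolyGlobal extends PolyScope {
        @Override
        void checkValidState() {
            // a global scope is never closed
        }
    }

    // Monomorphic shape: one final method in the base class reads fields
    // instead of dispatching, so the call binds to a single target.
    abstract static class MonoScope {
        static final int ALIVE = 0;   // assumed encoding
        static final int CLOSED = -2;

        final Thread owner;           // non-null only for confined scopes
        int state = ALIVE;            // real shared scopes need stronger synchronization on close

        MonoScope(Thread owner) {
            this.owner = owner;
        }

        final void checkValidState() {
            if (owner != null && owner != Thread.currentThread()) {
                throw new IllegalStateException("wrong thread");
            }
            if (state < ALIVE) {
                throw new IllegalStateException("closed");
            }
        }

        void close() {
            state = CLOSED;
        }
    }

    static final class MonoConfined extends MonoScope {
        MonoConfined() { super(Thread.currentThread()); }
    }

    static final class MonoShared extends MonoScope {
        MonoShared() { super(null); }
    }

    static final class MonoGlobal extends MonoScope {
        MonoGlobal() { super(null); }

        @Override
        void close() {
            throw new UnsupportedOperationException("global scope cannot be closed");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MonoScope[] scopes = { new MonoConfined(), new MonoShared(), new MonoGlobal() };
        for (int i = 0; i < 1_000; i++) {
            for (MonoScope s : scopes) {
                s.checkValidState(); // same target method for all three kinds
            }
        }
        MonoScope confined = scopes[0];
        Thread other = new Thread(() -> {
            try {
                confined.checkValidState();
            } catch (IllegalStateException e) {
                System.out.println("other thread: " + e.getMessage());
            }
        });
        other.start();
        other.join();
    }
}
```

In the polymorphic shape, a loop over all three `PolyScope` kinds presents three receiver types at the `checkValidState()` call site, more than a bimorphic inline cache can handle. In the monomorphic shape the method is `final` and never overridden, so the JIT can bind and inline it regardless of the receiver's concrete class.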
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution
On Wed, 5 Jan 2022 17:23:40 GMT, Paul Sandoz wrote:

>> This patch fixes a performance issue when dereferencing memory segments backed by different kinds of scopes. [...]
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 190:
>
>> 188:     @ForceInline
>> 189:     public final void checkValidState() {
>> 190:         if (owner != null && owner != Thread.currentThread()) {
>
> For consistency we could change `checkValidStateSlow` to refer directly to `owner`.
>
> It would be satisfying, but I don't know if it's possible, to compose `checkValidStateSlow` from `checkValidState`, e.g.
>
>     public final void checkValidStateSlow() {
>         checkValidState();
>         if (!isAlive()) { ... }
>     }

I'll change use of `owner`. It's not really possible to write checkValidStateSlow in terms of checkValidState, because the latter does a plain read of the state, whereas the former does a volatile read. Reusing one from the other would result in two reads (a plain and a volatile).

PR: https://git.openjdk.java.net/jdk18/pull/82
Re: [jdk18] RFR: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
On Wed, 5 Jan 2022 17:22:54 GMT, Daniel D. Daugherty wrote: > A couple of trivial ProblemListings: > > JDK-8279529 ProblemList > java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64 > JDK-8279532 ProblemList > sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java Looks good to me. - Marked as reviewed by jnimeh (Reviewer). PR: https://git.openjdk.java.net/jdk18/pull/83
[jdk18] RFR: 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
A couple of trivial ProblemListings:

JDK-8279529 ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64
JDK-8279532 ProblemList sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java

Commit messages:
 - 8279532: ProblemList sun/security/ssl/SSLSessionImpl/NoInvalidateSocketException.java
 - 8279529: ProblemList java/nio/channels/DatagramChannel/ManySourcesAndTargets.java on macosx-aarch64

Changes: https://git.openjdk.java.net/jdk18/pull/83/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=83&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8279529
Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk18/pull/83.diff
Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/83/head:pull/83

PR: https://git.openjdk.java.net/jdk18/pull/83
Idea: Collection Literals
Hello! Hopefully this is the right place for this suggestion; I originally posted it on the discuss list, but was told this list would be more appropriate. Recently, I've read some of the discussion linked in JEP 269 (Convenience Factory Methods for Collections), including why collection literals were shelved when they were brought up in Project Coin and Project Lambda. I believe that things may be different enough now (with dynamic constants available, local variable type inference and lambdas present, value types closer to completion, and Project Amber bringing "smaller" features under consideration) to bring them up yet again. I've written an informal proposal that tries to answer the main questions raised in those original discussions: https://gist.github.com/l-Luna/08b6574d0c840de93634cf8d1e43c494 I understand that collection literals are still unlikely to be added in the near future, but thought it wouldn't hurt to bring the idea up again.
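For context, the convenience factory methods that JEP 269 added in Java 9 are the current alternative to collection literals. A short illustration of that existing API (the class and field names are arbitrary):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// The JEP 269 factory methods (Java 9+) that collection literals would complement.
final class FactoryMethodsDemo {
    static final List<String> LETTERS = List.of("a", "b", "c");
    static final Set<Integer> DIGITS = Set.of(1, 2, 3);          // duplicates would throw IllegalArgumentException
    static final Map<String, Integer> SMALL = Map.of("one", 1, "two", 2);
    // Map.of has fixed-arity overloads up to 10 pairs; Map.ofEntries covers larger maps.
    static final Map<String, Integer> ENTRIES =
            Map.ofEntries(Map.entry("three", 3), Map.entry("four", 4));

    public static void main(String[] args) {
        System.out.println(LETTERS + " " + DIGITS.size() + " " + SMALL.get("two") + " " + ENTRIES.get("four"));
        try {
            LETTERS.add("d"); // the returned collections are unmodifiable
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable");
        }
    }
}
```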
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution
On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore wrote:

> This patch fixes a performance issue when dereferencing memory segments backed by different kinds of scopes. [...]

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 190:

> 188:     @ForceInline
> 189:     public final void checkValidState() {
> 190:         if (owner != null && owner != Thread.currentThread()) {

For consistency we could change `checkValidStateSlow` to refer directly to `owner`.

It would be satisfying, but I don't know if it's possible, to compose `checkValidStateSlow` from `checkValidState`, e.g.

    public final void checkValidStateSlow() {
        checkValidState();
        if (!isAlive()) { ... }
    }

PR: https://git.openjdk.java.net/jdk18/pull/82
Re: [jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution
On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore wrote: > This patch fixes a performance issue when dereferencing memory segments > backed by different kinds of scopes. When looking at inline traces, I found > that one of our benchmark, namely `LoopOverPollutedSegment` was already > hitting the ceiling of the bimorphic inline cache, specifically when checking > liveness of the segment scope in the memory access hotpath > (`ResourceScopeImpl::checkValidState`). The benchmark only used segments > backed by confined and global scope. I then added (in the initialization > "polluting" loop) segments backed by a shared scope, and then the benchmark > numbers started to look as follows: > > > Benchmark Mode Cnt Score > Error Units > LoopOverPollutedSegments.heap_segment_floats_VHavgt 30 7.004 ? > 0.089 ms/op > LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 7.159 ? > 0.016 ms/op > LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 7.017 ? > 0.110 ms/op > LoopOverPollutedSegments.heap_segment_ints_instanceavgt 30 7.175 ? > 0.048 ms/op > LoopOverPollutedSegments.heap_unsafe avgt 30 0.243 ? > 0.004 ms/op > LoopOverPollutedSegments.native_segment_VH avgt 30 7.366 ? > 0.036 ms/op > LoopOverPollutedSegments.native_segment_instance avgt 30 7.305 ? > 0.098 ms/op > LoopOverPollutedSegments.native_unsafe avgt 30 0.238 ? > 0.002 ms/op > > > That is, since now we have *three* different kinds of scopes (confined, > shared and global), the call to the liveness check can no longer be inlined. > One solution could be, as we do for the *base* accessor, to add a scope > accessor to all memory segment implementation classes. But doing so only > works ok for heap segments (for which the scope accessor just returns the > global scope constants). For native segments, we're still megamorphic (as a > native segment can be backed by all kinds of scopes). 
>
> In the end, it turned out to be much simpler to just make the liveness check monomorphic, since there's so much sharing between the code paths already. With that change, the numbers of the tweaked benchmark go back to normal:
>
>     Benchmark                                              Mode  Cnt  Score   Error  Units
>     LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
>     LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
>     LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
>     LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
>     LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
>     LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
>     LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
>     LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op
>
> Note that this patch tidies up the usage of `checkValidState` vs. `checkValidStateSlow` a bit. The former should only really be used in the hot path, while the latter is a more general routine which should be used in non-performance-critical code. Making `checkValidState` monomorphic caused the `ScopeAccessError` to be generated in more places, so I needed either to switch those usages to the safer `checkValidStateSlow` (where possible) or, in `Buffer` and `ConfinedScope`, to add extra wrapping.

src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 367:

> 365:
> 366:     void checkValidState() {
> 367:         scope.checkValidStateSlow();

Not to be confused with `ResourceScope::checkValidState`: this method is only really used by other non-performance-critical code around `AbstractMemorySegmentImpl`.

- PR: https://git.openjdk.java.net/jdk18/pull/82
[jdk18] RFR: 8279527: Dereferencing segments backed by different scopes leads to pollution
This patch fixes a performance issue when dereferencing memory segments backed by different kinds of scopes. When looking at inline traces, I found that one of our benchmarks, namely `LoopOverPollutedSegments`, was already hitting the ceiling of the bimorphic inline cache, specifically when checking the liveness of the segment scope in the memory access hot path (`ResourceScopeImpl::checkValidState`). The benchmark only used segments backed by confined and global scopes. I then added segments backed by a shared scope (in the initialization "polluting" loop), and the benchmark numbers started to look as follows:

    Benchmark                                              Mode  Cnt  Score   Error  Units
    LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ± 0.089  ms/op
    LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ± 0.016  ms/op
    LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ± 0.110  ms/op
    LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ± 0.048  ms/op
    LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ± 0.004  ms/op
    LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ± 0.036  ms/op
    LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ± 0.098  ms/op
    LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ± 0.002  ms/op

That is, since we now have *three* different kinds of scopes (confined, shared and global), the call to the liveness check can no longer be inlined. One solution could be, as we do for the *base* accessor, to add a scope accessor to all memory segment implementation classes. But doing so only works for heap segments (for which the scope accessor just returns the global scope constant). For native segments, we're still megamorphic (as a native segment can be backed by all kinds of scopes).

In the end, it turned out to be much simpler to just make the liveness check monomorphic, since there's so much sharing between the code paths already.
With that change, the numbers of the tweaked benchmark go back to normal:

    Benchmark                                              Mode  Cnt  Score   Error  Units
    LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ± 0.003  ms/op
    LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ± 0.003  ms/op
    LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ± 0.003  ms/op
    LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ± 0.001  ms/op
    LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ± 0.013  ms/op
    LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ± 0.004  ms/op
    LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ± 0.001  ms/op
    LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ± 0.005  ms/op

Note that this patch tidies up the usage of `checkValidState` vs. `checkValidStateSlow` a bit. The former should only really be used in the hot path, while the latter is a more general routine which should be used in non-performance-critical code. Making `checkValidState` monomorphic caused the `ScopeAccessError` to be generated in more places, so I needed either to switch those usages to the safer `checkValidStateSlow` (where possible) or, in `Buffer` and `ConfinedScope`, to add extra wrapping.

- Commit messages:
 - Initial push

Changes: https://git.openjdk.java.net/jdk18/pull/82/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8279527
  Stats: 108 lines in 8 files changed: 41 ins; 49 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk18/pull/82.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82

PR: https://git.openjdk.java.net/jdk18/pull/82
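To make "monomorphic" concrete, here is a minimal sketch, under assumed names, of confined, shared and global scopes modelled as configurations of one final class instead of separate subclasses, so the liveness-check call site inside a loop only ever sees one receiver type. `MonoScope` and `MonoScopeDemo` are hypothetical, and the real close protocol for shared scopes is not modelled.

```java
import java.util.List;

// Hypothetical sketch: all three scope kinds are ONE final class; kinds differ only in fields.
final class MonoScope {
    private final Thread owner;            // non-null only for confined scopes
    private final boolean closeable;       // false for the global scope
    private volatile boolean alive = true;

    private MonoScope(Thread owner, boolean closeable) {
        this.owner = owner;
        this.closeable = closeable;
    }

    static MonoScope confined() { return new MonoScope(Thread.currentThread(), true); }
    static MonoScope shared()   { return new MonoScope(null, true); }
    static MonoScope global()   { return new MonoScope(null, false); }

    void close() {
        if (!closeable) {
            throw new UnsupportedOperationException("global scope cannot be closed");
        }
        checkValidState();
        alive = false;
    }

    // A single implementation serves every kind of scope.
    void checkValidState() {
        if (owner != null && owner != Thread.currentThread()) {
            throw new IllegalStateException("Attempted access outside owning thread");
        }
        if (!alive) {
            throw new IllegalStateException("Already closed");
        }
    }
}

final class MonoScopeDemo {
    // The call in the loop has one receiver type, however "polluted" the list is.
    static int checkAll(List<MonoScope> scopes) {
        int ok = 0;
        for (MonoScope s : scopes) {
            s.checkValidState();
            ok++;
        }
        return ok;
    }
}
```

The design choice here is to move the differences between scope kinds into fields (`owner`, `closeable`) rather than into overriding methods, which lets a JIT inline a single `checkValidState` body regardless of how the segments in the loop were created.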
Integrated: 8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms
On Tue, 4 Jan 2022 16:46:09 GMT, Aleksey Shipilev wrote:

> The real problem is Y2038 ([JDK-8279444](https://bugs.openjdk.java.net/browse/JDK-8279444)), which does not look solvable at this time. So, for test cleanliness, we might just disable this test on 32-bit platforms.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, affected test still passes
> - [x] Linux x86_32 fastdebug, affected test is now skipped

This pull request has now been integrated.

Changeset: a741b927
Author:    Aleksey Shipilev
URL:       https://git.openjdk.java.net/jdk/commit/a741b927a3cdc8e339ae557c77886ea850aa06b6
Stats:     1 line in 1 file changed: 1 ins; 0 del; 0 mod

8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms

Reviewed-by: alanb, bpb

- PR: https://git.openjdk.java.net/jdk/pull/6957
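For context on the Y2038 problem tracked by JDK-8279444, a small worked example: a signed 32-bit count of seconds since the epoch (as in a 32-bit `time_t`) tops out at 2^31 - 1 and then wraps negative. This only shows the arithmetic; it makes no claim about how `ReproducibleJar.java` or the `jar` tool handle timestamps.

```java
import java.time.Instant;

// Worked illustration of the Y2038 limit of a signed 32-bit seconds counter.
final class Y2038 {
    // Largest instant a signed 32-bit count of seconds since 1970-01-01T00:00:00Z can hold.
    static Instant lastRepresentable() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE); // 2^31 - 1 seconds
    }

    // One second later, a 32-bit counter wraps around to Integer.MIN_VALUE.
    static Instant afterOverflow() {
        int wrapped = (int) (Integer.MAX_VALUE + 1L);
        return Instant.ofEpochSecond(wrapped);
    }

    public static void main(String[] args) {
        System.out.println(lastRepresentable()); // 2038-01-19T03:14:07Z
        System.out.println(afterOverflow());     // 1901-12-13T20:45:52Z
    }
}
```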
Re: RFR: 8279453: Disable tools/jar/ReproducibleJar.java on 32-bit platforms
On Tue, 4 Jan 2022 16:46:09 GMT, Aleksey Shipilev wrote:

> The real problem is Y2038 ([JDK-8279444](https://bugs.openjdk.java.net/browse/JDK-8279444)), which does not look solvable at this time. So, for test cleanliness, we might just disable this test on 32-bit platforms.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, affected test still passes
> - [x] Linux x86_32 fastdebug, affected test is now skipped

Thanks for the reviews!

- PR: https://git.openjdk.java.net/jdk/pull/6957
[jdk18] Integrated: 8278897: Alignment of heap segments is not enforced correctly
On Thu, 16 Dec 2021 12:31:01 GMT, Maurizio Cimadamore wrote:

> This PR fixes an issue with alignment constraints not being enforced correctly in dereference/copy operations on on-heap segments. The alignment of on-heap segments cannot be computed exactly, as the alignment of elements in arrays is, ultimately, a VM implementation detail. Because of this, alignment checks on heap segments can fail or pass depending on the platform being used.
>
> For more details about the problem and the solution, please refer to:
> https://mail.openjdk.java.net/pipermail/panama-dev/2021-November/015852.html

This pull request has now been integrated.

Changeset: 9d43d25d
Author:    Maurizio Cimadamore
URL:       https://git.openjdk.java.net/jdk18/commit/9d43d25da8bcfff425a795dcc230914a384a5c82
Stats:     600 lines in 20 files changed: 566 ins; 0 del; 34 mod

8278897: Alignment of heap segments is not enforced correctly

Reviewed-by: jvernee

- PR: https://git.openjdk.java.net/jdk18/pull/37
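A minimal sketch of why such checks are platform-dependent, assuming a power-of-two alignment and that the array object itself starts on an 8-byte boundary: an element's offset is a VM-chosen base offset plus `index * elementSize`, so the same logical access can be aligned under one object layout and misaligned under another. The helper names and the base offsets 16 and 20 are hypothetical, not values taken from the patch or from any particular VM.

```java
// Hypothetical helpers; not part of jdk.incubator.foreign.
final class HeapAlignment {
    // True if 'address' is a multiple of 'alignment' (alignment must be a power of two).
    static boolean isAligned(long address, long alignment) {
        if (Long.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
        }
        return (address & (alignment - 1)) == 0;
    }

    // Offset of element 'index' from the start of the array object, for a VM-chosen base offset.
    static long elementOffset(long baseOffset, long index, long elementSize) {
        return baseOffset + index * elementSize;
    }

    public static void main(String[] args) {
        // Same logical access (byte[] element 0, 8-byte alignment) under two hypothetical layouts.
        System.out.println(isAligned(elementOffset(16, 0, 1), 8)); // true
        System.out.println(isAligned(elementOffset(20, 0, 1), 8)); // false
    }
}
```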
Re: RFR: 8272746: ZipFile can't open big file (NegativeArraySizeException) [v2]
On Thu, 23 Dec 2021 16:42:50 GMT, Alan Bateman wrote:

>> Masanori Yano has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8272746: ZipFile can't open big file (NegativeArraySizeException)
>
> src/java.base/share/classes/java/util/zip/ZipFile.java line 1501:
>
>> 1499:             // read in the CEN and END
>> 1500:             if (end.cenlen + ENDHDR >= Integer.MAX_VALUE) {
>> 1501:                 zerror("invalid END header (too large central directory size)");
>
> This check looks correct. It might be a bit clearer to say "central directory size too large" rather than "too large central directory size".
>
> The bug report says that JDK 8 and the native zip handle these zip files; were you able to check that?

@AlanBateman Could you please review the above comments?

- PR: https://git.openjdk.java.net/jdk/pull/6927
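The guard under review can be illustrated in isolation. This is a hypothetical stand-alone sketch (the class and method names are made up; only the comparison mirrors quoted line 1500), assuming the central directory length is held in a `long`: narrowing an oversized sum to `int` goes negative, which a subsequent `new byte[...]` would reject with `NegativeArraySizeException`. `ENDHDR` is 22, the fixed size of the ZIP end of central directory record.

```java
import java.util.zip.ZipException;

// Hypothetical stand-alone sketch; not the actual ZipFile internals.
final class CenSizeCheck {
    // Fixed size, in bytes, of the ZIP "end of central directory" record.
    static final int ENDHDR = 22;

    // Guarded allocation: reject sizes whose sum no longer fits in an int array length.
    static byte[] allocateCen(long cenlen) throws ZipException {
        if (cenlen + ENDHDR >= Integer.MAX_VALUE) {
            throw new ZipException("invalid END header (central directory size too large)");
        }
        return new byte[(int) (cenlen + ENDHDR)];
    }

    // Unguarded narrowing: large values silently wrap to a negative int.
    static int unguardedLength(long cenlen) {
        return (int) (cenlen + ENDHDR);
    }
}
```

For example, `unguardedLength(Integer.MAX_VALUE)` evaluates to -2147483627, while `allocateCen(Integer.MAX_VALUE)` throws a `ZipException` carrying the reworded message.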
Re: RFR: 8276694: Pattern trailing unescaped backslash causes internal error
On Mon, 20 Dec 2021 09:57:14 GMT, Masanori Yano wrote:

> Could you please review the fix for 8276694?
>
> A message specific to this exception should be printed instead of an internal error. This fix adds a new check that outputs an appropriate exception message when the regular expression ends with an unescaped backslash. It also checks the position of the cursor to rule out other syntax errors in the middle of the regular expression.

Could someone please review this pull request?

- PR: https://git.openjdk.java.net/jdk/pull/6891
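A quick way to observe the case being fixed: in Java source, the literal `"abc\\"` is the four-character regex `abc\`, which ends with an unescaped backslash. `Pattern.compile` rejects it with a `PatternSyntaxException`; the description text depends on whether the running JDK includes this fix, so the sketch does not assume a particular message. The helper name `syntaxError` is hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class TrailingBackslash {
    // Returns the syntax error reported for 'regex', or null if it compiles.
    static PatternSyntaxException syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Regex text: abc\   (unescaped trailing backslash)
        PatternSyntaxException e = syntaxError("abc\\");
        System.out.println(e.getDescription() + " (index " + e.getIndex() + ")");
        // Regex text: abc\\  (escaped backslash, compiles fine)
        System.out.println(syntaxError("abc\\\\") == null); // true
    }
}
```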
Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v3]
On Tue, 4 Jan 2022 15:11:47 GMT, Jatin Bhateja wrote:

>> This patch extends the existing macro logic inferencing algorithm to handle masked logic operations.
>>
>> Existing algorithm:
>>
>> 1. Identify logic cone roots.
>> 2. Pack parent and logic child nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met, i.e. the maximum number of inputs a macro logic node can have.
>> 3. Perform symbolic evaluation of the logic expression tree by assigning each input a value corresponding to a truth table column.
>> 4. The inputs, together with the encoded function, represent a macro logic node which mimics a truth table.
>>
>> Modification:
>> Extended the packing algorithm to operate on both predicated and non-predicated logic nodes. The following rules define the criteria under which nodes get packed into a macro logic node:
>>
>> 1. Parent and both child nodes are all unmasked, or all masked with the same predicate.
>> 2. A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
>> 3. A masked parent can be packed with its right child if the child is un-predicated or has a matching predication condition.
>> 4. An unmasked parent can be packed with an unmasked child.
>>
>> The new jtreg test case added with the patch exhaustively covers all the different combinations of predication of parent and child nodes.
>>
>> Following are the performance numbers for the JMH benchmark included with the patch.
>>
>> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>>
>> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
>> -- | -- | -- | -- | --
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
>> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
>> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
>> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
>> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
>> o.o.b.jdk.incubator.vector.MaskedLogicOpt
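Step 3 of the existing algorithm (symbolic evaluation against truth table columns) can be illustrated with a small sketch. Assuming three inputs and the 8-bit encoding used by instructions such as x86 `VPTERNLOG` (A = 0xF0, B = 0xCC, C = 0xAA), evaluating the expression bitwise on those constants yields its 8-bit truth table. This illustrates the idea only and is not C2's implementation; the class and interface names are hypothetical.

```java
// Hypothetical illustration of truth-table encoding for a 3-input logic function.
final class TruthTable {
    // Each input is one column of an 8-row truth table: row i has A = bit 2, B = bit 1, C = bit 0 of i.
    static final int A = 0xF0; // 1111 0000
    static final int B = 0xCC; // 1100 1100
    static final int C = 0xAA; // 1010 1010

    @FunctionalInterface
    interface Expr {
        int eval(int a, int b, int c);
    }

    // Evaluate the expression on the columns; bit i of the result is the output for row i.
    static int encode(Expr e) {
        return e.eval(A, B, C) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X%n", encode((a, b, c) -> (a & b) | c));        // 0xEA
        System.out.printf("0x%02X%n", encode((a, b, c) -> a ^ b ^ c));          // 0x96
        System.out.printf("0x%02X%n", encode((a, b, c) -> (a & ~c) | (b & c))); // 0xD8 (blend: c selects b)
    }
}
```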
Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]
> This patch extends the existing macro logic inferencing algorithm to handle masked logic operations.
>
> Existing algorithm:
>
> 1. Identify logic cone roots.
> 2. Pack parent and logic child nodes into a MacroLogic node in a bottom-up traversal if the input constraint is met, i.e. the maximum number of inputs a macro logic node can have.
> 3. Perform symbolic evaluation of the logic expression tree by assigning each input a value corresponding to a truth table column.
> 4. The inputs, together with the encoded function, represent a macro logic node which mimics a truth table.
>
> Modification:
> Extended the packing algorithm to operate on both predicated and non-predicated logic nodes. The following rules define the criteria under which nodes get packed into a macro logic node:
>
> 1. Parent and both child nodes are all unmasked, or all masked with the same predicate.
> 2. A masked parent can be packed with its left child if the child is predicated and both have the same predicate.
> 3. A masked parent can be packed with its right child if the child is un-predicated or has a matching predication condition.
> 4. An unmasked parent can be packed with an unmasked child.
>
> The new jtreg test case added with the patch exhaustively covers all the different combinations of predication of parent and child nodes.
>
> Following are the performance numbers for the JMH benchmark included with the patch.
>
> Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>
> Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
> -- | -- | -- | -- | --
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
> o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
> o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
> o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
> o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 |
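The four packing rules can be read as two checks over a parent and one of its children. The sketch below encodes one reading of them, modelling a node's predicate as a nullable mask identifier (null meaning unmasked); `Node`, `canPackLeft` and `canPackRight` are hypothetical names, and C2's actual checks may differ. Rule 1 falls out of the others: an all-unmasked or all-same-predicate triple passes both checks.

```java
// Hypothetical encoding of the packing rules; not C2 code.
final class PackingRules {
    // predicate == null means the node is unmasked.
    record Node(String predicate) {
        boolean masked() {
            return predicate != null;
        }
    }

    // Rules 2 and 4: may 'parent' absorb its left child?
    static boolean canPackLeft(Node parent, Node left) {
        if (!parent.masked()) {
            return !left.masked();                                             // rule 4
        }
        return left.masked() && parent.predicate().equals(left.predicate());  // rule 2
    }

    // Rules 3 and 4: may 'parent' absorb its right child?
    static boolean canPackRight(Node parent, Node right) {
        if (!parent.masked()) {
            return !right.masked();                                              // rule 4
        }
        return !right.masked() || parent.predicate().equals(right.predicate()); // rule 3
    }
}
```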
Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v2]
On Tue, 4 Jan 2022 02:25:36 GMT, Vladimir Kozlov wrote:

> I think the whole "Bitwise operation packing optimization" code should be moved out of `compile.cpp`. Maybe to `vectornode.cpp`, where the `MacroLogicVNode` code is located.

Hi @vnkozlov,
Yes, we could also extend the AndV/OrV/XorV/AndVMask/OrVMask/XorVMask idealizations to perform macro logic folding; the current changes keep the implementation clean and limited to one optimization stage.

> Copyright year should be updated to 2022 in all changed files.

- PR: https://git.openjdk.java.net/jdk/pull/6893