This patch extends the existing macro logic inference algorithm to handle
masked logic operations.

Existing algorithm:

1. Identify logic cone roots.
2. Pack the parent and its logic child nodes into a MacroLogic node in a
bottom-up traversal if the input constraint is met, i.e. the result does not
exceed the maximum number of inputs a macro logic node can have.
3. Perform symbolic evaluation of the logic expression tree by assigning to
each input the value of its corresponding truth table column.
4. The inputs, together with the encoded function, represent a macro logic
node which mimics a truth table.
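
As an illustration of steps 3 and 4, here is a minimal Java sketch of symbolic truth table evaluation for a three-input expression. It is not the HotSpot code: the class and method names are hypothetical, and the column constants 0xF0, 0xCC and 0xAA are an assumption following the usual 8-bit ternary logic encoding, where bit i of the function is the output for truth table row i.

```java
public final class TruthTableSketch {
    // Assumed truth table columns: bit i of each constant is the value that
    // input takes in row i, with the row index formed as (a << 2) | (b << 1) | c.
    static final int A = 0xF0;
    static final int B = 0xCC;
    static final int C = 0xAA;

    // Logic operators applied to whole columns at once (symbolic evaluation).
    static int and(int x, int y) { return (x & y) & 0xFF; }
    static int or(int x, int y)  { return (x | y) & 0xFF; }
    static int xor(int x, int y) { return (x ^ y) & 0xFF; }
    static int not(int x)        { return ~x & 0xFF; }

    // Reference interpreter: applies an encoded 8-bit function bit by bit
    // to three concrete 32-bit inputs, the way a truth table lookup would.
    static int apply(int func, int a, int b, int c) {
        int result = 0;
        for (int bit = 0; bit < 32; bit++) {
            int row = (((a >>> bit) & 1) << 2)
                    | (((b >>> bit) & 1) << 1)
                    |  ((c >>> bit) & 1);
            result |= ((func >>> row) & 1) << bit;
        }
        return result;
    }

    public static void main(String[] args) {
        // Encode the expression tree (a & b) ^ c by evaluating it on the columns.
        int func = xor(and(A, B), C);
        System.out.printf("func = 0x%02X%n", func); // prints "func = 0x6A"

        // The single encoded function reproduces the original expression.
        int a = 0x12345678, b = 0x0F0F0F0F, c = 0xDEADBEEF;
        System.out.println(apply(func, a, b, c) == ((a & b) ^ c)); // prints "true"
    }
}
```

Evaluating the expression once on the column constants yields the 8-bit function, so a tree of several logic nodes collapses into one node carrying its inputs and that function.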

Modification:
The packing algorithm is extended to operate on both predicated and
non-predicated logic nodes. The following rules define the criteria under which
nodes get packed into a macro logic node:

1. The parent and both child nodes are all unmasked, or all masked with the
same predicate.
2. A masked parent can be packed with its left child if the child is predicated
and both have the same predicate.
3. A masked parent can be packed with its right child if the child is
unpredicated or has a matching predicate.
4. An unmasked parent can be packed with an unmasked child.
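
The four rules above can be written as small predicates. The following Java sketch is only a reading of the rules as stated here, not the patch's C2 implementation: the class and method names are hypothetical, a predicate is modelled as a String with null meaning "unmasked", and any combination the rules do not mention is assumed to be not packable.

```java
import java.util.Objects;

public final class PackingRulesSketch {
    // Rule 1: parent and both children are all unmasked (all null),
    // or all masked with the same predicate.
    static boolean canPackBoth(String parent, String left, String right) {
        return Objects.equals(parent, left) && Objects.equals(parent, right);
    }

    // Rule 2: a masked parent packs with a left child masked by the same predicate.
    // Rule 4: an unmasked parent packs with an unmasked child.
    static boolean canPackLeft(String parent, String left) {
        return Objects.equals(parent, left);
    }

    // Rule 3: a masked parent packs with a right child that is unmasked
    // or masked by the same predicate.
    // Rule 4: an unmasked parent packs with an unmasked child.
    static boolean canPackRight(String parent, String right) {
        if (right == null) {
            return true;
        }
        return parent != null && parent.equals(right);
    }

    public static void main(String[] args) {
        System.out.println(canPackLeft("m1", "m1"));   // same predicate
        System.out.println(canPackLeft("m1", null));   // masked parent, unmasked left child
        System.out.println(canPackRight("m1", null));  // masked parent, unmasked right child
        System.out.println(canPackRight(null, "m1"));  // unmasked parent, masked child
    }
}
```

Note the asymmetry between the two children: under these rules an unmasked child is acceptable on the right of a masked parent but not on the left.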

The new jtreg test case added with the patch exhaustively covers all the
different combinations of predication of parent and child nodes.

The following are the performance numbers for the JMH benchmarks included with
the patch.

Machine Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)

Benchmark | ARRAYLEN | Baseline (ops/s) | Withopt (ops/s) | Gain (withopt/baseline)
-- | -- | -- | -- | --
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 64 | 2365.421 | 5136.283 | 2.171403315
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 128 | 2034.1 | 4073.381 | 2.002547072
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 256 | 1568.694 | 2811.975 | 1.792558013
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 512 | 883.261 | 1662.771 | 1.882536419
o.o.b.vm.compiler.MacroLogicOpt.workload1_caller | 1024 | 469.513 | 732.81 | 1.560787454
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 64 | 273.049 | 552.106 | 2.022003377
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 128 | 219.624 | 359.775 | 1.63814064
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 256 | 131.649 | 182.23 | 1.384211046
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 512 | 71.452 | 81.522 | 1.140933774
o.o.b.vm.compiler.MacroLogicOpt.workload2_caller | 1024 | 37.427 | 41.966 | 1.121276084
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 64 | 2805.759 | 3383.16 | 1.205791374
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 128 | 2069.012 | 2250.37 | 1.087654397
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 256 | 1098.766 | 1101.996 | 1.002939661
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 512 | 470.035 | 484.732 | 1.031267884
o.o.b.vm.compiler.MacroLogicOpt.workload3_caller | 1024 | 202.827 | 209.073 | 1.030794717
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 256 | 3435.989 | 4418.09 | 1.285827749
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 512 | 1524.803 | 1678.201 | 1.100601848
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt128 | 1024 | 972.501 | 1166.734 | 1.199725244
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 256 | 5980.85 | 7584.17 | 1.268075608
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 512 | 3258.108 | 3939.23 | 1.209054457
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt256 | 1024 | 1475.365 | 1511.159 | 1.024261115
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 256 | 4208.766 | 4220.678 | 1.002830283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 512 | 2056.651 | 2049.489 | 0.99651764
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationInt512 | 1024 | 1110.461 | 1116.448 | 1.005391455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 256 | 3259.348 | 3947.94 | 1.211266793
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 512 | 1515.147 | 1536.647 | 1.014190042
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong256 | 1024 | 911.58 | 1030.54 | 1.130498695
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 256 | 2034.611 | 2073.764 | 1.019243482
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 512 | 1110.659 | 1116.093 | 1.004892591
o.o.b.jdk.incubator.vector.MaskedLogicOpts.bitwiseBlendOperationLong512 | 1024 | 559.269 | 559.651 | 1.000683034
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 256 | 3636.141 | 4446.505 | 1.222863745
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 512 | 1433.145 | 1681.261 | 1.173126934
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt128 | 1024 | 1000.107 | 1172.866 | 1.172740517
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 256 | 5568.313 | 7670.259 | 1.37748345
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 512 | 3350.108 | 3927.803 | 1.172440709
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt256 | 1024 | 1495.966 | 1541.56 | 1.030477965
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 256 | 4230.379 | 4282.154 | 1.012238856
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 512 | 2029.801 | 2049.638 | 1.009772879
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsInt512 | 1024 | 1108.738 | 1118.897 | 1.00916267
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 256 | 3802.801 | 3783.537 | 0.99493426
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 512 | 1546.244 | 1552.691 | 1.004169458
o.o.b.jdk.incubator.vector.MaskedLogicOpts.maskedLogicOperationsLong256 | 1024 | 1017.512 | 1020.075 | 1.002518889
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 256 | 4159.835 | 4527.676 | 1.088426825
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 512 | 1665.335 | 1733.04 | 1.040655484
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt128 | 1024 | 1150.319 | 1181.935 | 1.02748455
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 256 | 6989.791 | 7382.883 | 1.056238019
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 512 | 3711.362 | 3911.921 | 1.054039191
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt256 | 1024 | 1540.341 | 1554.175 | 1.008981128
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 256 | 4164.559 | 4213.546 | 1.01176283
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 512 | 2072.91 | 2079.105 | 1.002988552
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsInt512 | 1024 | 1112.678 | 1116.675 | 1.003592234
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 256 | 3702.998 | 3906.093 | 1.0548461
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 512 | 1536.571 | 1546.043 | 1.006164375
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong256 | 1024 | 996.906 | 1013.649 | 1.016794964
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 256 | 2045.594 | 2048.966 | 1.001648421
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 512 | 1111.933 | 1117.689 | 1.005176571
o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 | 1024 | 559.971 | 561.144 | 1.002094751


Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8273322: Enhance macro logic optimization for masked logic operations.

Changes: https://git.openjdk.java.net/jdk/pull/6893/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6893&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273322
  Stats: 1413 lines in 12 files changed: 1370 ins; 6 del; 37 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6893.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6893/head:pull/6893

PR: https://git.openjdk.java.net/jdk/pull/6893