On Mon, 17 Jun 2024 23:02:47 GMT, Shaojin Wen <d...@openjdk.org> wrote:

>> [8318446](https://github.com/openjdk/jdk/pull/16245)  brings MergeStore. We 
>> need a JMH Benchmark to evaluate the performance of various batch operations 
>> and the effect of MergeStore.
>
> Shaojin Wen has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 13 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'upstream/master' into merge_store_bench
>  - bug fix for `putChars4C`
>  - bug fix for `putChars4C` and `putChars4S`
>  - use VarHandler CHAR_L & CHAR_B
>  - copyright
>  - bug fix for putIntBU
>  - add cases for `getChar` & `putChar`
>  - code format
>  - add `setIntRL` & `setIntRLU`
>  - add comments
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/d4942ac0...4c9b9418

I re-ran the performance tests against webrev 04: 
[Full](https://webrevs.openjdk.org/?repo=jdk&pr=19734&range=04) - 
[Incremental](https://webrevs.openjdk.org/?repo=jdk&pr=19734&range=03-04) 
([4c9b9418](https://git.openjdk.org/jdk/pull/19734/files/4c9b9418fc4a95504867b6019b3e94605917f747)).

# 1. Cases where MergeStore does not work
@eme64 
I found that `putChars4BV` and `putChars4LV` are two cases where MergeStore 
does not apply. If support could be extended to them, it would be useful for 
people using VarHandle.


putChars4BV
putChars4LV
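
For readers without the benchmark source at hand, a VarHandle-based `putChars4` case might look roughly like the sketch below. The handle names `CHAR_B` and `CHAR_L` come from the commit list above; the class name and the method bodies are assumptions made for illustration, not a copy of `MergeStoreBench`. Each call writes one 2-byte char into the byte array, so four adjacent calls cover 8 bytes that could in principle be merged into a single 8-byte store.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch; not the actual MergeStoreBench code.
final class PutChars4Sketch {
    // View a byte[] as chars in big-endian / little-endian byte order.
    static final VarHandle CHAR_B =
            MethodHandles.byteArrayViewVarHandle(char[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle CHAR_L =
            MethodHandles.byteArrayViewVarHandle(char[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent 2-byte writes; the index argument is a byte offset.
    static void putChars4BV(byte[] bytes, int offset, char c0, char c1, char c2, char c3) {
        CHAR_B.set(bytes, offset, c0);
        CHAR_B.set(bytes, offset + 2, c1);
        CHAR_B.set(bytes, offset + 4, c2);
        CHAR_B.set(bytes, offset + 6, c3);
    }

    static void putChars4LV(byte[] bytes, int offset, char c0, char c1, char c2, char c3) {
        CHAR_L.set(bytes, offset, c0);
        CHAR_L.set(bytes, offset + 2, c1);
        CHAR_L.set(bytes, offset + 4, c2);
        CHAR_L.set(bytes, offset + 6, c3);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        putChars4BV(buf, 0, 'a', 'b', 'c', 'd');
        System.out.println(Arrays.toString(buf)); // [0, 97, 0, 98, 0, 99, 0, 100]
        putChars4LV(buf, 0, 'a', 'b', 'c', 'd');
        System.out.println(Arrays.toString(buf)); // [97, 0, 98, 0, 99, 0, 100, 0]
    }
}
```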


I also found that the cases using VarHandle perform particularly well. Why? 
For example:

setIntBV
setIntLV
setLongBV
setLongLV


# 2. Performance numbers
The case names use the following suffixes:

B BigEndian
L LittleEndian
V VarHandle
U Unsafe
R reverseBytes
C Unsafe.getChar & putChar
S Unsafe.getShort & putShort
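
To make the suffixes concrete, here is a minimal sketch of what a few `setInt` variants could look like. It is an illustration under assumptions, not the benchmark source: the class name, the handle names `INT_B` and `INT_L`, and the method bodies are hypothetical, and the reading of `R` as "apply `Integer.reverseBytes` first, then store in the named byte order" is an assumption. The Unsafe-based variants (`U`, `C`, `S`) are left out because they depend on JDK-internal APIs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch of the naming scheme; not the actual MergeStoreBench code.
final class SuffixSketch {
    static final VarHandle INT_B =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_L =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // "B": big-endian, written as four separate byte stores.
    static void setIntB(byte[] a, int off, int v) {
        a[off]     = (byte) (v >> 24);
        a[off + 1] = (byte) (v >> 16);
        a[off + 2] = (byte) (v >> 8);
        a[off + 3] = (byte) v;
    }

    // "L": little-endian, written as four separate byte stores.
    static void setIntL(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    // "RL" (assumed meaning): reverse the bytes, then store little-endian.
    // The resulting layout matches setIntB.
    static void setIntRL(byte[] a, int off, int v) {
        setIntL(a, off, Integer.reverseBytes(v));
    }

    // "BV" / "LV": a single VarHandle store in the given byte order.
    static void setIntBV(byte[] a, int off, int v) {
        INT_B.set(a, off, v);
    }

    static void setIntLV(byte[] a, int off, int v) {
        INT_L.set(a, off, v);
    }
}
```

The byte-by-byte variants are the pattern a store-merging optimization has to recognize, while the `V` variants already express the write as one wide store.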


## 2.1 MacBook M1 Pro (aarch64)

Benchmark                    Mode  Cnt      Score    Error  Units
MergeStoreBench.getCharB     avgt   15   5340.200 ±  7.038  ns/op
MergeStoreBench.getCharBU    avgt   15   5482.163 ±  7.922  ns/op
MergeStoreBench.getCharBV    avgt   15   5074.165 ±  6.759  ns/op
MergeStoreBench.getCharC     avgt   15   5051.763 ±  6.552  ns/op
MergeStoreBench.getCharL     avgt   15   5374.464 ±  9.783  ns/op
MergeStoreBench.getCharLU    avgt   15   5487.532 ±  6.368  ns/op
MergeStoreBench.getCharLV    avgt   15   5071.263 ±  9.717  ns/op
MergeStoreBench.getIntB      avgt   15   6277.984 ±  6.284  ns/op
MergeStoreBench.getIntBU     avgt   15   5232.984 ± 10.384  ns/op
MergeStoreBench.getIntBV     avgt   15   1206.264 ±  1.193  ns/op
MergeStoreBench.getIntL      avgt   15   6172.779 ±  1.962  ns/op
MergeStoreBench.getIntLU     avgt   15   5157.317 ± 16.077  ns/op
MergeStoreBench.getIntLV     avgt   15   2558.110 ±  3.402  ns/op
MergeStoreBench.getIntRB     avgt   15   6889.916 ± 36.955  ns/op
MergeStoreBench.getIntRBU    avgt   15   5769.950 ± 11.499  ns/op
MergeStoreBench.getIntRL     avgt   15   6625.605 ± 10.662  ns/op
MergeStoreBench.getIntRLU    avgt   15   5746.742 ± 11.945  ns/op
MergeStoreBench.getIntRU     avgt   15   2544.586 ±  2.769  ns/op
MergeStoreBench.getIntU      avgt   15   2541.119 ±  3.252  ns/op
MergeStoreBench.getLongB     avgt   15  12098.129 ± 31.451  ns/op
MergeStoreBench.getLongBU    avgt   15   9760.621 ± 16.427  ns/op
MergeStoreBench.getLongBV    avgt   15   2593.635 ±  4.698  ns/op
MergeStoreBench.getLongL     avgt   15  12031.065 ± 19.820  ns/op
MergeStoreBench.getLongLU    avgt   15   9653.938 ± 18.372  ns/op
MergeStoreBench.getLongLV    avgt   15   2557.521 ±  3.338  ns/op
MergeStoreBench.getLongRB    avgt   15  12092.061 ± 18.026  ns/op
MergeStoreBench.getLongRBU   avgt   15   9763.489 ± 17.347  ns/op
MergeStoreBench.getLongRL    avgt   15  12027.686 ± 17.472  ns/op
MergeStoreBench.getLongRLU   avgt   15   9649.433 ±  8.384  ns/op
MergeStoreBench.getLongRU    avgt   15   2546.239 ±  2.088  ns/op
MergeStoreBench.getLongU     avgt   15   2539.762 ±  1.439  ns/op
MergeStoreBench.putChars4B   avgt   15   8487.381 ± 23.170  ns/op
MergeStoreBench.putChars4BU  avgt   15   3830.198 ±  7.083  ns/op
MergeStoreBench.putChars4BV  avgt   15   5154.819 ± 10.348  ns/op
MergeStoreBench.putChars4C   avgt   15   5162.766 ± 15.041  ns/op
MergeStoreBench.putChars4L   avgt   15   8381.231 ± 20.135  ns/op
MergeStoreBench.putChars4LU  avgt   15   3827.784 ±  3.163  ns/op
MergeStoreBench.putChars4LV  avgt   15   5151.508 ±  4.907  ns/op
MergeStoreBench.putChars4S   avgt   15   5152.123 ±  7.407  ns/op
MergeStoreBench.setCharBS    avgt   15   5317.319 ± 28.445  ns/op
MergeStoreBench.setCharBV    avgt   15   5175.400 ±  7.110  ns/op
MergeStoreBench.setCharC     avgt   15   5085.752 ±  6.222  ns/op
MergeStoreBench.setCharLS    avgt   15   5294.766 ±  9.742  ns/op
MergeStoreBench.setCharLV    avgt   15   5108.269 ±  6.692  ns/op
MergeStoreBench.setIntB      avgt   15   5095.236 ±  2.838  ns/op
MergeStoreBench.setIntBU     avgt   15   5097.007 ±  4.249  ns/op
MergeStoreBench.setIntBV     avgt   15   1224.506 ±  0.976  ns/op
MergeStoreBench.setIntL      avgt   15   2764.388 ±  2.400  ns/op
MergeStoreBench.setIntLU     avgt   15   2573.624 ±  6.677  ns/op
MergeStoreBench.setIntLV     avgt   15   5105.804 ± 11.551  ns/op
MergeStoreBench.setIntRB     avgt   15   5348.785 ±  4.974  ns/op
MergeStoreBench.setIntRBU    avgt   15   5422.049 ± 31.009  ns/op
MergeStoreBench.setIntRL     avgt   15   5293.414 ±  8.204  ns/op
MergeStoreBench.setIntRLU    avgt   15   5126.889 ±  7.435  ns/op
MergeStoreBench.setIntRU     avgt   15   5097.927 ±  3.588  ns/op
MergeStoreBench.setIntU      avgt   15   5087.192 ± 11.806  ns/op
MergeStoreBench.setLongB     avgt   15  10249.037 ± 19.538  ns/op
MergeStoreBench.setLongBU    avgt   15  10238.910 ± 11.998  ns/op
MergeStoreBench.setLongBV    avgt   15   2663.647 ±  4.147  ns/op
MergeStoreBench.setLongL     avgt   15   6304.458 ±  4.588  ns/op
MergeStoreBench.setLongLU    avgt   15   2921.575 ± 10.649  ns/op
MergeStoreBench.setLongLV    avgt   15   2663.323 ±  1.188  ns/op
MergeStoreBench.setLongRB    avgt   15  10255.875 ± 19.754  ns/op
MergeStoreBench.setLongRBU   avgt   15  10227.856 ±  9.970  ns/op
MergeStoreBench.setLongRL    avgt   15   6641.173 ±  3.836  ns/op
MergeStoreBench.setLongRLU   avgt   15   3241.057 ± 22.250  ns/op
MergeStoreBench.setLongRU    avgt   15   2608.399 ±  2.243  ns/op
MergeStoreBench.setLongU     avgt   15   2594.970 ±  3.490  ns/op


## 2.2 Aliyun ecs.c8a.xlarge (x64)
* CPU: AMD EPYC™ Genoa

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getCharB     avgt   15   5969.667 ±  75.660  ns/op
MergeStoreBench.getCharBU    avgt   15   4576.650 ±  27.489  ns/op
MergeStoreBench.getCharBV    avgt   15   3085.061 ±   3.206  ns/op
MergeStoreBench.getCharC     avgt   15   2237.624 ±   1.383  ns/op
MergeStoreBench.getCharL     avgt   15   6044.112 ±   8.960  ns/op
MergeStoreBench.getCharLU    avgt   15   4538.252 ±   3.747  ns/op
MergeStoreBench.getCharLV    avgt   15   2221.833 ±   0.727  ns/op
MergeStoreBench.getIntB      avgt   15  11983.238 ±  74.190  ns/op
MergeStoreBench.getIntBU     avgt   15   9039.309 ±   6.332  ns/op
MergeStoreBench.getIntBV     avgt   15    303.874 ±   0.305  ns/op
MergeStoreBench.getIntL      avgt   15  10521.992 ±  15.238  ns/op
MergeStoreBench.getIntLU     avgt   15   8867.106 ±   7.014  ns/op
MergeStoreBench.getIntLV     avgt   15   2226.223 ±   0.887  ns/op
MergeStoreBench.getIntRB     avgt   15  12332.136 ±  19.948  ns/op
MergeStoreBench.getIntRBU    avgt   15  11114.256 ±   8.652  ns/op
MergeStoreBench.getIntRL     avgt   15  11206.728 ±  15.291  ns/op
MergeStoreBench.getIntRLU    avgt   15   9349.279 ±   7.379  ns/op
MergeStoreBench.getIntRU     avgt   15   2507.213 ±   1.222  ns/op
MergeStoreBench.getIntU      avgt   15   2495.432 ±   1.278  ns/op
MergeStoreBench.getLongB     avgt   15  26832.797 ±  19.316  ns/op
MergeStoreBench.getLongBU    avgt   15  13996.454 ±  17.628  ns/op
MergeStoreBench.getLongBV    avgt   15    605.548 ±   0.538  ns/op
MergeStoreBench.getLongL     avgt   15  26859.909 ±  31.234  ns/op
MergeStoreBench.getLongLU    avgt   15  14519.709 ±  23.482  ns/op
MergeStoreBench.getLongLV    avgt   15   2227.782 ±   0.535  ns/op
MergeStoreBench.getLongRB    avgt   15  26846.549 ±  17.321  ns/op
MergeStoreBench.getLongRBU   avgt   15  13994.948 ±  14.752  ns/op
MergeStoreBench.getLongRL    avgt   15  26838.819 ±  14.425  ns/op
MergeStoreBench.getLongRLU   avgt   15  14547.807 ±  73.859  ns/op
MergeStoreBench.getLongRU    avgt   15   3061.373 ±   1.690  ns/op
MergeStoreBench.getLongU     avgt   15   3049.441 ±   1.162  ns/op
MergeStoreBench.putChars4B   avgt   15  13411.014 ±   4.491  ns/op
MergeStoreBench.putChars4BU  avgt   15   4206.040 ±   4.317  ns/op
MergeStoreBench.putChars4BV  avgt   15   7948.154 ± 904.918  ns/op
MergeStoreBench.putChars4C   avgt   15   5316.859 ±   3.066  ns/op
MergeStoreBench.putChars4L   avgt   15  13419.757 ±  11.175  ns/op
MergeStoreBench.putChars4LU  avgt   15   4205.094 ±   5.079  ns/op
MergeStoreBench.putChars4LV  avgt   15   6734.543 ±   6.452  ns/op
MergeStoreBench.putChars4S   avgt   15   5323.487 ±  10.605  ns/op
MergeStoreBench.setCharBS    avgt   15   9225.082 ±  11.461  ns/op
MergeStoreBench.setCharBV    avgt   15   5242.360 ±  12.546  ns/op
MergeStoreBench.setCharC     avgt   15   4497.345 ±   7.426  ns/op
MergeStoreBench.setCharLS    avgt   15   8991.865 ±   7.281  ns/op
MergeStoreBench.setCharLV    avgt   15   2535.475 ±   4.230  ns/op
MergeStoreBench.setIntB      avgt   15   8036.698 ±   6.763  ns/op
MergeStoreBench.setIntBU     avgt   15  10332.333 ±  10.071  ns/op
MergeStoreBench.setIntBV     avgt   15    586.392 ±   1.024  ns/op
MergeStoreBench.setIntL      avgt   15   2541.327 ±   4.538  ns/op
MergeStoreBench.setIntLU     avgt   15   6122.574 ±  46.593  ns/op
MergeStoreBench.setIntLV     avgt   15    597.930 ±   0.672  ns/op
MergeStoreBench.setIntRB     avgt   15   9740.301 ±   3.367  ns/op
MergeStoreBench.setIntRBU    avgt   15  10648.285 ±  29.338  ns/op
MergeStoreBench.setIntRL     avgt   15   6227.445 ±  15.378  ns/op
MergeStoreBench.setIntRLU    avgt   15   8409.781 ±  61.847  ns/op
MergeStoreBench.setIntRU     avgt   15    631.337 ±   6.930  ns/op
MergeStoreBench.setIntU      avgt   15    604.432 ±   0.682  ns/op
MergeStoreBench.setLongB     avgt   15  17184.183 ±  11.490  ns/op
MergeStoreBench.setLongBU    avgt   15  21377.695 ±  51.384  ns/op
MergeStoreBench.setLongBV    avgt   15   1191.037 ±  10.983  ns/op
MergeStoreBench.setLongL     avgt   15   3342.476 ±   4.704  ns/op
MergeStoreBench.setLongLU    avgt   15   6194.791 ±  13.241  ns/op
MergeStoreBench.setLongLV    avgt   15   1194.042 ±   2.943  ns/op
MergeStoreBench.setLongRB    avgt   15  17946.742 ±  26.888  ns/op
MergeStoreBench.setLongRBU   avgt   15  21342.899 ±  22.937  ns/op
MergeStoreBench.setLongRL    avgt   15   4034.050 ±   3.792  ns/op
MergeStoreBench.setLongRLU   avgt   15   4825.627 ±  11.409  ns/op
MergeStoreBench.setLongRU    avgt   15   1170.252 ±   1.582  ns/op
MergeStoreBench.setLongU     avgt   15   1192.220 ±   1.060  ns/op


## 2.3 Aliyun ecs.c8i.xlarge (x64)
* CPU: Intel® Xeon® Emerald

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getCharB     avgt   15   5374.604 ±  11.001  ns/op
MergeStoreBench.getCharBU    avgt   15   4760.386 ±  20.612  ns/op
MergeStoreBench.getCharBV    avgt   15   3068.661 ±   2.712  ns/op
MergeStoreBench.getCharC     avgt   15   2591.548 ±   0.428  ns/op
MergeStoreBench.getCharL     avgt   15   5224.986 ±   3.388  ns/op
MergeStoreBench.getCharLU    avgt   15   4781.157 ±  19.001  ns/op
MergeStoreBench.getCharLV    avgt   15   2577.009 ±   1.374  ns/op
MergeStoreBench.getIntB      avgt   15  10512.241 ±  17.214  ns/op
MergeStoreBench.getIntBU     avgt   15   9271.460 ±  17.628  ns/op
MergeStoreBench.getIntBV     avgt   15    255.186 ±   0.731  ns/op
MergeStoreBench.getIntL      avgt   15   9728.629 ±   2.364  ns/op
MergeStoreBench.getIntLU     avgt   15   8983.810 ±   2.463  ns/op
MergeStoreBench.getIntLV     avgt   15   2569.886 ±   1.389  ns/op
MergeStoreBench.getIntRB     avgt   15  11285.198 ±  15.566  ns/op
MergeStoreBench.getIntRBU    avgt   15  10321.709 ±   4.604  ns/op
MergeStoreBench.getIntRL     avgt   15  10567.777 ±   3.931  ns/op
MergeStoreBench.getIntRLU    avgt   15   9436.647 ±  16.046  ns/op
MergeStoreBench.getIntRU     avgt   15   2327.805 ±   0.495  ns/op
MergeStoreBench.getIntU      avgt   15   2310.299 ±   2.477  ns/op
MergeStoreBench.getLongB     avgt   15  21698.862 ±  58.286  ns/op
MergeStoreBench.getLongBU    avgt   15  14682.074 ±  22.913  ns/op
MergeStoreBench.getLongBV    avgt   15    649.422 ±   2.738  ns/op
MergeStoreBench.getLongL     avgt   15  21584.034 ±  29.685  ns/op
MergeStoreBench.getLongLU    avgt   15  14346.370 ±   5.548  ns/op
MergeStoreBench.getLongLV    avgt   15   2574.877 ±   0.748  ns/op
MergeStoreBench.getLongRB    avgt   15  21689.446 ±  31.897  ns/op
MergeStoreBench.getLongRBU   avgt   15  14678.181 ±   3.447  ns/op
MergeStoreBench.getLongRL    avgt   15  21578.598 ±   4.353  ns/op
MergeStoreBench.getLongRLU   avgt   15  14350.201 ±  37.668  ns/op
MergeStoreBench.getLongRU    avgt   15   2988.364 ±   3.983  ns/op
MergeStoreBench.getLongU     avgt   15   2941.190 ±   0.582  ns/op
MergeStoreBench.putChars4B   avgt   15  10434.718 ±   3.309  ns/op
MergeStoreBench.putChars4BU  avgt   15   3008.607 ±   1.378  ns/op
MergeStoreBench.putChars4BV  avgt   15   7151.913 ± 483.572  ns/op
MergeStoreBench.putChars4C   avgt   15   6489.426 ±   1.369  ns/op
MergeStoreBench.putChars4L   avgt   15  10436.577 ±   5.568  ns/op
MergeStoreBench.putChars4LU  avgt   15   2837.432 ±   0.697  ns/op
MergeStoreBench.putChars4LV  avgt   15   7024.161 ±   9.887  ns/op
MergeStoreBench.putChars4S   avgt   15   6495.194 ±  12.316  ns/op
MergeStoreBench.setCharBS    avgt   15   8865.676 ±   6.476  ns/op
MergeStoreBench.setCharBV    avgt   15   5002.613 ±  20.300  ns/op
MergeStoreBench.setCharC     avgt   15   3936.314 ±   7.415  ns/op
MergeStoreBench.setCharLS    avgt   15   6989.120 ±  23.404  ns/op
MergeStoreBench.setCharLV    avgt   15   2589.797 ±   2.805  ns/op
MergeStoreBench.setIntB      avgt   15   6891.353 ±  13.239  ns/op
MergeStoreBench.setIntBU     avgt   15  10188.827 ±  21.409  ns/op
MergeStoreBench.setIntBV     avgt   15    899.335 ±   2.777  ns/op
MergeStoreBench.setIntL      avgt   15   2889.929 ±   6.582  ns/op
MergeStoreBench.setIntLU     avgt   15   5314.714 ±   5.170  ns/op
MergeStoreBench.setIntLV     avgt   15    945.432 ±   1.255  ns/op
MergeStoreBench.setIntRB     avgt   15   8159.294 ±  16.214  ns/op
MergeStoreBench.setIntRBU    avgt   15  10625.120 ±  12.809  ns/op
MergeStoreBench.setIntRL     avgt   15   6035.911 ±  47.780  ns/op
MergeStoreBench.setIntRLU    avgt   15   7148.487 ±  73.927  ns/op
MergeStoreBench.setIntRU     avgt   15    969.966 ±   6.127  ns/op
MergeStoreBench.setIntU      avgt   15    988.272 ±   2.214  ns/op
MergeStoreBench.setLongB     avgt   15  15857.394 ±   9.621  ns/op
MergeStoreBench.setLongBU    avgt   15  22955.799 ±   6.266  ns/op
MergeStoreBench.setLongBV    avgt   15   1831.898 ±   5.519  ns/op
MergeStoreBench.setLongL     avgt   15   4344.954 ±   4.273  ns/op
MergeStoreBench.setLongLU    avgt   15   5452.006 ±   9.333  ns/op
MergeStoreBench.setLongLV    avgt   15   1910.294 ±  22.688  ns/op
MergeStoreBench.setLongRB    avgt   15  16990.616 ±  59.974  ns/op
MergeStoreBench.setLongRBU   avgt   15  24951.367 ±  47.760  ns/op
MergeStoreBench.setLongRL    avgt   15   4484.135 ±   5.756  ns/op
MergeStoreBench.setLongRLU   avgt   15   4891.413 ±  26.743  ns/op
MergeStoreBench.setLongRU    avgt   15   1820.416 ±  11.285  ns/op
MergeStoreBench.setLongU     avgt   15   1932.694 ±  28.488  ns/op


[MergeStoreBench.txt](https://github.com/user-attachments/files/15878863/MergeStoreBench.txt)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19734#issuecomment-2174717821
