On Sun, 16 Jun 2024 07:17:16 GMT, Shaojin Wen <d...@openjdk.org> wrote:
> [8318446](https://github.com/openjdk/jdk/pull/16245) brings MergeStore. We > need a JMH Benchmark to evaluate the performance of various batch operations > and the effect of MergeStore. A few extra comments: There is already a `MergeStore` benchmark. I would prefer if you put yours next to it, unless if you have a good reason. About your list: getIntB getIntBU getIntL getIntLU getIntRB getIntRBU getIntRL getIntRLU getLongB getLongBU getLongL getLongLU getLongRB getLongRBU getLongRL getLongRLU -> Obviously a "MergeStore" optimization does not work for loads. But if it is important, then maybe we could generalize the optimizations from stores to loads. putChars4UC -> Does the putChars4UC get inlined? Because here you are storing 4 variables, this pattern is not handled by MergeStore. setIntB -> You say that here MergeStore does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with https://github.com/openjdk/jdk/pull/19218. setIntBU -> order seems messed up setIntRB setIntRBU setLongB setLongBU setLongRB setLongRBU -> I leave the rest for you to investigate. test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 2: > 1: /* > 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights > reserved. I think for a new file, you don't want to backdate the copyright to `2014` ;) test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 57: > 55: HI_BYTE_SHIFT = 0; > 56: LO_BYTE_SHIFT = 8; > 57: } I wonder if it would just be better to duplicate the tests. Because we may extend the `MergeStore` optimization such that it allows either order, and just adds a `Reverse` operations on the bytes/chars/ints. Then it would be nice to benchmark both orders on both little and big endian architectures. What do you think? test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 656: > 654: array[offset + 2] = (byte) (value >> 8); > 655: array[offset + 3] = (byte) (value ); > 656: } You say that here `MergeStore` does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with https://github.com/openjdk/jdk/pull/19218. test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 664: > 662: UNSAFE.putByte(array, address + 2, (byte) (value >> 16)); > 663: UNSAFE.putByte(array, address + 3, (byte) (value )); > 664: } Order messed up? test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 850: > 848: UNSAFE.putChar(array, address + 4, c2); > 849: UNSAFE.putChar(array, address + 6, c3); > 850: } Does the `putChars4UC` get inlined? Because here you are storing 4 variables, this pattern is not handled by `MergeStore`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19734#issuecomment-2172548153 PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642327120 PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642329554 PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642341934 PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642350184 PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642343781