On Fri, 30 Apr 2021 12:59:34 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(VectorOperators.LSHR, n) >> res = lanewise(VectorOperations.OR, vec1 , vec2) >> >> This patch moves above handling from Java side to C2 compiler which >> facilitates dismantling the rotate operation if target ISA does not support >> a direct rotate instruction. >> >> AVX512 added vector rotate instructions vpro[rl][v][dq] which operate over >> long and integer type vectors. For other cases (i.e. sub-word type vectors >> or for targets which do not support direct rotate operations ) instruction >> sequence comprising of vector SHIFT (LEFT/RIGHT) and vector OR is emitted. >> >> Please find below the performance data for included JMH benchmark. >> Machine: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (Cascadelake Server) >> >> `` >> Benchmark | (TESTSIZE) | Baseline (ops/min) | Withopt (ops/min) | Gain % >> -- | -- | -- | -- | -- >> RotateBenchmark.testRotateLeftI | 64 | 6813.033 | 6936.507 | 1.81 >> RotateBenchmark.testRotateLeftI | 64 | 11549.524 | 12109.845 | 4.85 >> RotateBenchmark.testRotateLeftI | 64 | 17780.166 | 18180.083 | 2.25 >> RotateBenchmark.testRotateLeftI | 128 | 3355.631 | 3426.436 | 2.11 >> RotateBenchmark.testRotateLeftI | 128 | 5925.534 | 6096.71 | 2.89 >> RotateBenchmark.testRotateLeftI | 128 | 8847.645 | 8964.224 | 1.32 >> RotateBenchmark.testRotateLeftI | 256 | 1704.192 | 1779.964 | 4.45 >> RotateBenchmark.testRotateLeftI | 256 | 2919.158 | 3000.392 | 2.78 >> RotateBenchmark.testRotateLeftI | 256 | 4386.134 | 4514.211 | 2.92 >> RotateBenchmark.testRotateLeftL | 64 | 3419.193 | 3500.522 | 2.38 >> RotateBenchmark.testRotateLeftL | 64 | 5982.87 | 6056.589 | 1.23 >> RotateBenchmark.testRotateLeftL | 64 | 8829.172 | 8949.157 | 1.36 >> RotateBenchmark.testRotateLeftL | 128 | 1684.745 | 1784.49 | 5.92 >> RotateBenchmark.testRotateLeftL | 128 | 2942.226 | 2947.684 | 0.19 >> RotateBenchmark.testRotateLeftL | 128 | 4385.488 | 4554.113 | 3.85 >> RotateBenchmark.testRotateLeftL | 256 | 834.195 | 856.939 | 2.73 >> RotateBenchmark.testRotateLeftL | 256 | 1463.802 | 1551.051 | 5.96 >> RotateBenchmark.testRotateLeftL | 256 | 2187.005 | 2268.596 | 3.73 >> RotateBenchmark.testRotateRightI | 64 | 6676.132 | 7077.587 | 6.01 >> RotateBenchmark.testRotateRightI | 64 | 11452.825 | 11855.45 | 3.52 >> RotateBenchmark.testRotateRightI | 64 | 17507.286 | 18164.757 | 3.76 >> RotateBenchmark.testRotateRightI | 128 | 3412.394 | 3519.829 | 3.15 >> RotateBenchmark.testRotateRightI | 128 | 5905.677 | 5875.698 | -0.51 >> RotateBenchmark.testRotateRightI | 128 | 8745.95 | 8890.757 | 1.66 >> RotateBenchmark.testRotateRightI | 256 | 1681.176 | 1708.54 | 1.63 >> RotateBenchmark.testRotateRightI | 256 | 3004.008 | 3005.606 | 0.05 >> RotateBenchmark.testRotateRightI | 256 | 4466.633 | 4548.232 | 1.83 >> RotateBenchmark.testRotateRightL | 64 | 3361.499 | 3461.121 | 2.96 >> RotateBenchmark.testRotateRightL | 64 | 5873.37 | 6072.209 | 3.39 >> RotateBenchmark.testRotateRightL | 64 | 8820.284 | 9016.172 | 2.22 >> RotateBenchmark.testRotateRightL | 128 | 1715.556 | 1720.67 | 0.30 >> RotateBenchmark.testRotateRightL | 128 | 2957.337 | 3149.193 | 6.49 >> RotateBenchmark.testRotateRightL | 128 | 4440.468 | 4473.221 | 0.74 >> RotateBenchmark.testRotateRightL | 256 | 851.391 | 875.371 | 2.82 >> RotateBenchmark.testRotateRightL | 256 | 1452.422 | 1522.942 | 4.86 >> RotateBenchmark.testRotateRightL | 256 | 2208.454 | 2263.349 | 2.49 >> >> `` > > Jatin Bhateja has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains four additional > commits since the last revision: > > - 8266054: Review comments resolution. > - Merge http://github.com/openjdk/jdk into JDK-8266054 > - 8266054: Changing gen-src.sh file permissions > - 8266054: VectorAPI rotate operation optimization src/hotspot/cpu/x86/x86.ad line 1652: > 1650: case Op_RotateRightV: > 1651: case Op_RotateLeftV: > 1652: if (is_subword_type(bt)) { Does that have the effect of not intrinsifying for `byte` or `short`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 728: > 726: case VECTOR_OP_URSHIFT: return (v0, v1) -> > 727: v0.bOp(v1, (i, a, n) -> ($type$)((a & > LSHR_SETUP_MASK) >>> n)); > 728: #if[long] I recommend you create new methods in `IntVector` etc called `rotateLeft` and `rotateRight` that do what is in the lambda expressions. Then you can collapse this to non-conditional cases calling those methods. Do the same for the tests (like i did with the unsigned support), see https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/templates/Unit-header.template#L1429 and https://github.com/openjdk/jdk/blob/a31777959abc7cac1bf6d2a453d6fc15b628d866/test/jdk/jdk/incubator/vector/gen-template.sh#L491 That will avoid the embedding of complex expressions. test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 37: > 35: > 36: @Param({"64","128","256"}) > 37: public int TESTSIZE; Suggestion: int size; Lower case for instance field names (same applies to the species). No need for `public`. test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 55: > 53: > 54: public final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, > Long.MAX_VALUE}; > 55: public final int[] specialValsI = {0, -0, Integer.MIN_VALUE, > Integer.MAX_VALUE}; Suggestion: static final long[] specialValsL = {0L, -0L, Long.MIN_VALUE, Long.MAX_VALUE}; static final int[] specialValsI = {0, -0, Integer.MIN_VALUE, Integer.MAX_VALUE}; test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 83: > 81: @Benchmark > 82: public void testRotateLeftI(Blackhole bh) { > 83: for(int i = 0 ; i < 10000; i++) { No need for the outer loop. JMH will do that for you. test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 85: > 83: for(int i = 0 ; i < 10000; i++) { > 84: for (int j = 0 ; j < TESTSIZE; j+= ISPECIES.length()) { > 85: vecI = IntVector.fromArray(ISPECIES, inpI, j); Suggestion: var vecI = IntVector.fromArray(ISPECIES, inpI, j); Use a local variable instead of storing to a field test/micro/org/openjdk/bench/jdk/incubator/vector/RotateBenchmark.java line 91: > 89: vecI = vecI.lanewise(VectorOperators.ROL, i); > 90: vecI.lanewise(VectorOperators.ROL, i).intoArray(resI, j); > 91: } Suggestion: } return resI; return the result to better ensure HotSpot does not detect the result is unused. ------------- PR: https://git.openjdk.java.net/jdk/pull/3720