On Thu, 4 Mar 2021 16:47:30 GMT, Paul Sandoz <psan...@openjdk.org> wrote:
>> The hot path of VectorShuffle checkIndexes, wrapIndexes and laneIsValid
>> methods can be implemented using Vector API methods.
>>
>> For the attached jmh TestSlice.java, performance improves as below.
>>
>> Before:
>> Benchmark                           (size)   Mode  Cnt      Score    Error   Units
>> TestSlice.vectorSliceOrigin           1024  thrpt    5   1224.698 ± 53.825  ops/ms
>> TestSlice.vectorSliceUnsliceOrigin    1024  thrpt    5    657.895 ± 31.945  ops/ms
>>
>> After:
>> Benchmark                           (size)   Mode  Cnt      Score    Error   Units
>> TestSlice.vectorSliceOrigin           1024  thrpt    5  11221.532 ± 88.616  ops/ms
>> TestSlice.vectorSliceUnsliceOrigin    1024  thrpt    5   6509.519 ± 18.102  ops/ms
>
> Looks good, a nice incremental improvement.
>
> I suppose `checkIndexes` and `wrapIndexes` could call `laneIsValid`, and then
> call `anyFalse` on the resulting mask. Dunno if that would affect the
> generated code.

Calling laneIsValid from checkIndexes and wrapIndexes would look like this:

    VectorMask<E> vecmask = this.laneIsValid();
    if (!vecmask.allTrue()) {

I observe a small overhead (~2%) with this form, due to a couple of extra
instructions generated for allTrue.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2819
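
For readers following the thread, here is a rough scalar model of the
alternative discussed above, where checkIndexes and wrapIndexes are built on
laneIsValid plus an allTrue test. It is only a sketch and not the JDK
implementation: the class ShuffleCheckSketch and its plain-array helpers are
hypothetical, and it assumes the shuffle convention that valid source indexes
are non-negative while exceptional ones lie in [-VLENGTH, -1] and are wrapped
by adding VLENGTH.

```java
import java.util.Arrays;

// Scalar model only. Hypothetical helper class, NOT the JDK's VectorShuffle code.
final class ShuffleCheckSketch {

    // Models laneIsValid(): a lane is valid when its source index is non-negative.
    static boolean[] laneIsValid(int[] indexes) {
        boolean[] mask = new boolean[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            mask[i] = indexes[i] >= 0;
        }
        return mask;
    }

    // Models VectorMask.allTrue(): true only if every lane of the mask is set.
    static boolean allTrue(boolean[] mask) {
        for (boolean lane : mask) {
            if (!lane) {
                return false;
            }
        }
        return true;
    }

    // Models checkIndexes() written on top of laneIsValid(): reject any exceptional index.
    static int[] checkIndexes(int[] indexes) {
        if (!allTrue(laneIsValid(indexes))) {
            throw new IndexOutOfBoundsException(
                    "exceptional index in " + Arrays.toString(indexes));
        }
        return indexes;
    }

    // Models wrapIndexes(): assumed to map exceptional indexes to valid ones by adding VLENGTH.
    static int[] wrapIndexes(int[] indexes) {
        if (allTrue(laneIsValid(indexes))) {
            return indexes; // fast path: nothing to wrap
        }
        int vlength = indexes.length;
        int[] wrapped = new int[vlength];
        for (int i = 0; i < vlength; i++) {
            wrapped[i] = indexes[i] < 0 ? indexes[i] + vlength : indexes[i];
        }
        return wrapped;
    }

    public static void main(String[] args) {
        int[] partlyWrapped = {0, -1, 2, -4};
        System.out.println(Arrays.toString(laneIsValid(partlyWrapped)));
        System.out.println(Arrays.toString(wrapIndexes(partlyWrapped)));
    }
}
```

In the real API the mask and the allTrue test are vector operations, which is
where the extra instructions mentioned above would come from; the scalar loops
here only illustrate the control flow.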