On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the >> values stored in first (v1) and second (v2) vector arguments. Thus, first >> and second vector serves as a table, whose elements are selected based on >> index value vector. API is applicable to all integral and floating-point >> types. The result of this operation is semantically equivalent to >> expression v1.rearrange(this.toShuffle(), v2). Values held in index vector >> lanes must lie within valid two vector index range [0, 2*VLEN) else an >> IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a >> lowering transformation dismantles new IR into constituent IR supported by >> target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing >> rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt >> Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 >> ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 >> ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 >> ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 >> ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 >> ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 >> ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 >> ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 >> ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 >> ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 >> ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 >> ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Adding descriptive comments Ok, I left a few more comments. Generally, this looks like a nice feature, thanks for implementing it @jatin-bhateja ! 😊 A few issues with code style (camelCase vs snake_case). I'm also wondering about good naming. Why did we/you chose "select" for this? Why not "shuffle"? Does "select" not often get used as synonym of "blend", which has different semantics? Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in `RearrangeNode::Ideal`. It looks a little "hacky", especially in conjunction with the `vector_indexes_needs_massaging` method. Can you give a clear definition of the semantics of `RearrangeNode` and `vector_indexes_needs_massaging`, please? I also added some control questions for testing. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6446: > 6444: } > 6445: > 6446: void C2_MacroAssembler::select_from_two_vector_evex(BasicType elem_bt, > XMMRegister dst, XMMRegister src1, I also wonder if you could use the plural in these cases? You are selecting from two vectors, with the plural "s". Of course it is a bit annoying if you would have to name the IR node `SelectFromTwoVectors`, because we usually name the vector nodes `...Vector`, without the plural "s". src/hotspot/share/opto/library_call.cpp line 749: > 747: return inline_vector_compress_expand(); > 748: case vmIntrinsics::_VectorSelectFromTwoVectorOp: > 749: return inline_vector_select_from_two_vectors(); Interesting, here you use the correct plural "vectors". src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > 542: byte[] vpayload1 = ((ByteVector)v1).vec(); > 543: byte[] vpayload2 = ((ByteVector)v2).vec(); > 544: byte[] vpayload3 = ((ByteVector)v3).vec(); Is there a reason you are not using more descriptive names here instead of `vpayload1`? I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2595: > 2593: @ForceInline > 2594: final ByteVector selectFromTemplate(ByteVector v1, ByteVector v2) { > 2595: int twovectorlen = length() * 2; `twovectorlen` -> `twoVectorLen` I think in Java we are supposed to use camelCase src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > 2768: > 2769: /** > 2770: * Rearranges the lane elements of two vectors, selecting lanes I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > 322: boolean is_exceptional_idx = (int)order[idx] >= > vector_len; > 323: int oidx = is_exceptional_idx ? ((int)order[idx] - > vector_len) : (int)order[idx]; > 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? > b[i + oidx] : a[i + oidx])); I thought general Java style is camelCase? Is that not followed in the VectorAPI code? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > 1046: return > SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). > 1047: toArray(Object[][]::new); > 1048: } Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); > 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, > i); > 5812: idxv.selectFrom(av, bv).intoArray(r, i); Would this test catch a bug where the backend would generate vectors that are too long or too short? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2276944129 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741766060 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741773766 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741914524 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741911809 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741919025 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741920940 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741947885 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741949290