"`VectorSupport.indexVector()`" computes a vector of index values from a 
given vector and a scale value (i.e. `index = vec + iota * scale`). This 
function is widely used by other APIs such as "`VectorMask.indexInRange`", 
which is useful for tail loop vectorization, and it can be implemented 
easily with vector instructions.
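
As a rough illustration of the semantics described above, here is a scalar model in plain Java. It is only a sketch: the class and method names are hypothetical, arrays stand in for vector lanes, and it does not reflect how the JDK or HotSpot actually implements these operations. The `indexInRange` model follows the documented behavior of "`VectorMask.indexInRange`" (a lane stays set only if `offset + N` lies in `[0, limit)`); deriving it from the index vector here is an assumption made for illustration.

```java
import java.util.Arrays;

// Scalar model only; names and array-based "lanes" are illustrative assumptions.
final class IndexVectorModel {
    // index[i] = vec[i] + i * scale, where i plays the role of the iota lane value.
    static int[] indexVector(int[] vec, int scale) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + i * scale;
        }
        return result;
    }

    // Lane i stays set only if it was set and offset + i lies in [0, limit).
    static boolean[] indexInRange(boolean[] mask, int offset, int limit) {
        int[] start = new int[mask.length];
        Arrays.fill(start, offset);              // broadcast the offset to every lane
        int[] index = indexVector(start, 1);     // offset + iota * 1
        boolean[] result = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) {
            result[i] = mask[i] && index[i] >= 0 && index[i] < limit;
        }
        return result;
    }

    public static void main(String[] args) {
        // Tail of a loop over 8 elements, starting at offset 6 with 4 lanes:
        // only the first two lanes are still inside the array.
        boolean[] allSet = {true, true, true, true};
        System.out.println(Arrays.toString(indexInRange(allSet, 6, 8)));
    }
}
```

In a tail loop, a mask of this shape lets the last partial iteration run as a masked vector operation instead of a scalar cleanup loop.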

This patch adds the vector intrinsic implementation of it. The steps are:

  1) Load the const "iota" vector.

  We extend the "`vector_iota_indices`" stubs from byte to the other integral 
types. Floating-point vectors need an additional vector cast to get the right 
iota values.

  2) Compute indexes with "`vec + iota * scale`"
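
The two steps can be sketched for a floating-point vector as the scalar model below. This is not the HotSpot code: the class and method names are hypothetical, and modeling the "additional vector cast" as an int-to-float conversion of an integral iota constant is an assumption based on the description above.

```java
// Scalar model of the two steps for a float vector; all names are hypothetical.
final class IotaStepsSketch {
    // Step 1a: the constant integral iota vector {0, 1, 2, ...}.
    static int[] loadIntIota(int lanes) {
        int[] iota = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            iota[i] = i;
        }
        return iota;
    }

    // Step 1b: the extra cast assumed for floating-point element types.
    static float[] castToFloat(int[] iota) {
        float[] result = new float[iota.length];
        for (int i = 0; i < iota.length; i++) {
            result[i] = (float) iota[i];
        }
        return result;
    }

    // Step 2: index = vec + iota * scale, computed lane by lane.
    static float[] indexVector(float[] vec, float scale) {
        float[] iota = castToFloat(loadIntIota(vec.length));
        float[] result = new float[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + iota[i] * scale;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] index = indexVector(new float[]{1.5f, 1.5f, 1.5f, 1.5f}, 0.5f);
        System.out.println(java.util.Arrays.toString(index));
    }
}
```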

Here are the performance results of the newly added microbenchmark on ARM NEON:

Benchmark                              Gain
IndexVectorBenchmark.byteIndexVector   1.477
IndexVectorBenchmark.doubleIndexVector 5.031
IndexVectorBenchmark.floatIndexVector  5.342
IndexVectorBenchmark.intIndexVector    5.529
IndexVectorBenchmark.longIndexVector   3.177
IndexVectorBenchmark.shortIndexVector  5.841


Please help review this patch and share your feedback. Thanks in advance!

-------------

Commit messages:
 - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector

Changes: https://git.openjdk.org/jdk/pull/10332/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10332&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8293409
  Stats: 358 lines in 14 files changed: 328 ins; 6 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/10332.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10332/head:pull/10332

PR: https://git.openjdk.org/jdk/pull/10332
