Re: RFR: 8338023: Support two vector selectFrom API [v10]

Paul Sandoz Mon, 16 Sep 2024 14:21:53 -0700

On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja <[email protected]> wrote:


>> Hi All,
>> 
>> As per the discussion on the panama-dev mailing list[1], this patch adds 
>> support for the following new two-vector permutation API.
>> 
>> 
>> Declaration:-
>>     Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>> 
>> 
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the 
>> values stored in the first (v1) and second (v2) vector arguments. Thus, the 
>> first and second vectors serve as a table whose elements are selected based 
>> on the index vector. The API is applicable to all integral and 
>> floating-point types. The result of this operation is semantically 
>> equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values 
>> held in index vector lanes must lie within the valid two-vector index range 
>> [0, 2*VLEN), else an IndexOutOfBoundsException is thrown.
>> 
>> Summary of changes:
>> -  Java side implementation of new selectFrom API.
>> -  C2 compiler IR and inline expander changes.
>> -  In the absence of a direct two-vector permutation instruction in the 
>> target ISA, a lowering transformation dismantles the new IR into 
>> constituent IR supported by the target platform.
>> -  Optimized x86 backend implementation for AVX512 and legacy targets.
>> -  Functional tests covering the new API.
>> 
>> The JMH microbenchmark included with this patch shows around a 10-15x gain 
>> over the existing rearrange API:
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>> 
>> 
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Disabling VectorLoadShuffle bypassing optimization to comply with rearrange 
> semantics at IR level.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2970:

> 2968: 
> 2969: 
> 2970:     /*package-private*/

I think we can simplify with:

    /*package-private*/
    @ForceInline
    final $abstractvectortype$ selectFromTemplate(Class<? extends Vector<$Boxbitstype$>> indexVecClass,
                                                  $abstractvectortype$ v1, $abstractvectortype$ v2) {
        int twoVectorLenMask = (length() << 1) - 1;
#if[FP]
        Vector<$Boxbitstype$> wrapped_indexes = this.convert(VectorOperators.{#if[intOrFloat]?F2I:D2L}, 0)
                                                    .lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $bitstype$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
#else[FP]
        $abstractvectortype$ wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $type$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
#end[FP]
    }
#end[FP]
    }

(Note that's without the assert - see separate comment).
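
For readers following along, here is a rough scalar model of what the index
wrapping above amounts to. It is only a sketch in plain Java arrays, not the
Vector API implementation: the class name SelectFromSketch and both helper
methods are made up for illustration. It assumes VLEN is a power of two (so
the AND with (VLEN << 1) - 1 acts as a modulo by 2*VLEN), that wrapped
indexes in [0, VLEN) pick lanes from v1 and those in [VLEN, 2*VLEN) pick lanes
from v2, and that F2I behaves like a Java (int) cast.

```java
import java.util.Arrays;

// Hypothetical scalar model of two-vector selectFrom with wrapped indexes.
final class SelectFromSketch {

    // Integral case: wrap each index with the two-vector length mask,
    // then read from v1 or v2 depending on which half it falls in.
    public static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = checkLengths(indexes.length, v1.length, v2.length);
        int twoVectorLenMask = (vlen << 1) - 1;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = indexes[i] & twoVectorLenMask; // always in [0, 2*VLEN)
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    // Floating-point case: convert the index lane to int first
    // (modelled here with a cast), then wrap it the same way.
    public static float[] selectFrom(float[] indexes, float[] v1, float[] v2) {
        int vlen = checkLengths(indexes.length, v1.length, v2.length);
        int twoVectorLenMask = (vlen << 1) - 1;
        float[] result = new float[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = ((int) indexes[i]) & twoVectorLenMask;
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    // The mask trick is only valid for equal, power-of-two lengths.
    private static int checkLengths(int a, int b, int c) {
        if (a != b || a != c || Integer.bitCount(a) != 1) {
            throw new IllegalArgumentException("lengths must be equal and a power of two");
        }
        return a;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, 8, -1}; // 8 wraps to 0, -1 wraps to 7
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2))); // [10, 21, 10, 23]
    }
}
```

With wrapping in place, out-of-range index lanes such as 8 or -1 select a
lane rather than fail, which is why no range check appears in this model.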

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761977004
