On Mon, 25 Aug 2025 07:13:43 GMT, Galder Zamarreño wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and 
>> `MoveI2F` nodes. The implementation follows a similar pattern to what is 
>> done with conversion (`Conv*`) nodes. The tests in 
>> `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> I also added a JMH benchmark that measures throughput (the higher the number,
>> the better) for methods that exercise these nodes. On darwin/aarch64 it
>> shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. `doubleToLongBits`
>> and `floatToIntBits` do not vectorize because of control flow, so these
>> changes leave their performance unaffected, as their unchanged numbers
>> show.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any 
>> regressions.
>
> Galder Zamarreño has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 22 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into topic.fp-bits-vector
>  - Add more IR node positive assertions
>  - Fix source of data for benchmarks
>  - Refactor benchmarks to TypeVectorOperations
>  - Check at the very least that auto vectorization is supported
>  - Avoid VectorReinterpret::implemented
>  - Refactor and add copyright header
>  - Rephrase comment
>  - Removed unnecessary assert methods
>  - Adjust IR test after adding Move* vector support
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/fc6e0b6f...e7e4d801
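
The sketch below shows the loop shapes behind the `Move*` nodes and the control-flow remark in the description. The class and method names are hypothetical and are not taken from the PR's benchmark. C2 intrinsifies `Double.doubleToRawLongBits`, `Double.longBitsToDouble`, `Float.floatToRawIntBits` and `Float.intBitsToFloat` to `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F`, so each of those loop bodies is just a load, a bit move and a store. `Double.doubleToLongBits` (like `Float.floatToIntBits`) additionally collapses every NaN to one canonical bit pattern. `canonicalLongBits` spells that out, and its NaN test is the branch that keeps those loops scalar.

```java
import java.util.Arrays;

// Hypothetical illustration only: these are not the PR's benchmark methods.
public class BitConversionLoops {

    // Intrinsified by C2 to MoveD2L: load, bit move, store; no branches.
    static void doubleToRawLongBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToRawLongBits(in[i]);
        }
    }

    // Intrinsified by C2 to MoveL2D.
    static void longBitsToDouble(long[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.longBitsToDouble(in[i]);
        }
    }

    // Intrinsified by C2 to MoveF2I.
    static void floatToRawIntBits(float[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Float.floatToRawIntBits(in[i]);
        }
    }

    // Intrinsified by C2 to MoveI2F.
    static void intBitsToFloat(int[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Float.intBitsToFloat(in[i]);
        }
    }

    // What Double.doubleToLongBits computes, written out: every NaN maps to
    // the canonical 0x7ff8000000000000L, which adds a data-dependent branch.
    static long canonicalLongBits(double d) {
        return Double.isNaN(d) ? 0x7ff8000000000000L : Double.doubleToRawLongBits(d);
    }

    public static void main(String[] args) {
        double[] in = {1.0, -0.0, Double.NaN};
        long[] bits = new long[in.length];
        double[] back = new double[in.length];
        doubleToRawLongBits(in, bits);
        longBitsToDouble(bits, back);
        // The raw round trip reproduces the inputs, including -0.0.
        System.out.println(Arrays.toString(back));
        System.out.println(Long.toHexString(bits[0]));
    }
}
```

Running `main` prints `[1.0, -0.0, NaN]` followed by `3ff0000000000000`, the IEEE 754 bit pattern of `1.0`.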

test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 460:

> 458:     @IR(counts = {IRNode.LOAD_VECTOR_L, "> 0",
> 459:                   IRNode.STORE_VECTOR, "> 0",
> 460:                   IRNode.VECTOR_REINTERPRET, "> 0"},

Ah, I just saw that `VECTOR_REINTERPRET` is not a `vectorNode`, so we don't
check its size. Would the node have a type and size, though?

If so, we could consider registering it with a type, like all the vector casts.
That would be a bit of work, but it would make the IR rules more precise.
It could also be done in a separate RFE.


    public static final String VECTOR_REINTERPRET = PREFIX + "VECTOR_REINTERPRET" + POSTFIX;
    static {
        beforeMatchingNameRegex(VECTOR_REINTERPRET, "VectorReinterpret");
    }

    public static final String VECTOR_UCAST_B2S = VECTOR_PREFIX + "VECTOR_UCAST_B2S" + POSTFIX;
    static {
        vectorNode(VECTOR_UCAST_B2S, "VectorUCastB2X", TYPE_SHORT);
    }


Depending on what the IR dump shows for this node, it may not be that easy,
though. I'm not sure.
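
The sketch below illustrates the precision point by comparing name-only matching with typed matching. It assumes the dump prints the node's own (output) vector type, which is the open question here. The dump lines, the `#vectory[4]:{long}` suffix format and both regexes are hypothetical stand-ins, not the IR framework's real patterns or C2's exact output. A reinterpret has one extra wrinkle: its input and output element types differ, so a typed rule can key on only one of them. This sketch uses the output type.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: dump lines and patterns are illustrative, not the
// IR framework's actual regexes or C2's exact dump format.
public class ReinterpretMatchSketch {

    // Name-only matching, in the spirit of beforeMatchingNameRegex:
    // any VectorReinterpret node matches, whatever its type or length.
    static Pattern nameOnly(String nodeName) {
        return Pattern.compile("\\b" + Pattern.quote(nodeName) + "\\b");
    }

    // Typed matching, in the spirit of vectorNode: additionally require the
    // node's own (output) vector length and element type.
    static Pattern typed(String nodeName, String elemType, int length) {
        return Pattern.compile("\\b" + Pattern.quote(nodeName) + "\\b.*"
                + "#vector[a-z]\\[" + length + "\\]:\\{" + Pattern.quote(elemType) + "\\}");
    }

    static boolean matches(Pattern p, String dumpLine) {
        return p.matcher(dumpLine).find();
    }

    public static void main(String[] args) {
        // Two hypothetical reinterprets: one producing longs, one producing doubles.
        String toLong = "  42  VectorReinterpret  === _ 40  [[ 44 ]]  #vectory[4]:{long}";
        String toDouble = "  43  VectorReinterpret  === _ 41  [[ 45 ]]  #vectory[4]:{double}";
        Pattern any = nameOnly("VectorReinterpret");
        Pattern longs = typed("VectorReinterpret", "long", 4);
        System.out.println(matches(any, toLong) + " " + matches(any, toDouble));     // true true
        System.out.println(matches(longs, toLong) + " " + matches(longs, toDouble)); // true false
    }
}
```

With name-only matching, both nodes satisfy a `VECTOR_REINTERPRET` rule. The typed pattern accepts only the node that produces a `long` vector of the expected length.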

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2313298675
