On Mon, 1 Sep 2025 08:17:08 GMT, Galder Zamarreño <[email protected]> wrote:

>> @galderz I got a failure in our testing:
>> 
>> With VM flag: `-XX:UseAVX=1`.
>> 
>> 
>> Failed IR Rules (2) of Methods (2)
>> ----------------------------------
>> 1) Method "static java.lang.Object[] 
>> compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test6(int[],float[])"
>>  - [Failed IR rules: 1]:
>>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
>> applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", 
>> "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_F#_", "> 0", 
>> "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, 
>> applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, 
>> applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, 
>> applyIfAnd={}, applyIfNot={})"
>>      > Phase "PrintIdeal":
>>        - counts: Graph contains wrong number of nodes:
>>          * Constraint 1: 
>> "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<F,8>)"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>> 
>> 2) Method "static java.lang.Object[] 
>> compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test9(long[],double[])"
>>  - [Failed IR rules: 1]:
>>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, 
>> applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", 
>> "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_D#_", "> 0", 
>> "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"}, 
>> applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, 
>> applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, 
>> applyIfAnd={}, applyIfNot={})"
>>      > Phase "PrintIdeal":
>>        - counts: Graph contains wrong number of nodes:
>>          * Constraint 1: 
>> "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<D,4>)"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>> 
>> 
>> I suspect that `test6` with `floatToRawIntBits` and `test9` with 
>> `doubleToRawLongBits` are only supported with `AVX2`. Question is if that is 
>> really supposed to be like that, or if we should even file an RFE to extend 
>> support for `AVX1` and lower.
>> 
>> Can you find out why we don't vectorize with `AVX1` here?
>
>> Can you find out why we don't vectorize with AVX1 here?
> 
> This was a fun little rabbit hole. The explanation below is for `test6` but I 
> think the same logic applies to `test9`:
> 
> The problem comes from the IR node definition, what JTreg does with that, and 
> what the HotSpot code actually does.
> 
> The annotation definition is:
> 
>     @IR(counts = {IRNode.LOAD_VECTOR_F, "> 0",
> 
> 
> So JTreg assumes that the regex should match a vector size of 8. With 
> `UseAVX=1` and floats, `IRNode.getMaxElementsForTypeOnX86` returns 8 and so 
> that's how the constraint is set:
> 
> 
>          * Constraint 1: 
> "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z]<F,8>)"
> 
> 
> But the issue is that at runtime the vector size is 4:
> 
>   844  LoadVector  === ... #vectorx<F,4>
> 
> 
> HotSpot logic is more nuanced, with the key being what happens in 
> `SuperWord::unrolling_analysis`. The thing that JTreg doesn't know is that 
> there are 2 types involved in the loop, float **and** int:
> 
> 
>         for (int i = 0; i < a.length; i++) {
>             a[i] = Float.floatToRawIntBits(b[i]);
>         }
> 
> 
> With `UseAVX=1`, the max vector size for floats is 8, but for ints it is 4. So 
> the JVM picks the minimum of the two and uses that. That is why unrolling is 
> 4, and that carries all the way to the load vector size, which is also 4.
> 
> IMO the right thing to do would be to fix the annotation to be:
> 
> 
>     @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_4, "> 0",
> 
> 
> And explain in the javadoc why the expected size is 4.
> 
> The same applies to `test9`.
> 
> WDYT @eme64?

@galderz Ah, maybe we just need to do it like here then:
`test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java:192:50:
        counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"})`

When doing cast/reinterpret/move between types this always happens ;)
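
To spell out the arithmetic behind `min(max_int, max_float)`, here is a minimal sketch. It is not HotSpot or IR framework code: the class and method names are hypothetical, and the vector widths (32 bytes for floats, 16 bytes for ints with `UseAVX=1`) are assumptions chosen to match the 8 and 4 element counts quoted above.

```java
// Illustrative sketch only; none of these names exist in the JDK test library.
final class VectorSizeSketch {

    // Number of elements of a given size that fit into one vector register.
    static int maxElements(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    // A loop that touches two element types is limited by the type
    // that fits the fewest elements into a vector.
    static int expectedElements(int maxForTypeA, int maxForTypeB) {
        return Math.min(maxForTypeA, maxForTypeB);
    }

    public static void main(String[] args) {
        // Assumed widths for UseAVX=1: 32-byte float vectors, 16-byte int vectors.
        int maxFloat = maxElements(32, Float.BYTES);
        int maxInt = maxElements(16, Integer.BYTES);

        System.out.println("max_float = " + maxFloat);
        System.out.println("max_int   = " + maxInt);
        System.out.println("expected  = " + expectedElements(maxFloat, maxInt));
    }
}
```

Under those assumptions the expected element count is the smaller of the two, which is the size the `LoadVector` node in `test6` ends up with.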

I think this should generalize over all platforms.

Does that work?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241438142
