On Mon, 1 Sep 2025 08:17:08 GMT, Galder Zamarreño <[email protected]> wrote:
>> @galderz I got a failure in our testing:
>>
>> With VM flag: `-XX:UseAVX=1`.
>>
>>
>> Failed IR Rules (2) of Methods (2)
>> ----------------------------------
>> 1) Method "static java.lang.Object[]
>> compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test6(int[],float[])"
>> - [Failed IR rules: 1]:
>> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT},
>> applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd",
>> "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_F#_", "> 0",
>> "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"},
>> applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={},
>> applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={},
>> applyIfAnd={}, applyIfNot={})"
>> > Phase "PrintIdeal":
>> - counts: Graph contains wrong number of nodes:
>> * Constraint 1:
>> "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<F,8>)"
>> - Failed comparison: [found] 0 > 0 [given]
>> - No nodes matched!
>>
>> 2) Method "static java.lang.Object[]
>> compiler.loopopts.superword.TestCompatibleUseDefTypeSize.test9(long[],double[])"
>> - [Failed IR rules: 1]:
>> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT},
>> applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd",
>> "true", "rvv", "true"}, counts={"_#V#LOAD_VECTOR_D#_", "> 0",
>> "_#STORE_VECTOR#_", "> 0", "_#VECTOR_REINTERPRET#_", "> 0"},
>> applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={},
>> applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={},
>> applyIfAnd={}, applyIfNot={})"
>> > Phase "PrintIdeal":
>> - counts: Graph contains wrong number of nodes:
>> * Constraint 1:
>> "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<D,4>)"
>> - Failed comparison: [found] 0 > 0 [given]
>> - No nodes matched!
>>
>>
>> I suspect that `test6` with `floatToRawIntBits` and `test9` with
>> `doubleToRawLongBits` are only supported with `AVX2`. The question is whether
>> that is really intended, or whether we should file an RFE to extend
>> support to `AVX1` and lower.
>>
>> Can you find out why we don't vectorize with `AVX1` here?
>
>> Can you find out why we don't vectorize with AVX1 here?
>
> This was a fun little rabbit hole. The explanation below is for `test6` but I
> think the same logic applies to `test9`:
>
> The problem comes from the IR node definition, what JTreg does with it, and
> what the HotSpot code actually does.
>
> The annotation definition is:
>
> @IR(counts = {IRNode.LOAD_VECTOR_F, "> 0",
>
>
> So JTreg assumes that the regex should match a vector size of 8. With
> `UseAVX=1` and floats, `IRNode.getMaxElementsForTypeOnX86` returns 8 and so
> that's how the constraint is set:
>
>
> * Constraint 1:
> "(\d+(\s){2}(LoadVector.*)+(\s){2}===.*vector[A-Za-z]<F,8>)"
>
>
> But the issue is that at runtime the vector size is 4:
>
> 844 LoadVector === ... #vectorx<F,4>
>
>
> HotSpot logic is more nuanced, with the key being what happens in
> `SuperWord::unrolling_analysis`. The thing that JTreg doesn't know is that
> there are 2 types involved in the loop, float **and** int:
>
>
> for (int i = 0; i < a.length; i++) {
> a[i] = Float.floatToRawIntBits(b[i]);
> }
>
>
> With `UseAVX=1`, the max vector size for floats is 8, but for ints it is 4. So
> the JVM picks the minimum and uses that. That is why the unroll factor is 4,
> and it carries through all the way to the load vector size, which is also 4.
>
> IMO the right thing to do would be to fix the annotation to be:
>
>
> @IR(counts = {IRNode.LOAD_VECTOR_F, IRNode.VECTOR_SIZE_4, "> 0",
>
>
> and explain in the Javadoc why the expected size is 4.
>
> The same applies to `test9`.
>
> WDYT @eme64?

@galderz Ah, maybe we just need to do it like here then:

`test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java:192:50:`
`counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"})`

When doing a cast/reinterpret/move between types, this always happens ;)
I think this should generalize across all platforms. Does that work?
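The following is a minimal sketch of the reasoning in this thread. It relies on a simplified and hypothetical model of the x86 rule, in which `UseAVX=1` gives 256-bit vectors for float/double but only 128-bit vectors for int/long. It ignores `MaxVectorSize` and the AVX-512 subtleties. The class `MixedTypeVectorSize`, its methods, and the sample PrintIdeal line are illustrative assumptions. They are not HotSpot or IR framework code.

The sketch shows two things:
- A constraint built from the float-only maximum (8) misses the observed `<F,4>` node.
- A constraint built from the minimum over the loop's element types matches it. That minimum is what `min(max_int, max_float)` expresses.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only: a simplified model of the x86 vector-width rule
// discussed above. It is NOT HotSpot's or the IR framework's implementation.
class MixedTypeVectorSize {
    enum ElemType {
        INT('I', 4, false), LONG('L', 8, false), FLOAT('F', 4, true), DOUBLE('D', 8, true);

        final char irCode;           // letter used in PrintIdeal vector types, e.g. <F,4>
        final int bytes;             // element size in bytes
        final boolean floatingPoint;

        ElemType(char irCode, int bytes, boolean floatingPoint) {
            this.irCode = irCode;
            this.bytes = bytes;
            this.floatingPoint = floatingPoint;
        }
    }

    // Assumed vector width in bytes for one element type (SSE2 or better, no MaxVectorSize cap).
    static int widthInBytes(ElemType t, int useAVX) {
        if (useAVX >= 3) return 64;                    // AVX-512 (byte/short subtleties ignored)
        if (useAVX == 2) return 32;                    // AVX2: 256-bit for all types
        if (useAVX == 1 && t.floatingPoint) return 32; // AVX1: 256-bit only for float/double
        return 16;                                     // SSE, or AVX1 integral types: 128-bit
    }

    // Plays the role of a "max_<type>" token: most elements of that type per vector.
    static int maxElements(ElemType t, int useAVX) {
        return widthInBytes(t, useAVX) / t.bytes;
    }

    // Plays the role of "min(max_a, max_b, ...)": a mixed-type loop is limited by the smallest maximum.
    static int minOfMax(int useAVX, ElemType... types) {
        return Arrays.stream(types).mapToInt(t -> maxElements(t, useAVX)).min().orElseThrow();
    }

    // Builds a constraint shaped like the one in the failure log for a given type and size.
    static Pattern loadVectorConstraint(ElemType t, int size) {
        return Pattern.compile("(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]<"
                + t.irCode + "," + size + ">)");
    }

    public static void main(String[] args) {
        // Hypothetical PrintIdeal line for test6 under UseAVX=1 (inputs elided, two-space separators assumed).
        String observed = " 844  LoadVector  === 5 6 7  #vectorx<F,4>";

        int floatOnly = maxElements(ElemType.FLOAT, 1);         // 8: what the failing rule expected
        int mixed = minOfMax(1, ElemType.INT, ElemType.FLOAT);  // 4: int limits the loop
        System.out.println(floatOnly + " " + loadVectorConstraint(ElemType.FLOAT, floatOnly).matcher(observed).find()); // prints "8 false"
        System.out.println(mixed + " " + loadVectorConstraint(ElemType.FLOAT, mixed).matcher(observed).find());         // prints "4 true"
        System.out.println(minOfMax(1, ElemType.LONG, ElemType.DOUBLE)); // prints "2" (test9 under this model)
    }
}
```

Under this model, `test9` (long/double) resolves to 2 elements with `UseAVX=1`, not 4. If that holds on real hardware, it would be one more reason to prefer a `min(...)` expression over a hard-coded size. That outcome is a prediction of the sketch, not something confirmed in this thread.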
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26457#issuecomment-3241438142