On Fri, 7 Apr 2023 18:04:16 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java
>>  line 96:
>> 
>>> 94:         }
>>> 95:         Vector<?> shufvec = this.toBitsVector();
>>> 96:         VectorMask<?> vecmask = shufvec.compare(VectorOperators.LT, 0);
>> 
>> This may impact intrinsification on AVX1 targets for floating-point 
>> shuffles, since the bits vector is an integral vector and AVX1 supports 
>> 32-byte float vectors but not 32-byte integral vectors.
>
> Yes, I think this is a drawback of the approach. However, we currently do 
> not support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 
> seems to be a special case in this regard. These float and double species 
> may also be less common in Vector API usage, since they are larger than 
> SPECIES_PREFERRED.

Hi @merykitty, I agree with you that SPECIES_PREFERRED is the better choice 
for vector algorithms involving both integral and floating-point vectors.

FTR, we now see a perf regression with a Float256-based micro on AVX1 targets 
(-XX:UseAVX=1):


  public static short micro() {
     VectorShuffle<Float> iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true);
     return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1);
  }

CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
  ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0
  ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0
                                    @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                    @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                    @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
                                  @ 34   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                  @ 54   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
                                    @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                    @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                    @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
                                      @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
                                      @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
                                @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
[time] 386ms  [res]3392
CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/
CPROMPT>export PATH=$JAVA_HOME/bin:$PATH
CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
WARNING: Using incubator modules: jdk.incubator.vector
                                      @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
                                        @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
                                @ 17   jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes)   (intrinsic)
                                      @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                      @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
                                @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
[time] 7ms  [res]3392
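
For readers following along without an AVX1 box, here is a rough scalar sketch 
of the index values the micro above deals with. It is only a model under my 
reading of the VectorShuffle javadoc (wrapped indexes fall in [0, VLENGTH-1], 
partially wrapped out-of-range indexes fall in [-VLENGTH, -1]); the class and 
method names (IotaShuffleModel, partiallyWrap, hasExceptionalIndex) are 
hypothetical and are not part of the Vector API or of this patch. The last 
helper mirrors, in scalar form, what the `compare(VectorOperators.LT, 0)` 
check quoted above looks for.

```java
import java.util.Arrays;

// Scalar model only: none of these names exist in jdk.incubator.vector.
class IotaShuffleModel {

    // Models iotaShuffle(start, step, wrap) for a species with vlen lanes.
    static int[] iotaShuffle(int vlen, int start, int step, boolean wrap) {
        int[] idx = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int raw = start + i * step;
            // wrap == true: reduce into [0, vlen-1]; otherwise partially wrap.
            idx[i] = wrap ? Math.floorMod(raw, vlen) : partiallyWrap(raw, vlen);
        }
        return idx;
    }

    // Assumption: in-range indexes are kept, out-of-range indexes are mapped
    // into the "exceptional" range [-vlen, -1].
    static int partiallyWrap(int raw, int vlen) {
        return (raw >= 0 && raw < vlen) ? raw : Math.floorMod(raw, vlen) - vlen;
    }

    // Scalar counterpart of a lane-wise "index < 0" comparison.
    static boolean hasExceptionalIndex(int[] idx) {
        for (int i : idx) {
            if (i < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same arguments as the micro: 8 lanes, start 0, step 1, wrap = true.
        int[] iota = iotaShuffle(8, 0, 1, true);
        System.out.println(Arrays.toString(iota));
        System.out.println("exceptional: " + hasExceptionalIndex(iota));

        // Without wrapping, indexes past the last lane become negative.
        int[] unwrapped = iotaShuffle(8, 4, 1, false);
        System.out.println(Arrays.toString(unwrapped));
        System.out.println("exceptional: " + hasExceptionalIndex(unwrapped));
    }
}
```

This says nothing about intrinsification itself; it only shows that the 
shuffle in the micro carries plain in-range indexes, so the cost difference 
in the logs comes from how the conversion is compiled, not from the values.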

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161810585
