On Fri, 7 Apr 2023 18:04:16 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 96:
>>
>>> 94: }
>>> 95: Vector<?> shufvec = this.toBitsVector();
>>> 96: VectorMask<?> vecmask = shufvec.compare(VectorOperators.LT, 0);
>>
>> This may impact intrinsification on AVX1 targets for floating-point
>> shuffles, since the bits vector is an integral vector and AVX1 supports
>> 32-byte float vectors but not 32-byte integral vectors.
>
> Yes, I think it is a drawback of this approach. However, we currently do not
> support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems
> to be a special case in this regard. These float and double species may also
> be less common in Vector API usage, since they are larger than
> SPECIES_PREFERRED.

Hi @merykitty,

I agree that SPECIES_PREFERRED is the better choice for vector algorithms that operate on both integral and floating-point vectors.

FTR, we now see a performance regression on AVX=1 targets with a Float256-based micro:

    public static short micro() {
        VectorShuffle<Float> iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true);
        return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1);
    }

With the current build, the reinterpret and cast operations on the 256-bit int vector (vlen1=8, etype1=int) are reported as not supported, and two VectorSupport::convert calls fail to inline:

    CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
    CompileCommand: compileonly shufflef.micro bool compileonly = true
      ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0
      ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0
        @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
        @ 34   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 54   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
        @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
        @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
        @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
        @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
    [time] 386ms [res]3392

With JDK 20 for comparison:

    CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/
    CPROMPT>export PATH=$JAVA_HOME/bin:$PATH
    CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
    CompileCommand: compileonly shufflef.micro bool compileonly = true
    WARNING: Using incubator modules: jdk.incubator.vector
        @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
        @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
        @ 17   jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes)   (intrinsic)
        @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
        @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
        @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
    [time] 7ms [res]3392

The result is the same in both runs, but the reported time goes from 7ms on JDK 20 to 386ms with the current build.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161810585
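The AVX1 concern in the quoted comment comes down to lane-width arithmetic. As a rough illustration only, the sketch below models it in plain Java without the incubator module. The class `ShuffleBitsModel`, its `Elem` enum and the `integralCounterpart`/`avx1Supports` helpers are hypothetical. The mapping of a float shuffle's bits vector to same-width int lanes is inferred from the quoted comment rather than taken from the patch. The AVX1 check is a simplification: AVX1 has 256-bit floating-point instructions, while 256-bit integer instructions require AVX2.

```java
// Plain-Java model of the lane-width arithmetic behind the AVX1 concern.
// No jdk.incubator.vector dependency; all names here are hypothetical.
public class ShuffleBitsModel {

    /** Element types with their lane widths in bits. */
    public enum Elem {
        BYTE(8, false), SHORT(16, false), INT(32, false), LONG(64, false),
        FLOAT(32, true), DOUBLE(64, true);

        public final int bits;
        public final boolean floating;

        Elem(int bits, boolean floating) {
            this.bits = bits;
            this.floating = floating;
        }

        /** Assumed bits-vector element: the integral type of the same width. */
        public Elem integralCounterpart() {
            switch (this) {
                case FLOAT:
                    return INT;
                case DOUBLE:
                    return LONG;
                default:
                    return this;
            }
        }
    }

    /**
     * Simplified AVX1 capability model: 256-bit vectors are usable only for
     * floating-point element types; integral vectors are limited to 128 bits
     * (256-bit integer instructions arrive with AVX2).
     */
    public static boolean avx1Supports(Elem e, int vectorBits) {
        return vectorBits <= 128 || (e.floating && vectorBits == 256);
    }

    public static void main(String[] args) {
        int vectorBits = 256;                              // Float256 (FloatVector.SPECIES_256)
        Elem shuffleElem = Elem.FLOAT;
        int lanes = vectorBits / shuffleElem.bits;         // 256 / 32 = 8 lanes
        Elem bitsElem = shuffleElem.integralCounterpart(); // INT (assumed mapping)
        int bitsVectorBits = lanes * bitsElem.bits;        // 8 * 32 = 256 bits

        System.out.println("lanes: " + lanes + ", bits vector: " + bitsElem + ", " + bitsVectorBits + " bits");
        System.out.println("Float256 on AVX1:    " + avx1Supports(shuffleElem, vectorBits));
        System.out.println("bits vector on AVX1: " + avx1Supports(bitsElem, bitsVectorBits));
    }
}
```

Running `main` reports that the Float256 vector fits this AVX1 model (`true`) while its 256-bit int bits vector does not (`false`), which lines up with the `etype1=int`, `vlen1=8` operations reported as not supported in the log. The model covers vector widths only, not HotSpot's actual intrinsic support checks.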