On Mon, 10 Apr 2023 15:16:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Yes I think it is a drawback of this approach, however currently we do not
>> support shuffling for 256-bit vectors on AVX1 machines either, and AVX1
>> seems to be a special case in this regard. This species of float and double
>> may also be less common in the usage of Vector API since it is larger than
>> SPECIES_PREFERRED.
>
> Hi @merykitty , Agree with you that SPECIES_PREFERRED is preferred for vector
> algorithms intercepting both integral and floating point vectors.
>
> FTR, we see a perf regression with Float256 based micro now on AVX=1 targets,
>
>     public static short micro() {
>         VectorShuffle<Float> iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true);
>         return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1);
>     }
>
> CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
> CompileCommand: compileonly shufflef.micro bool compileonly = true
>   ** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0
>   ** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0
>     @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
>     @ 34   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 54   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   failed to inline (intrinsic)
>     @ 17   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 24   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 45   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
>     @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
>     @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
>     @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
> [time] 386ms [res]3392
>
> CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/
> CPROMPT>export PATH=$JAVA_HOME/bin:$PATH
> CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
> CompileCommand: compileonly shufflef.micro bool compileonly = true
> WARNING: Using incubator modules: jdk.incubator.vector
>     @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
>     @ 3   jdk.internal.misc.Unsafe::loadFence (5 bytes)   (intrinsic)
>     @ 17   jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes)   (intrinsic)
>     @ 292   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 298   java.lang.Object::getClass (0 bytes)   (intrinsic)
>     @ 322   jdk.internal.vm.vector.VectorSupport::convert (36 bytes)   (intrinsic)
>     @ 16   jdk.internal.vm.vector.VectorSupport::extract (35 bytes)   (intrinsic)
> [time] 7ms [res]3392

I see, what do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161994748