Hi, one more, I forgot.

On Sun, May 19, 2024 at 8:46 PM Stone Chen <chen.stonec...@gmail.com> wrote:
> +pw_1: dw 1
> [..]
> + vpbroadcastw m4, [pw_1]

We typically suggest using vpbroadcastd, not vpbroadcastw (and then
pw_1: times 2 dw 1). Agner Fog's tables show that on e.g. Haswell, the
former (d) is 1 uop with 5 cycles latency, whereas the latter (w) is
3 uops with 7 cycles latency. More generally, d is faster than w.

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
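
For illustration, the suggested change could look roughly like the fragment below. This is only a sketch, not the actual patch: it assumes the usual x86inc setup (SECTION_RODATA, and m4 mapped to a ymm register in an AVX2 function), and everything around the two changed lines is elided.

```asm
; Sketch only: assumes FFmpeg's x86inc macros and an AVX2 function in
; which m4 is a ymm register. Surrounding code is elided.

SECTION_RODATA

; was: pw_1: dw 1
pw_1: times 2 dw 1          ; two 16-bit words of 1 = one dword 0x00010001

SECTION .text

; was: vpbroadcastw m4, [pw_1]
vpbroadcastd m4, [pw_1]     ; replicate the dword; every word lane holds 1
```

Because the constant is now two identical words, broadcasting it as a dword fills every 16-bit lane with 1, so the register contents should match what the vpbroadcastw version produced.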