cpu: Adds fast gather detection.

Lynne Fri, 25 Jun 2021 01:40:09 -0700

Jun 25, 2021, 09:54 by alankelly-at-google....@ffmpeg.org:

> Broadwell and later and Zen3 and later have fast gather instructions.
> ---
>  Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
>  and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.
>  libavutil/cpu.h     |  2 ++
>  libavutil/x86/cpu.c | 18 ++++++++++++++++--
>  libavutil/x86/cpu.h |  1 +
>  3 files changed, 19 insertions(+), 2 deletions(-)
>


No, we really don't need more FAST/SLOW flags, especially for
something like this, which is fixable by simply _not_using_vgather_.
Take a look at libavutil/x86/tx_float.asm: we only use vgather
if it's guaranteed either to be faster for what we're gathering or
to be just as fast even on CPUs where it's "slow". If neither is
true, we use manual lookups, which is actually advantageous, since
for AVX2 we can interleave the lookups that happen in each lane.
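
To make the manual-lookup idea concrete, here is a rough sketch in
portable C. It is only a model of the approach, not the code in
tx_float.asm (which is hand-written assembly), and gather8_manual is
a hypothetical name made up for this illustration. It treats an
8-float AVX2 register as two 4-float lanes and alternates loads
between them, so that consecutive lookups do not depend on each other.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an FFmpeg function.
 * Gathers 8 floats from src through the index table idx without a
 * gather instruction. The 8 outputs are treated as two 4-element
 * lanes (0..3 and 4..7), mirroring the two 128-bit halves of an
 * AVX2 register.
 */
static void gather8_manual(float dst[8], const float *src,
                           const int32_t idx[8])
{
    assert(dst && src && idx);

    for (int i = 0; i < 4; i++) {
        /* One lookup per lane per iteration: the two loads are
         * independent, so they can be issued back to back. */
        float lo = src[idx[i]];     /* low lane, element i      */
        float hi = src[idx[i + 4]]; /* high lane, element i + 4 */

        dst[i]     = lo;
        dst[i + 4] = hi;
    }
}
```

Whether something like this beats vgather depends on the CPU and on
what is being gathered, so it would need benchmarking in each case.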

Even if we disregard this, I've extensively benchmarked vgather
on Zen 3, Zen 2, Cascade Lake and Skylake, and the vgather
improvement in Zen 3 is hardly large enough to justify adding a
new CPU flag for it.

Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.
