vf_blackdetect: add AVX2 SIMD version

Kieran Kunhya via ffmpeg-devel Fri, 18 Jul 2025 06:33:29 -0700

On Fri, Jul 18, 2025 at 2:22 PM Kacper Michajlow <[email protected]> wrote:
>
> On Fri, 18 Jul 2025 at 14:46, Kieran Kunhya via ffmpeg-devel
> <[email protected]> wrote:
> >
> > On Fri, Jul 18, 2025 at 1:41 PM Kacper Michajlow <[email protected]> wrote:
> > >
> > > On Fri, 18 Jul 2025 at 14:14, Kieran Kunhya via ffmpeg-devel
> > > <[email protected]> wrote:
> > > >
> > > > > blackdetect8_c:                                        820.8 ( 1.00x)
> > > > > blackdetect8_avx2:                                     219.2 ( 3.74x)
> > > > > blackdetect16_c:                                       372.8 ( 1.00x)
> > > > > blackdetect16_avx2:                                    201.4 ( 1.85x)
> > > > >
> > > > > Again, sorry for being pedantic here, but it gives the wrong
> > > > > impression especially if you look at this from outside.
> > > >
> > > > Also misleading as far as I understand because GCC doesn't have
> > > > runtime detection like FFmpeg.
> > >
> > > Speaking of which... actually GCC does have runtime detection. All
> > > you have to do is mark the function with `target_clones` and the
> > > requested architectures, and it will automatically dispatch to the
> > > best version at runtime.
> > >
> > > For more information, see:
> > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute
> >
> > It's not as sophisticated as our runtime detection (e.g. avx512 vs
> > avx512icl, both of which we support).
> > Comparing C vs autovectorised code that works only on some platforms
> > with forced compilation settings is also unfair.
>
> In my original message clang build was completely default, no forced options.
>
> Handwritten avx512 also works on this specific platform. So comparing
> this to autovectorized code (that works on exactly the same platform)
> as a baseline makes sense. Furthermore autovectorized code can scale
> onto more platforms than handwritten avx512. IMHO comparing things in
> the same domain makes more sense.
>
> The point of my message was that we should have defined a baseline
> target, if it is GCC without autovectorization, so be it. But it
> should be specified and not implied in the commit description that the
> compared result is autovectorized.
>
> To be honest, I agree with you. It's misleading and unfair, so we
> shouldn't make any comparisons. This is not only limited to
> autovectorization, scalar code generation also differs. It just
> happens to give the biggest difference.
>
> Context matters, saying "C code performance" is vague. I'm not saying
> one way is better than the other, but it doesn't cost anything to
> specify it better to avoid miscommunication.


It's not fair to compare autovectorised AVX512 output, which will be
called *on any system with AVX512 support, including ones that
downclock heavily*, with AVX512 (ICL) code that FFmpeg checks properly
so that it runs only on non-downclocking systems.
Outside the land of the theoretical compiler world, this is a
practical problem. If FFmpeg used compiler runtime detection I
personally would have a significant number of systems downclock
drastically.
I don't believe compilers are smart enough to generate AVX512 with YMM
for that use-case.

It's substantially uglier to use compiler-specific runtime detection.
Compiler autovectorisation is inconsistent across compiler versions.
It's not something that can be relied upon.

Kieran
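
For readers unfamiliar with the `target_clones` attribute discussed in
the thread, here is a minimal sketch of how it is used. The function
name `count_black_pixels`, its signature, and the `MULTIVERSION` macro
are hypothetical and chosen only for illustration; this is not the
vf_blackdetect code from the patch. The guard is deliberately
conservative and assumes GCC on x86-64 Linux with glibc (ifunc
support); elsewhere the macro expands to nothing and a single plain
version is built.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: only enable multiversioning where GCC's ifunc-based
 * dispatch is known to be available; otherwise build one plain version. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("default", "avx2")))
#else
#define MULTIVERSION
#endif

/* Hypothetical kernel: count pixels at or below a threshold.
 * With target_clones, GCC emits one clone per listed target plus a
 * resolver that picks a clone at load time based on CPU features.
 * Whether the loop is actually vectorised depends on the compiler
 * version and optimisation flags. */
MULTIVERSION
unsigned count_black_pixels(const uint8_t *px, size_t n, uint8_t threshold)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++)
        count += px[i] <= threshold;
    return count;
}
```

Note that the dispatch here is keyed only on the ISA names listed in
the attribute, which is the limitation raised above: it does not by
itself distinguish, say, AVX512 parts that downclock from those that
do not.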

Re: [FFmpeg-devel] [PATCH v2 1/2] avfilter/vf_blackdetect: add AVX2 SIMD version
