On Fri, Jul 18, 2025 at 2:22 PM Kacper Michajlow <kaspe...@gmail.com> wrote:
>
> On Fri, 18 Jul 2025 at 14:46, Kieran Kunhya via ffmpeg-devel
> <ffmpeg-devel@ffmpeg.org> wrote:
> >
> > On Fri, Jul 18, 2025 at 1:41 PM Kacper Michajlow <kaspe...@gmail.com> wrote:
> > >
> > > On Fri, 18 Jul 2025 at 14:14, Kieran Kunhya via ffmpeg-devel
> > > <ffmpeg-devel@ffmpeg.org> wrote:
> > > >
> > > > > blackdetect8_c:      820.8 ( 1.00x)
> > > > > blackdetect8_avx2:   219.2 ( 3.74x)
> > > > > blackdetect16_c:     372.8 ( 1.00x)
> > > > > blackdetect16_avx2:  201.4 ( 1.85x)
> > > > >
> > > > > Again, sorry for being pedantic here, but it gives the wrong
> > > > > impression especially if you look at this from outside.
> > > >
> > > > Also misleading as far as I understand because GCC doesn't have
> > > > runtime detection like FFmpeg.
> > >
> > > Speak of... actually GCC does have runtime detection. All you have to
> > > do is mark the function with `target_clones` with requested
> > > architectures and it will dispatch automatically during runtime the
> > > best function to use.
> > >
> > > See for more information:
> > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute
> >
> > It's not as sophisticated as our runtime detection (e.g avx512 vs
> > avx512icl which we support).
> > Comparing C vs autovectorised code that works only on some platforms
> > with forced compilation settings is also unfair.
>
> In my original message clang build was completely default, no forced options.
>
> Handwritten avx512 also works on this specific platform. So comparing
> this to autovectorized code (that works on exactly the same platform)
> as a baseline makes sense. Furthermore autovectorized code can scale
> onto more platforms than handwritten avx512. IMHO comparing things in
> the same domain makes more sense.
>
> The point of my message was that we should have defined a baseline
> target, if it is GCC without autovectorization, so be it. But it
> should be specified and not implied in the commit description that the
> compared result is autovectorized.
>
> To be honest, I agree with you. It's misleading and unfair, so we
> shouldn't make any comparisons. This is not only limited to
> autovectorization, scalar code generation also differs. It just
> happens to give the biggest difference.
>
> Context matters, saying "C code performance " is vague. I'm not saying
> one way is better than the other, but it doesn't cost anything to
> specify it better to avoid miscommunication.

It's not fair to compare autovectorised AVX512 output, which will be
called *on any system with AVX512 support, including ones that downclock
heavily*, with the AVX512(ICL) code in FFmpeg, which is properly checked
so that it only runs on non-downclocking systems.

Outside the theoretical compiler world, this is a practical problem. If
FFmpeg used compiler runtime detection, I personally would have a
significant number of systems downclock drastically. I don't believe
compilers are smart enough to generate AVX512 with YMM for that use-case.

It's also substantially uglier to use compiler-specific runtime
detection, and compiler autovectorisation is inconsistent across
compiler versions. It's nothing that can be relied upon.

Kieran
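
For readers following the thread, here is a rough sketch of the kind of
dispatch policy described above: the 512-bit path is gated on an
Ice Lake-class flag rather than on bare AVX512 support, so a CPU that
only reports plain AVX512 falls back to AVX2. Everything here is an
assumption made for illustration: the flag names, the `count_dark_c`
kernel and `select_impl` are hypothetical and are not FFmpeg's actual
flags, functions or values, and the SIMD entries simply reuse the scalar
function as a stand-in for hand-written kernels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits, for illustration only; these are not
 * FFmpeg's real flag names or values. */
enum {
    CPU_AVX2      = 1 << 0,
    CPU_AVX512    = 1 << 1, /* AVX512 present; may downclock on some CPUs */
    CPU_AVX512ICL = 1 << 2  /* Ice Lake-class AVX512 */
};

typedef unsigned (*count_dark_fn)(const uint8_t *px, size_t n, uint8_t thr);

/* Scalar reference: count pixels at or below the threshold. */
unsigned count_dark_c(const uint8_t *px, size_t n, uint8_t thr)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++)
        count += px[i] <= thr;
    return count;
}

struct impl {
    const char   *name;
    int           required; /* all of these bits must be set */
    count_dark_fn fn;
};

/* Ordered from most to least preferred. The SIMD entries point at the
 * scalar function as a stand-in; real code would use separate kernels. */
static const struct impl impls[] = {
    { "avx512icl", CPU_AVX512 | CPU_AVX512ICL, count_dark_c },
    { "avx2",      CPU_AVX2,                   count_dark_c },
    { "c",         0,                          count_dark_c },
};

/* Return the first implementation whose requirements are all met.
 * Plain CPU_AVX512 without CPU_AVX512ICL never selects the 512-bit path. */
const struct impl *select_impl(int cpu_flags)
{
    const size_t n = sizeof(impls) / sizeof(impls[0]);
    for (size_t i = 0; i < n; i++)
        if ((cpu_flags & impls[i].required) == impls[i].required)
            return &impls[i];
    return &impls[n - 1]; /* not reached: the "c" entry requires nothing */
}
```

With these definitions, `select_impl(CPU_AVX2 | CPU_AVX512)` returns the
"avx2" entry, while adding `CPU_AVX512ICL` selects "avx512icl". A
dispatcher that keys only on the presence of an instruction set has no
place to express that distinction, which is the concern raised above.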