On Thu, 17 Jul 2025 at 12:45, Niklas Haas <ffm...@haasn.xyz> wrote:
>
> From: Niklas Haas <g...@haasn.dev>
>
> Requested by a user. Even with autovectorization enabled, the compiler
> performs a quite poor job of optimizing this function, due to not being
> able to take advantage of the pmaxub + pcmpeqb trick for counting the number
> of pixels less than or equal-to a threshold.
>
> blackdetect8_c:        4625.0 ( 1.00x)
> blackdetect8_avx2:      155.1 (29.83x)
> blackdetect16_c:       2529.4 ( 1.00x)
> blackdetect16_avx2:     163.6 (15.46x)
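For context, the pmaxub + pcmpeqb trick mentioned in the quote relies on the identity max(x, t) == t exactly when x <= t, which gives an unsigned "less than or equal" test even though x86 has no unsigned byte compare before AVX-512. Below is a minimal sketch of the idea using SSE2 intrinsics, with a scalar fallback. It is not the code from the patch: the function name count_le_threshold is hypothetical, the mask-and-popcount accumulation is just one possible way to sum the matches, and __builtin_popcount assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not FFmpeg's implementation): count the bytes in
 * src[0..len) that are less than or equal to threshold. */
static unsigned count_le_threshold(const uint8_t *src, size_t len,
                                   uint8_t threshold)
{
    unsigned count = 0;
    size_t i = 0;

    assert(src != NULL || len == 0);

#if defined(__SSE2__)
    const __m128i thr = _mm_set1_epi8((char)threshold);
    for (; i + 16 <= len; i += 16) {
        __m128i px = _mm_loadu_si128((const __m128i *)(src + i));
        /* pmaxub: max(px, thr) equals thr exactly where px <= thr */
        __m128i mx = _mm_max_epu8(px, thr);
        /* pcmpeqb: 0xFF in every byte lane where the equality holds */
        __m128i eq = _mm_cmpeq_epi8(mx, thr);
        /* One mask bit per lane; the popcount is the number of matches.
         * __builtin_popcount is a GCC/Clang builtin (assumption). */
        count += (unsigned)__builtin_popcount((unsigned)_mm_movemask_epi8(eq));
    }
#endif

    /* Scalar tail, and the whole loop when SSE2 is unavailable */
    for (; i < len; i++)
        count += src[i] <= threshold;

    return count;
}
```

Called on a 40-byte buffer holding the values 0 through 39 with a threshold of 16, this returns 17. An AVX2 version would process 32 bytes per iteration with the same two instructions.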
I think we should try to have better standards for reporting performance
metrics. Those numbers mean little without context. Which compiler, flags,
and CPU were used? Sure, we can omit some information if we only want to show
the scaling, but if the result depends heavily on those things, then we
should at least try to be more specific.

Sorry for being pedantic about this, but I think it's important, especially
if we put those values in a commit message that will live forever in the
repository as a vague reference.

> Even with autovectorization enabled

You mention autovectorization being enabled, yet the reported numbers are
without it. To me, this description implies that the performance comparison
shown was made with autovectorization enabled.

When we compare apples to apples, with AVX2 we get a more expected
3.74x (gcc) / 2.38x (clang) for blackdetect8, depending on the compiler.
It's still a good improvement, so there is no reason to oversell it.

For reference, some metrics on my end:

clang 20.1.7

march=generic (default config):
blackdetect8_c:        1591.1 ( 1.00x)
blackdetect8_avx2:      225.1 ( 7.07x)
blackdetect16_c:        643.5 ( 1.00x)
blackdetect16_avx2:     220.6 ( 2.92x)

march=core-avx2:
blackdetect8_c:         526.0 ( 1.00x)
blackdetect8_avx2:      220.9 ( 2.38x)
blackdetect16_c:        318.8 ( 1.00x)
blackdetect16_avx2:     225.9 ( 1.41x)

gcc 14.2.0

-fno-tree-vectorize (default config):
blackdetect8_c:        5126.6 ( 1.00x)
blackdetect8_avx2:      198.0 (25.89x)
blackdetect16_c:       2151.9 ( 1.00x)
blackdetect16_avx2:     196.8 (10.93x)

march=generic -ftree-vectorize:
blackdetect8_c:        1354.4 ( 1.00x)
blackdetect8_avx2:      196.9 ( 6.88x)
blackdetect16_c:        644.2 ( 1.00x)
blackdetect16_avx2:     249.8 ( 2.58x)

march=core-avx2 -ftree-vectorize:
blackdetect8_c:         820.8 ( 1.00x)
blackdetect8_avx2:      219.2 ( 3.74x)
blackdetect16_c:        372.8 ( 1.00x)
blackdetect16_avx2:     201.4 ( 1.85x)

Again, sorry for being pedantic here, but the original numbers give the
wrong impression, especially to someone looking at this from the outside.
- Kacper