On Fri, 18 Jul 2025 14:38:04 +0200 Kacper Michajlow <kaspe...@gmail.com> wrote:
> > +static inline int ff_detect_range_c(const uint8_t *data, ptrdiff_t stride,
> > +                                    ptrdiff_t width, ptrdiff_t height,
> > +                                    int mpeg_min, int mpeg_max)
> > +{
> > +    while (height--) {
> > +        for (int x = 0; x < width; x++) {
> > +            const uint8_t val = data[x];
> > +            if (val < mpeg_min || val > mpeg_max)
> > +                return 1;
> > +        }
> > +        data += stride;
> > +    }
> > +
> > +    return 0;
> > +}
>
> You could process width as a whole to allow better vectorization.
> Assuming you don't process 10000x1 images, it will be faster on average.

That is what I had in v1 of my patch, but it is significantly (50%) slower
on GCC, which does better with the version written above.
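
For readers following along, a row-at-a-time variant along the lines of the
suggestion might look roughly like the sketch below. This is only an
illustration: the name detect_range_rowwise is hypothetical, and the body is
not taken from v1 of the patch. The inner loop has no early exit, so the
compiler is free to vectorize it, and the result is checked once per row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row-at-a-time variant (illustrative name, not from the
 * patch). Out-of-range flags are OR-ed together across the whole row and
 * tested once per row instead of once per pixel. */
static inline int detect_range_rowwise(const uint8_t *data, ptrdiff_t stride,
                                       ptrdiff_t width, ptrdiff_t height,
                                       int mpeg_min, int mpeg_max)
{
    while (height--) {
        int cond = 0;
        for (ptrdiff_t x = 0; x < width; x++) {
            const uint8_t val = data[x];
            /* Bitwise OR keeps the loop body branch-free. */
            cond |= (val < mpeg_min) | (val > mpeg_max);
        }
        if (cond)
            return 1;
        data += stride;
    }

    return 0;
}
```

The trade-off is that every row is always scanned in full, whereas the
per-pixel version returns as soon as it sees the first out-of-range sample.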

It is also worth noting that this C routine is used to handle the remaining
elements that do not fit into a multiple of the SIMD kernel width, and for
those the scalar code is actually preferred.
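
To make the tail handling concrete, here is a minimal sketch of how a wrapper
could split each row between a block kernel and the scalar routine. It rests
on assumptions: BLOCK, detect_range_blocks and detect_range are hypothetical
names, the block width of 16 bytes is made up, and the "kernel" simply
forwards to the scalar code as a stand-in for real SIMD. It is not how the
patch actually dispatches.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* assumed kernel width in bytes (hypothetical) */

/* Scalar routine, same logic as ff_detect_range_c quoted above. */
static int detect_range_scalar(const uint8_t *data, ptrdiff_t stride,
                               ptrdiff_t width, ptrdiff_t height,
                               int mpeg_min, int mpeg_max)
{
    while (height--) {
        for (ptrdiff_t x = 0; x < width; x++) {
            const uint8_t val = data[x];
            if (val < mpeg_min || val > mpeg_max)
                return 1;
        }
        data += stride;
    }
    return 0;
}

/* Stand-in for a SIMD kernel that requires width % BLOCK == 0.
 * A real build would use vector code here; this just forwards. */
static int detect_range_blocks(const uint8_t *data, ptrdiff_t stride,
                               ptrdiff_t width, ptrdiff_t height,
                               int mpeg_min, int mpeg_max)
{
    return detect_range_scalar(data, stride, width, height,
                               mpeg_min, mpeg_max);
}

/* Hypothetical wrapper: block kernel for the aligned part of each row,
 * scalar routine for the leftover columns on the right. */
static int detect_range(const uint8_t *data, ptrdiff_t stride,
                        ptrdiff_t width, ptrdiff_t height,
                        int mpeg_min, int mpeg_max)
{
    const ptrdiff_t main_w = width - width % BLOCK;

    if (main_w && detect_range_blocks(data, stride, main_w, height,
                                      mpeg_min, mpeg_max))
        return 1;
    if (width > main_w)
        return detect_range_scalar(data + main_w, stride, width - main_w,
                                   height, mpeg_min, mpeg_max);
    return 0;
}
```

With this split the scalar routine only ever sees a narrow strip of fewer
than BLOCK columns per row, which is the case where a simple per-pixel loop
with an early exit is a good fit.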