Hi Team!

Apologies if I'm breaking the submission rules; I don't know them well yet :-)

I noticed that on ARM, "hwdownload" is quite slow.
It turns out this is caused by image_copy_plane() in libavutil/imgutils.c, which
copies the plane with one memcpy per line:

    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }

As a quick'n'dirty PoC, I created 4 threads and split the copy between them.
In my case this improved the frame rate from ~26 fps to ~51 fps with:

./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
 -threads 4 \
 -i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
 -filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]"  \
 -f null -
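The split described above could look roughly like this. It is not the PoC code
itself, just a minimal standalone sketch assuming POSIX threads; the names
copy_job, copy_rows, copy_plane_threaded and MAX_COPY_THREADS are hypothetical.
Each thread runs the same per-line memcpy loop on its own contiguous band of rows.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_COPY_THREADS 16 /* hypothetical upper bound */

/* One contiguous band of rows, handled by a single thread. */
typedef struct copy_job {
    uint8_t       *dst;
    const uint8_t *src;
    ptrdiff_t      dst_linesize;
    ptrdiff_t      src_linesize;
    ptrdiff_t      bytewidth;
    int            height;
} copy_job;

/* The same per-line memcpy loop as image_copy_plane(), run on one band. */
static void *copy_rows(void *arg)
{
    const copy_job *j   = arg;
    uint8_t        *dst = j->dst;
    const uint8_t  *src = j->src;

    for (int h = j->height; h > 0; h--) {
        memcpy(dst, src, j->bytewidth);
        dst += j->dst_linesize;
        src += j->src_linesize;
    }
    return NULL;
}

/* Split the plane into nb_threads horizontal bands and copy them in parallel. */
static void copy_plane_threaded(uint8_t *dst, ptrdiff_t dst_linesize,
                                const uint8_t *src, ptrdiff_t src_linesize,
                                ptrdiff_t bytewidth, int height, int nb_threads)
{
    pthread_t tid[MAX_COPY_THREADS];
    int       running[MAX_COPY_THREADS] = { 0 };
    copy_job  job[MAX_COPY_THREADS];

    if (height <= 0)
        return;
    if (nb_threads < 1)
        nb_threads = 1;
    if (nb_threads > MAX_COPY_THREADS)
        nb_threads = MAX_COPY_THREADS;
    if (nb_threads > height)
        nb_threads = height;

    for (int i = 0; i < nb_threads; i++) {
        /* Rows [y0, y1); together the bands cover 0..height exactly. */
        int y0 = (int)((int64_t)height * i / nb_threads);
        int y1 = (int)((int64_t)height * (i + 1) / nb_threads);

        job[i] = (copy_job){
            .dst          = dst + y0 * dst_linesize,
            .src          = src + y0 * src_linesize,
            .dst_linesize = dst_linesize,
            .src_linesize = src_linesize,
            .bytewidth    = bytewidth,
            .height       = y1 - y0,
        };
        /* The last band runs on the calling thread; if a worker cannot be
         * started, its band is copied inline instead. */
        if (i < nb_threads - 1 &&
            pthread_create(&tid[i], NULL, copy_rows, &job[i]) == 0)
            running[i] = 1;
        else
            copy_rows(&job[i]);
    }
    for (int i = 0; i < nb_threads; i++)
        if (running[i])
            pthread_join(tid[i], NULL);
}
```

Spawning threads on every call has a cost of its own, so a cleaned-up version
would more likely reuse persistent workers. The gain presumably also depends on
a single core not already saturating memory bandwidth, so it may vary between SoCs.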

Maybe someone is interested in turning this into a cleaned-up patch.
My PoC uses a hard-coded thread count of 4, which is certainly not the right way to do it ...
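Instead of a hard-coded 4, the thread count could be derived at runtime. A
minimal sketch, assuming a POSIX system where sysconf(_SC_NPROCESSORS_ONLN) is
available; the helper name pick_copy_threads, the 16-thread cap and the 64 KiB
minimum band size are hypothetical tuning choices, not measured values.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical lower bound on bytes per band: below this, thread start-up
 * cost likely outweighs the gain from a parallel copy. Not measured. */
#define MIN_BYTES_PER_THREAD (64 * 1024)
#define MAX_COPY_THREADS     16

/* Pick a thread count from the online CPU count and the plane size. */
static int pick_copy_threads(ptrdiff_t bytewidth, int height)
{
    long      cpus    = sysconf(_SC_NPROCESSORS_ONLN);
    long long total   = (long long)bytewidth * (height > 0 ? height : 0);
    long long by_size = total / MIN_BYTES_PER_THREAD;
    int       n;

    if (cpus < 1)
        cpus = 1; /* sysconf() returns -1 on error */
    n = cpus < MAX_COPY_THREADS ? (int)cpus : MAX_COPY_THREADS;
    if (by_size < n)
        n = (int)by_size; /* small planes get fewer threads */
    if (n > height)
        n = height;       /* never more bands than rows */
    return n < 1 ? 1 : n;
}
```

For a 1280x720 NV12 luma plane (1280 * 720 = 921600 bytes) the size limit
allows up to 14 threads, so on a 4-core board this simply returns 4.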

Btw, the same approach may help in other places as well.


Also, specifically for ARM, there is a tuned memcpy replacement:
https://github.com/simonjhall/copies-and-fills/
which also speeds up ffmpeg (and everything else that relies on memcpy).


 Best from Germany
     Thilo

_______________________________________________
ffmpeg-devel mailing list -- [email protected]