Hi Team!
Apologies if I am breaking the submission rules, but as of now I don't know any better :-)
I found that "hwdownload" is quite slow on ARM.
It turns out this is caused by image_copy_plane in imgutils.c, which does a
memcpy loop:
    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
As a quick and dirty PoC, I created 4 threads and split the copy between them.
In my case this improved throughput from about 26 to 51 fps with this command:
./ffmpeg -hide_banner -hwaccel v4l2request -hwaccel_output_format drm_prime \
-threads 4 \
-i ../Big_Buck_Bunny_720_10s_10MB.mp4 \
-filter_complex "[0:v]hwdownload,format=nv12[myOut]" -map "[myOut]" \
-f null -
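
To make the idea concrete, here is a minimal sketch of how such a split could
look. This is not my actual PoC code and not existing FFmpeg API: the names
copy_plane_threaded, CopySlice and NB_COPY_THREADS are hypothetical, and the
thread count is hard-coded to 4 only to mirror the PoC. It divides the rows of
one plane into contiguous slices and runs the same memcpy loop on each slice in
its own pthread.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hard-coded as in the PoC; an assumption, not a tuned value. */
#define NB_COPY_THREADS 4

/* Hypothetical per-thread job: one contiguous band of rows. */
typedef struct CopySlice {
    uint8_t       *dst;
    const uint8_t *src;
    ptrdiff_t      dst_linesize;
    ptrdiff_t      src_linesize;
    size_t         bytewidth;
    int            height;        /* number of rows in this slice */
} CopySlice;

/* Same row-by-row memcpy loop as the original, limited to one slice. */
static void *copy_slice(void *arg)
{
    const CopySlice *s = arg;
    uint8_t *dst = s->dst;
    const uint8_t *src = s->src;

    for (int y = 0; y < s->height; y++) {
        memcpy(dst, src, s->bytewidth);
        dst += s->dst_linesize;
        src += s->src_linesize;
    }
    return NULL;
}

/* Hypothetical helper: split the plane into up to NB_COPY_THREADS bands. */
static void copy_plane_threaded(uint8_t *dst, ptrdiff_t dst_linesize,
                                const uint8_t *src, ptrdiff_t src_linesize,
                                size_t bytewidth, int height)
{
    pthread_t threads[NB_COPY_THREADS];
    CopySlice slices[NB_COPY_THREADS];
    int       started[NB_COPY_THREADS] = { 0 };
    /* Round up so that all rows are covered. */
    int rows_per_thread = (height + NB_COPY_THREADS - 1) / NB_COPY_THREADS;

    for (int i = 0; i < NB_COPY_THREADS; i++) {
        int first = i * rows_per_thread;
        int rows  = height - first;

        if (rows > rows_per_thread)
            rows = rows_per_thread;
        if (rows <= 0)
            break;                /* nothing left for this thread */

        slices[i].dst          = dst + (ptrdiff_t)first * dst_linesize;
        slices[i].src          = src + (ptrdiff_t)first * src_linesize;
        slices[i].dst_linesize = dst_linesize;
        slices[i].src_linesize = src_linesize;
        slices[i].bytewidth    = bytewidth;
        slices[i].height       = rows;

        /* If a thread cannot be created, copy this slice in the caller. */
        if (pthread_create(&threads[i], NULL, copy_slice, &slices[i]) == 0)
            started[i] = 1;
        else
            copy_slice(&slices[i]);
    }

    for (int i = 0; i < NB_COPY_THREADS; i++)
        if (started[i])
            pthread_join(threads[i], NULL);
}
```

Creating and joining threads on every call has its own cost, so a cleaned-up
version would probably reuse worker threads and take the thread count from the
user instead of a constant.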
Maybe someone is interested in this improvement once the code is cleaned up.
My PoC uses a hard-coded 4 threads, which is certainly bad ...
By the way, this may apply to other locations as well.
Also, specifically for ARM, there is a tuned memcpy replacement:
https://github.com/simonjhall/copies-and-fills/
which also speeds up ffmpeg (and of course everything else).
Best from Germany
Thilo