On 7/13/2022 12:54 PM, Marco Vianini wrote:
Sorry, my mail client was using HTML format.
I hope the mail is sent correctly now.
You can get a large performance improvement in the special (but very likely) case of:
"(dst_linesize == bytewidth && src_linesize == bytewidth)"
In this case the rows are contiguous in memory, so we can "coalesce rows", that is, use ONLY ONE MEMCPY
instead of a smaller memcpy for every row (looping height times).
Code:
"
static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);

    /// MY PATCH START
    /// Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    /// MY PATCH STOP

    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"
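The idea can also be checked outside FFmpeg with a small standalone sketch. The names copy_plane_rows, copy_plane_coalesced and copies_match below are hypothetical helpers, not FFmpeg API. The "height <= 0" guard is an added assumption, not part of the patch above: it keeps a non-positive height from turning into a negative copy size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Baseline: one memcpy per row. */
static void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                            const uint8_t *src, ptrdiff_t src_linesize,
                            ptrdiff_t bytewidth, int height)
{
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Coalesced variant: when neither buffer has row padding, the rows are
 * contiguous and the whole plane can be copied with a single memcpy. */
static void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
{
    assert(bytewidth >= 0);
    if (height <= 0)            /* assumption: nothing to copy */
        return;
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        memcpy(dst, src, (size_t)bytewidth * (size_t)height);
        return;
    }
    /* Padded or differing strides: fall back to the row loop. */
    copy_plane_rows(dst, dst_linesize, src, src_linesize, bytewidth, height);
}

/* Hypothetical check: returns 1 if both variants fill the destination
 * identically. src_pad and dst_pad are extra bytes per row (stride minus
 * bytewidth). Expects bytewidth > 0 and height > 0. */
static int copies_match(ptrdiff_t bytewidth, int height,
                        ptrdiff_t src_pad, ptrdiff_t dst_pad)
{
    ptrdiff_t src_ls = bytewidth + src_pad;
    ptrdiff_t dst_ls = bytewidth + dst_pad;
    size_t src_size = (size_t)src_ls * (size_t)height;
    size_t dst_size = (size_t)dst_ls * (size_t)height;
    uint8_t *src = malloc(src_size);
    uint8_t *a   = calloc(1, dst_size);
    uint8_t *b   = calloc(1, dst_size);
    int ok = 0;

    if (src && a && b) {
        /* Fill the source with a simple deterministic pattern. */
        for (size_t i = 0; i < src_size; i++)
            src[i] = (uint8_t)(i * 31 + 7);
        copy_plane_rows(a, dst_ls, src, src_ls, bytewidth, height);
        copy_plane_coalesced(b, dst_ls, src, src_ls, bytewidth, height);
        ok = memcmp(a, b, dst_size) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

For example, copies_match(1920, 1080, 0, 0) exercises the single-memcpy path, while copies_match(1920, 1080, 64, 0) uses a padded source stride and falls back to the row loop.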
I ran the following tests on Windows 10 64-bit.
I compiled the code in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080):
With Coalesce:
copy_cnt=100 size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)
Without Coalesce:
copy_cnt=100 size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)
I think the results are very good: over 1000 copies the average time per copy
drops from 520.381 us to 394.566 us, roughly 24% less.
What do you think?
It looks like a good speed up, but we need a patch created with git
format-patch that can be applied to the source tree to properly review
this. Can you send that?