Re: [FFmpeg-devel] Memcpy Operation Duration
On Oct 18, 2016 11:08 PM, "Ronald S. Bultje" wrote:
> configure has --extra-cflags=.. and --extra-ldflags=.. options to add
> custom CC CLI arguments.
>
> Ronald

Hi Ronald,

Yes, I used the extra flags to pass -march=native or -march=corei7-avx2. Tomorrow I will try the -fno-builtin-memcpy option via --extra-cflags and post an update to the thread.

Thank you,

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
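One way to confirm that a -march value given through --extra-cflags actually reached the compiler is to test the instruction-set macros that GCC and Clang predefine (__AVX2__, __AVX__, __SSE4_2__, __SSE2__). The sketch below is illustrative only; compiled_simd_level() and built_with_avx2() are hypothetical helpers. Keep in mind that -march only changes code the compiler generates for FFmpeg's own sources; a memcpy call that ends up in the precompiled C library is not rebuilt by it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports the widest x86 SIMD extension this
 * translation unit was compiled for, using GCC/Clang predefined macros.
 * It reflects compile-time flags only, not the CPU the program runs on. */
static const char *compiled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}

/* Hypothetical helper: 1 if the build targeted AVX2, 0 otherwise. */
static int built_with_avx2(void)
{
    return strcmp(compiled_simd_level(), "AVX2") == 0;
}
```

Built with the same flags as FFmpeg, for example -march=native on an AVX2-capable Xeon E5-2630 v4, compiled_simd_level() would be expected to return "AVX2".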
Re: [FFmpeg-devel] Memcpy Operation Duration
Hi Ali,

On Tue, Oct 18, 2016 at 3:57 PM, Ali KIZIL wrote:
> I see, tomorrow morning I will give it a try.
> Thank you for the good idea. If it increase performance, maybe it will be a
> good idea to make a configure option.

configure has --extra-cflags=.. and --extra-ldflags=.. options to add custom CC CLI arguments.

Ronald
Re: [FFmpeg-devel] Memcpy Operation Duration
2016-10-18 22:44 GMT+03:00 Sven C. Dack :
> Could be it's gcc's built-in version. It's been said that libc is
> occasionally better at it than gcc's built-in version.
>
> Use -fno-builtin-memcpy and see what difference it makes.

I see; tomorrow morning I will give it a try. Thank you for the good idea. If it increases performance, it may be worth adding a configure option for it.

Kind Regards,
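To check whether -fno-builtin-memcpy makes a difference, one option is to time memcpy on a buffer the size of one frame from the original report: a 3840x2160 uyvy422 frame at 2 bytes per pixel, i.e. 16,588,800 bytes. The following is a minimal sketch under that assumption; uyvy422_frame_size(), bench_memcpy_ms() and CLOBBER_MEMORY are hypothetical names, it relies on POSIX clock_gettime(), and it uses a GCC/Clang inline-asm compiler barrier so the optimizer cannot drop the copies.

```c
/* Hypothetical micro-benchmark: average time of one frame-sized memcpy(). */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Compiler barrier (GCC/Clang): the optimizer must assume memory reachable
 * from p is read here, so the preceding copy cannot be removed as dead. */
#define CLOBBER_MEMORY(p) __asm__ volatile("" : : "r"(p) : "memory")

/* uyvy422 is packed 4:2:2 with 2 bytes per pixel. */
static size_t uyvy422_frame_size(size_t width, size_t height)
{
    return width * height * 2;
}

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Returns the average milliseconds per memcpy() of n bytes over iters
 * runs, or -1.0 if the buffers cannot be allocated. */
static double bench_memcpy_ms(size_t n, int iters)
{
    unsigned char *src, *dst;
    double start, elapsed;

    assert(n > 0 && iters > 0);
    src = malloc(n);
    dst = malloc(n);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x80, n);
    memcpy(dst, src, n); /* warm-up: fault in dst's pages before timing */
    CLOBBER_MEMORY(dst);

    start = now_ms();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, n);
        CLOBBER_MEMORY(dst);
    }
    elapsed = now_ms() - start;

    free(src);
    free(dst);
    return elapsed / iters;
}
```

Calling bench_memcpy_ms(uyvy422_frame_size(3840, 2160), 100) from a small driver built once with `gcc -O2 -march=native` and once with `-fno-builtin-memcpy` added gives a rough comparison. Because the size is only known at run time here, gcc may emit a library call in both builds, so similar timings would not be surprising.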
Re: [FFmpeg-devel] Memcpy Operation Duration
On 18/10/16 20:26, Ali KIZIL wrote:
> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it is
> taking longer time compared to an optimized SSE, SSE2, MMX, MMX2, AVX or
> AVX2 based memcpy operation.
> [...]

It could be gcc's built-in version. It has been said that libc's memcpy is occasionally faster than gcc's built-in version.

Use -fno-builtin-memcpy and see what difference it makes.
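Besides rebuilding with -fno-builtin-memcpy, the two paths can also be compared inside one binary. Calling memcpy through a volatile function pointer hides the callee from gcc, so it cannot apply its built-in expansion and has to call the C library function. This is a minimal sketch; copy_builtin(), copy_libc() and libc_memcpy are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Reading the pointer through a volatile object hides its value from the
 * optimizer, so every call goes to the C library's memcpy. */
static memcpy_fn volatile libc_memcpy = memcpy;

/* Direct call: gcc may treat this as __builtin_memcpy and expand it inline
 * unless the file is built with -fno-builtin-memcpy. */
static void copy_builtin(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    memcpy(dst, src, n);
}

/* Indirect call: always uses the library implementation. */
static void copy_libc(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    libc_memcpy(dst, src, n);
}
```

Whether copy_builtin() is actually expanded inline depends on the compiler version, the -march/-mtune setting and whether n is a compile-time constant; for large sizes known only at run time, gcc commonly emits a call to the library memcpy anyway.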
[FFmpeg-devel] Memcpy Operation Duration
Hi Everyone,

Today I was analyzing memcpy duration in FFmpeg. I noticed that it takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or AVX2 based memcpy.

I tried an FFmpeg build compiled with -march=corei7-avx2, and it does not change the duration of the memcpy operations. I also followed https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips, with the same result. In addition, I tried gcc 6.2 in case gcc was not selecting the correct flag. Same result again.

These memcpy operations affect the decoding (and probably encoding) frame rate. In one case, an unscaled uyvy422 to p010 conversion of 3840x2160 rawvideo, the frame rate increased from 44 fps to 52 fps on a Xeon E5-2630 v4.

Am I missing anything when compiling FFmpeg with AVX2 or other optimization flags? Or does FFmpeg need a fix that directs some (or all) memcpy operations to a common memcpy that can choose its optimisation based on CPU flags? Or is there no such need and I am on the wrong path?

(As a side note, FFmpeg performs better on i7 Extreme cores than on Xeon v4 processors.)

Kind Regards,
Ali KIZIL
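The "common memcpy that chooses its optimisation based on CPU flags" idea can be sketched as runtime dispatch: check the CPU once, then call the chosen routine through a function pointer, similar in spirit to how FFmpeg's DSP init code picks function pointers from av_get_cpu_flags(). This sketch assumes GCC or Clang on x86 (it uses __builtin_cpu_supports() and the target("avx2") attribute); select_copy(), copy_avx2(), copy_default() and HAVE_X86_DISPATCH are hypothetical names, not FFmpeg API. On x86-64, glibc's memcpy already performs its own CPU-based selection, and a simple AVX2 loop like this one is not guaranteed to beat it, so any such wrapper would need benchmarking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1

/* AVX2 path: 32-byte unaligned loads/stores, tail finished with memcpy.
 * The target attribute lets the compiler emit AVX2 for this function only,
 * without building the whole file with -mavx2. */
__attribute__((target("avx2")))
static void copy_avx2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
        _mm256_storeu_si256((__m256i *)(d + i), v);
    }
    if (i < n)
        memcpy(d + i, s + i, n - i);
}
#endif

typedef void (*copy_fn)(void *, const void *, size_t);

/* Portable fallback: plain library memcpy. */
static void copy_default(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Picks an implementation based on what the running CPU supports. */
static copy_fn select_copy(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return copy_avx2;
#endif
    return copy_default;
}
```

A caller would resolve the pointer once at init, `copy_fn copy = select_copy();`, and then use `copy(dst, src, n)` for each frame.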