On Fri, Apr 1, 2016 at 4:15 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>
> On Mon, Mar 28, 2016 at 9:12 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>
>> On Sun, Mar 27, 2016 at 5:58 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>>
>>> On Fri, Mar 25, 2016 at 11:45 PM, Matthieu Bouron <matthieu.bou...@gmail.com> wrote:
>>>
>>>> The following patchset aims to make the yuv->rgba armv7 neon code path
>>>> bitexact with the aarch64 one. It also aims to make the two code bases
>>>> as close as possible.
>>>>
>>>> [PATCH 01/10] swscale/arm/yuv2rgb: remove 32bit code path
>>>>
>>>> The current 32bit code path, which is unused, is removed.
>>>>
>>>> [PATCH 06/10] swscale/arm/yuv2rgb: only process one line at a time
>>>>
>>>> The code processes only one line at a time for the yuv420p, nv12 and
>>>> nv21 formats, with no regression in performance observed on a rpi2
>>>> (I've even observed a slight increase in performance for the nv12 and
>>>> nv21 formats).
>>>>
>>>> [PATCH 10/10] swscale/arm/yuv2rgb: make the code bitexact with its
>>>>
>>>> The last patch of the series makes the code bitexact with the aarch64
>>>> version. The increase in precision (which introduces a performance
>>>> loss) is compensated by a refactor/optimisation that saves quite a few
>>>> mov, vdup and vqdmulh.
>>>>
>>>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>>>
>>>> without patchset:
>>>> [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>>>>
>>>> with patchset:
>>>> [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>>>
>>> I've managed to run the code on a BeagleBone Black board; here are the
>>> results:
>>>
>>> nv12->bgra
>>> without patchset:              [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>>> with patches 01-06/10 applied: [bench @ 0x8052d0] t:0.013438 avg:0.013659 max:0.034427 min:0.013411
>>> with patches 01-10/10 applied: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>>>
>>> yuv420p->bgra
>>> without patchset:              [bench @ 0x6d42d0] t:0.012954 avg:0.013159 max:0.033866 min:0.012945
>>> with patches 01-06/10 applied: [bench @ 0x20172d0] t:0.015154 avg:0.015358 max:0.036186 min:0.015134
>>> with patches 01-10/10 applied: [bench @ 0x1d162d0] t:0.014623 avg:0.014784 max:0.035487 min:0.014568
>>>
>>> So it looks like processing one line at a time has a negative effect on
>>> performance on this board (as opposed to the rpi2). I'll try to keep the
>>> two-line processing code and post some results (so we can decide which
>>> version to choose).
>>>
>>
>> I've managed to update the patchset to keep processing two lines at a
>> time for the nv12, nv21 and yuv420p formats; here are the results:
>>
>> ./ffmpeg_g -nostats -f lavfi -i testsrc2=1920x1080:d=5,format=nv12,bench=start,format=bgra,bench=stop -f null -
>>
>> Beagle bone black:
>> without patchset: [bench @ 0x1fc02d0] t:0.011618 avg:0.011743 max:0.032600 min:0.011513
>> with patchset v1: [bench @ 0x1fbb2d0] t:0.012554 avg:0.012751 max:0.034288 min:0.012523
>> with patchset v2: [bench @ 0x10f92d0] t:0.011239 avg:0.011408 max:0.032124 min:0.011202
>>
>> Nexus5:
>> without patchset: avg: ~2,869ms
>> with patchset v1: avg: ~3,008ms
>> with patchset v2: avg: ~2,702ms
>>
>> RPI2:
>> without patchset: [bench @ 0x3eb6a0] t:0.020660 avg:0.020813 max:0.039399 min:0.020605
>> with patchset v1: [bench @ 0xe5f6a0] t:0.018924 avg:0.019075 max:0.037472 min:0.01884
>> with patchset v2: [bench @ 0xc1b6a0] t:0.020999 avg:0.021203 max:0.052184 min:0.020768
>>
>> Given these results, I will drop the current patchset and submit another
>> one (which keeps processing two lines at a time).
>>
>
> I will push the updated patchset (which takes into account Benoit's
> comments) in about one hour.
>

Pushed.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel