Re: [FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, rgba}_neon

2016-03-01 Thread Clément Bœsch
On Tue, Mar 01, 2016 at 05:18:36PM +0100, Michael Niedermayer wrote:
> On Tue, Mar 01, 2016 at 11:11:36AM +0100, Clément Bœsch wrote:
> > On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> > [...]

Re: [FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, rgba}_neon

2016-03-01 Thread Michael Niedermayer
On Tue, Mar 01, 2016 at 11:11:36AM +0100, Clément Bœsch wrote:
> On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> [...]

Re: [FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, rgba}_neon

2016-03-01 Thread Clément Bœsch
On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> [...]
> 

Random benchmark on Hikey (Cortex-A53):

./ffmpeg -nostats -f lavfi -i testsrc2=s=uhd2160:d=1 -vf format=yuv420p,bench=start,format=rgba,bench=stop -f null -

(yuv420p to rgba in 3840x2160)
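The benchmark writes rgba. FFmpeg names these packed 32-bit formats by memory byte order, so argb, rgba, abgr and bgra differ only in where each component lands within the 4-byte pixel. The following is a minimal sketch of that difference using a hypothetical `pack_pixel()` helper and `enum packed_fmt` (not code from the patch), assuming a fully opaque alpha of 255.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format tags for this sketch only; FFmpeg's real ones are
 * AV_PIX_FMT_ARGB, AV_PIX_FMT_RGBA, ... in libavutil/pixfmt.h. */
enum packed_fmt { FMT_ARGB, FMT_RGBA, FMT_ABGR, FMT_BGRA };

/* Store one opaque pixel; the format name gives the memory byte order. */
static void pack_pixel(enum packed_fmt fmt, uint8_t r, uint8_t g, uint8_t b,
                       uint8_t *dst)
{
    const uint8_t a = 255; /* assumed opaque alpha */

    switch (fmt) {
    case FMT_ARGB: dst[0] = a; dst[1] = r; dst[2] = g; dst[3] = b; break;
    case FMT_RGBA: dst[0] = r; dst[1] = g; dst[2] = b; dst[3] = a; break;
    case FMT_ABGR: dst[0] = a; dst[1] = b; dst[2] = g; dst[3] = r; break;
    case FMT_BGRA: dst[0] = b; dst[1] = g; dst[2] = r; dst[3] = a; break;
    default:       assert(0 && "unknown packed format");
    }
}
```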

before:
[bench @ 0x2edfe1e0] t:0.181514 avg:0.181514 max:0.181514 min:0.181514
[bench @ 0x2edfe1e0] t:0.178870 avg:0.180192 max:0.181514 min:0.178870
[bench @ 0x2edfe1e0] t:0.164448 avg:0.174944 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164801 avg:0.172408 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164635 avg:0.170853 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164756 avg:0.169837 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164784 avg:0.169115 max:0.181514 min:0.164448
[bench @ 0x2edfe1e0] t:0.164413 avg:0.168527 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164760 avg:0.168109 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164647 avg:0.167762 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164698 avg:0.167484 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164600 avg:0.167243 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164498 avg:0.167032 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164765 avg:0.166870 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164613 avg:0.166720 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164781 avg:0.166598 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164489 avg:0.166474 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164432 avg:0.166361 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164540 avg:0.166265 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.164524 avg:0.166178 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165147 avg:0.166129 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165484 avg:0.166099 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165703 avg:0.166082 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165643 avg:0.166064 max:0.181514 min:0.164413
[bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413

after:
[bench @ 0x16d871e0] t:0.042296 avg:0.042296 max:0.042296 min:0.042296
[bench @ 0x16d871e0] t:0.041986 avg:0.042141 max:0.042296 min:0.041986
[bench @ 0x16d871e0] t:0.027298 avg:0.037193 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027388 avg:0.034742 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027383 avg:0.033270 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027366 avg:0.032286 max:0.042296 min:0.027298
[bench @ 0x16d871e0] t:0.027225 avg:0.031563 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027685 avg:0.031078 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027246 avg:0.030652 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027363 avg:0.030323 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027449 avg:0.030062 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027582 avg:0.029855 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027374 avg:0.029664 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027429 avg:0.029505 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027275 avg:0.029356 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027573 avg:0.029244 max:0.042296 min:0.027225
[bench @ 0x16d871e0] t:0.027219 avg:0.029125 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027392 avg:0.029029 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027720 avg:0.028960 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027449 avg:0.028884 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027473 avg:0.028817 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027444 avg:0.028755 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027535 avg:0.028702 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027607 avg:0.028656 max:0.042296 min:0.027219
[bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219

[...]
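To summarize the numbers above, the following sketch averages the per-frame `t:` values from the two logs (the arrays are copied from the log lines). Excluding the first two frames of each run as warm-up is an assumption based on how the timings settle; `steady_mean()` and the array names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Per-frame times in seconds, copied from the "t:" fields above
 * (3840x2160 yuv420p -> rgba on a Cortex-A53). */
static const double before_t[] = {
    0.181514, 0.178870, 0.164448, 0.164801, 0.164635, 0.164756, 0.164784,
    0.164413, 0.164760, 0.164647, 0.164698, 0.164600, 0.164498, 0.164765,
    0.164613, 0.164781, 0.164489, 0.164432, 0.164540, 0.164524, 0.165147,
    0.165484, 0.165703, 0.165643, 0.165294,
};
static const double after_t[] = {
    0.042296, 0.041986, 0.027298, 0.027388, 0.027383, 0.027366, 0.027225,
    0.027685, 0.027246, 0.027363, 0.027449, 0.027582, 0.027374, 0.027429,
    0.027275, 0.027573, 0.027219, 0.027392, 0.027720, 0.027449, 0.027473,
    0.027444, 0.027535, 0.027607, 0.027476,
};

/* Mean of samples[skip..n-1]; 'skip' drops the leading warm-up frames. */
static double steady_mean(const double *samples, size_t n, size_t skip)
{
    assert(n > skip);
    double sum = 0.0;
    for (size_t i = skip; i < n; i++)
        sum += samples[i];
    return sum / (double)(n - skip);
}
```

With those two frames dropped, the steady-state time is about 0.1648 s per frame before and 0.0274 s after, roughly a 6.0x speedup, or around 300 Mpixel/s for 3840x2160 frames.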

-- 
Clément B.


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org

[FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, rgba}_neon

2016-02-29 Thread Clément Bœsch
From: Clément Bœsch 

---
Changes since the previous version:
- remove unused 32-bit path
- make the 16-bit path more accurate by mirroring the MMX code (still not bitexact)
- the code was originally trying to process 2 lines at a time to save chroma
  pre-multiplication computations and avoid re-reading the whole line; for
  some reason, this actually made the code about twice as slow, for twice the
  complexity. Dropping that complexity was a win-win.
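For context on the 2-line idea: in the 4:2:0 inputs (yuv420p, nv12, nv21), luma rows 2k and 2k+1 read the same chroma row, so processing them together could share the chroma pre-multiplications, whereas yuv422p has one chroma row per luma row. A sketch of where each format keeps the (U, V) pair for a given luma sample, using a hypothetical `fetch_chroma()` helper; the enum and parameter names are illustrative and not the patch's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for the four input layouts. */
enum in_fmt { IN_YUV420P, IN_YUV422P, IN_NV12, IN_NV21 };

/* Fetch the (U, V) pair that serves luma sample (x, y).
 * Planar formats: u_plane/v_plane are separate planes with their own strides.
 * nv12/nv21: u_plane is the single interleaved chroma plane, v_plane unused.
 * All four formats subsample chroma 2:1 horizontally. */
static void fetch_chroma(enum in_fmt fmt, int x, int y,
                         const uint8_t *u_plane, int u_stride,
                         const uint8_t *v_plane, int v_stride,
                         uint8_t *u, uint8_t *v)
{
    assert(x >= 0 && y >= 0);
    /* 4:2:0: two luma rows share one chroma row; 4:2:2: one row each. */
    const int cy = (fmt == IN_YUV422P) ? y : y >> 1;
    const int cx = x >> 1;

    switch (fmt) {
    case IN_YUV420P:
    case IN_YUV422P:
        *u = u_plane[cy * u_stride + cx];
        *v = v_plane[cy * v_stride + cx];
        break;
    case IN_NV12: /* interleaved U,V,U,V,... */
        *u = u_plane[cy * u_stride + 2 * cx];
        *v = u_plane[cy * u_stride + 2 * cx + 1];
        break;
    case IN_NV21: /* interleaved V,U,V,U,... */
        *v = u_plane[cy * u_stride + 2 * cx];
        *u = u_plane[cy * u_stride + 2 * cx + 1];
        break;
    }
}
```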
---
 libswscale/aarch64/Makefile           |   3 +
 libswscale/aarch64/swscale_unscaled.c | 132 ++
 libswscale/aarch64/yuv2rgb_neon.S     | 207 ++
 libswscale/swscale_internal.h         |   1 +
 libswscale/swscale_unscaled.c         |   2 +
 5 files changed, 345 insertions(+)
 create mode 100644 libswscale/aarch64/Makefile
 create mode 100644 libswscale/aarch64/swscale_unscaled.c
 create mode 100644 libswscale/aarch64/yuv2rgb_neon.S
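The `YUV_TO_RGB_TABLE` macro in the patch hands four chroma coefficients (v2r, u2g, v2g, u2b) to the assembly, together with a luma offset and coefficient. A scalar sketch of this style of fixed-point conversion follows, assuming BT.601 limited-range constants in Q13. The constants, signs and scaling are this sketch's own choices: swscale computes its coefficients at init time, and the wrapper for instance passes `c->yuv2rgb_y_offset >> 6`, so this is not bit-exact with the patch or the MMX code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BT.601 limited-range coefficients in Q13 (value * 8192);
 * illustrative only, not swscale's tables. */
enum { Q = 13 };
static const int32_t Y_COEFF = 9539;  /* 255/219 */
static const int32_t V2R     = 13074; /*  1.596  */
static const int32_t U2G     = -3203; /* -0.391  */
static const int32_t V2G     = -6660; /* -0.813  */
static const int32_t U2B     = 16531; /*  2.018  */

/* Drop the Q13 fraction and saturate to [0, 255]. Negative sums are
 * clamped before shifting to avoid right-shifting a negative value. */
static uint8_t descale_clip(int32_t acc)
{
    if (acc < 0)
        return 0;
    acc >>= Q;
    return (uint8_t)(acc > 255 ? 255 : acc);
}

static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    const int32_t yy   = ((int32_t)y - 16) * Y_COEFF; /* luma offset + scale */
    const int32_t uu   = (int32_t)u - 128;
    const int32_t vv   = (int32_t)v - 128;
    const int32_t bias = 1 << (Q - 1);                /* round to nearest */

    *r = descale_clip(yy + V2R * vv + bias);
    *g = descale_clip(yy + U2G * uu + V2G * vv + bias);
    *b = descale_clip(yy + U2B * uu + bias);
}
```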

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
new file mode 100644
index 0000000..823806e
--- /dev/null
+++ b/libswscale/aarch64/Makefile
@@ -0,0 +1,3 @@
+OBJS        += aarch64/swscale_unscaled.o
+
+NEON-OBJS   += aarch64/yuv2rgb_neon.o
diff --git a/libswscale/aarch64/swscale_unscaled.c b/libswscale/aarch64/swscale_unscaled.c
new file mode 100644
index 0000000..551daad
--- /dev/null
+++ b/libswscale/aarch64/swscale_unscaled.c
@@ -0,0 +1,132 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+#include "libavutil/aarch64/cpu.h"
+
+#define YUV_TO_RGB_TABLE                                                                  \
+    c->yuv2rgb_v2r_coeff,                                                                 \
+    c->yuv2rgb_u2g_coeff,                                                                 \
+    c->yuv2rgb_v2g_coeff,                                                                 \
+    c->yuv2rgb_u2b_coeff,                                                                 \
+
+#define DECLARE_FF_YUVX_TO_RGBX_FUNCS(ifmt, ofmt)                                         \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h,                                            \
+                                 uint8_t *dst, int linesize,                              \
+                                 const uint8_t *srcY, int linesizeY,                      \
+                                 const uint8_t *srcU, int linesizeU,                      \
+                                 const uint8_t *srcV, int linesizeV,                      \
+                                 const int16_t *table,                                    \
+                                 int y_offset,                                            \
+                                 int y_coeff);                                            \
+                                                                                          \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],           \
+                                           int srcStride[], int srcSliceY, int srcSliceH, \
+                                           uint8_t *dst[], int dstStride[]) {             \
+    const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE };                                 \
+                                                                                          \
+    ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH,                                      \
+                                 dst[0] + srcSliceY * dstStride[0], dstStride[0],         \
+                                 src[0], srcStride[0],                                    \
+                                 src[1], srcStride[1],                                    \
+                                 src[2], srcStride[2],                                    \
+                                 yuv2rgb_table,                                           \
+                                 c->yuv2rgb_y_offset >> 6,                                \
+                                 c->yuv2rgb_y_coeff);