This patch series splits out the g722_apply_qmf() function so that it can
be optimized with ARM NEON code.

v2 addresses review comments by Timothy and Martin, thanks!

It turns out that the efficiency of the C code can also be improved quite
a bit by unrolling :)
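
To illustrate the kind of unrolling meant here, below is a minimal sketch,
not the code from the patches. The function names qmf_rolled() and
qmf_unrolled(), the tap count and the coefficient table are assumptions
made for this example only. It computes two dot products over interleaved
samples, once with a plain loop and once unrolled by four.

```c
#include <stdint.h>

#define TAPS 12 /* assumed number of taps per output, for illustration */

/* Placeholder coefficients; illustrative values only. */
static const int16_t coeffs[TAPS] = {
    3, -11, 12, 32, -210, 951, 3876, -805, 362, -156, 53, -11
};

/* Reference version: one multiply-accumulate pair per iteration. */
static void qmf_rolled(const int16_t *s, int out[2])
{
    int i;

    out[0] = out[1] = 0;
    for (i = 0; i < TAPS; i++) {
        out[1] += s[2 * i]     * coeffs[i];
        out[0] += s[2 * i + 1] * coeffs[TAPS - 1 - i];
    }
}

/* Even samples use the table forwards, odd samples use it backwards. */
#define MAC_PAIR(k)                                   \
    do {                                              \
        b += s[2 * (k)]     * coeffs[(k)];            \
        a += s[2 * (k) + 1] * coeffs[TAPS - 1 - (k)]; \
    } while (0)

/* Unrolled by four: fewer loop branches, sums kept in local variables. */
static void qmf_unrolled(const int16_t *s, int out[2])
{
    int i, a = 0, b = 0;

    for (i = 0; i < TAPS; i += 4) {
        MAC_PAIR(i);
        MAC_PAIR(i + 1);
        MAC_PAIR(i + 2);
        MAC_PAIR(i + 3);
    }
    out[0] = a;
    out[1] = b;
}
```

Both functions read 2 * TAPS samples and must produce identical results;
any speedup from the unrolled form depends on the compiler and the target.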

Benchmarking a G722 encode/decode in a loop, compiled with gcc 4.8.2
(percentages are relative to the baseline):

x86-64, Intel i5-2400:
340 ms baseline
300 ms after g722_apply_qmf() unrolling, -11.7%
275 ms after s_zero() unrolling, -19.1%

ARM Cortex-A8:
2720 ms baseline
2365 ms after g722_apply_qmf() unrolling, -13.1%
1935 ms after s_zero() unrolling, -28.8%
1850 ms after g722_apply_qmf() in NEON, -32.0%

Peter Meerwald (5):
  g722: Split out g722_qmf_apply() function into g722dsp.c
  g722: Reduce number of pointers passed to g722_apply_qmf() function
  g722: Unroll g722_apply_qmf()
  g722: Split out computation of band->s_zero and unroll code
  g722: Add ARM NEON implementation for g722_apply_qmf()

 libavcodec/Makefile               |  4 +--
 libavcodec/arm/Makefile           |  4 +++
 libavcodec/arm/g722dsp_init_arm.c | 35 +++++++++++++++++++
 libavcodec/arm/g722dsp_neon.S     | 66 ++++++++++++++++++++++++++++++++++++
 libavcodec/g722.c                 | 69 +++++++++++++++++--------------------
 libavcodec/g722.h                 |  5 +--
 libavcodec/g722dec.c              | 11 +++---
 libavcodec/g722dsp.c              | 71 +++++++++++++++++++++++++++++++++++++++
 libavcodec/g722dsp.h              | 33 ++++++++++++++++++
 libavcodec/g722enc.c              | 10 +++---
 10 files changed, 256 insertions(+), 52 deletions(-)
 create mode 100644 libavcodec/arm/g722dsp_init_arm.c
 create mode 100644 libavcodec/arm/g722dsp_neon.S
 create mode 100644 libavcodec/g722dsp.c
 create mode 100644 libavcodec/g722dsp.h

-- 
1.9.1

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel