This patch series splits out the g722_apply_qmf() function so that it can be optimized with ARM NEON code.

v2 addresses review comments by Timothy and Martin, thanks!

It turns out that the efficiency of the C code can be improved quite a bit
as well by unrolling :)

Benchmarking a G.722 encode/decode in a loop, compiled with gcc 4.8.2
(percentages are relative to the baseline of each platform):

x86-64, Intel i5-2400:
 340 ms baseline
 300 ms after g722_apply_qmf() unrolling, -11.7%
 275 ms after s_zero() unrolling, -19.1%

ARM Cortex-A8:
2720 ms baseline
2365 ms after g722_apply_qmf() unrolling, -13.1%
1935 ms after s_zero() unrolling, -28.8%
1850 ms after g722_apply_qmf() in NEON, -32.0%

Peter Meerwald (5):
  g722: Split out g722_qmf_apply() function into g722dsp.c
  g722: Reduce number of pointers passed to g722_apply_qmf() function
  g722: Unroll g722_apply_qmf()
  g722: Split out computation of band->s_zero and unroll code
  g722: Add ARM NEON implementation for g722_apply_qmf()

 libavcodec/Makefile               |  4 +--
 libavcodec/arm/Makefile           |  4 +++
 libavcodec/arm/g722dsp_init_arm.c | 35 +++++++++++++++++++
 libavcodec/arm/g722dsp_neon.S     | 66 ++++++++++++++++++++++++++++++++++++
 libavcodec/g722.c                 | 69 +++++++++++++++++--------------------
 libavcodec/g722.h                 |  5 +--
 libavcodec/g722dec.c              | 11 +++---
 libavcodec/g722dsp.c              | 71 +++++++++++++++++++++++++++++++++++++++
 libavcodec/g722dsp.h              | 33 ++++++++++++++++++
 libavcodec/g722enc.c              | 10 +++---
 10 files changed, 256 insertions(+), 52 deletions(-)
 create mode 100644 libavcodec/arm/g722dsp_init_arm.c
 create mode 100644 libavcodec/arm/g722dsp_neon.S
 create mode 100644 libavcodec/g722dsp.c
 create mode 100644 libavcodec/g722dsp.h

--
1.9.1
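
For readers who have not looked at the patches, the kind of unrolling mentioned in the benchmark notes can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the series: the names sketch_coeffs, sketch_qmf_rolled() and sketch_qmf_unrolled() are hypothetical, the tap values are placeholders rather than the real G.722 QMF table, and the 24-tap, two-output layout is assumed for the example.

```c
#include <stdint.h>

/* Placeholder taps for illustration only; NOT the real G.722 QMF table. */
static const int16_t sketch_coeffs[24] = {
     3, -11,  12,  32, -21,  95,  38, -80,
    36, -15,  53, -11,  -7,  14,  -2,  61,
    -9,  27,  44, -33,   5,  18, -26,   8,
};

/*
 * Rolled reference version: even-indexed taps accumulate into xout[1],
 * odd-indexed taps into xout[0]. One loop iteration handles two taps.
 */
static void sketch_qmf_rolled(const int16_t *prev, int xout[2])
{
    int i;

    xout[0] = 0;
    xout[1] = 0;
    for (i = 0; i < 12; i++) {
        xout[1] += prev[2 * i]     * sketch_coeffs[2 * i];
        xout[0] += prev[2 * i + 1] * sketch_coeffs[2 * i + 1];
    }
}

/*
 * Unrolled version: each iteration handles eight taps, so the loop
 * counter and branch are executed 3 times instead of 12. The two
 * accumulators are kept in locals and stored once at the end.
 */
static void sketch_qmf_unrolled(const int16_t *prev, int xout[2])
{
    const int16_t *c = sketch_coeffs;
    int acc0 = 0, acc1 = 0;
    int i;

    for (i = 0; i < 3; i++, prev += 8, c += 8) {
        acc1 += prev[0] * c[0];
        acc0 += prev[1] * c[1];
        acc1 += prev[2] * c[2];
        acc0 += prev[3] * c[3];
        acc1 += prev[4] * c[4];
        acc0 += prev[5] * c[5];
        acc1 += prev[6] * c[6];
        acc0 += prev[7] * c[7];
    }
    xout[0] = acc0;
    xout[1] = acc1;
}
```

Both functions should produce identical results for the same 24 input samples; whether the unrolled form is actually faster depends on the compiler and target, which is why the figures quoted in the benchmark notes were measured rather than assumed.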
