On 2015-02-02 11:49:30 +0100, Peter Meerwald wrote:
> Signed-off-by: Peter Meerwald <[email protected]>
> 
> ---
> 
> v2:
> drop unnecessary constants (yes, they were for an asm version of s_zero which 
> turned out not worthwhile) (Martin)
> fix NEON register clobbering, use d16 instead of d8 as per AAPCS, §5.1.2.1 
> (Martin)
> remove trailing whitespace (Martin)
> ---
>  libavcodec/arm/Makefile           |  4 +++
>  libavcodec/arm/g722dsp_init_arm.c | 35 +++++++++++++++++++++
>  libavcodec/arm/g722dsp_neon.S     | 66 +++++++++++++++++++++++++++++++++++++++
>  libavcodec/g722dsp.c              |  3 ++
>  libavcodec/g722dsp.h              |  1 +
>  5 files changed, 109 insertions(+)
>  create mode 100644 libavcodec/arm/g722dsp_init_arm.c
>  create mode 100644 libavcodec/arm/g722dsp_neon.S
> 
> diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
> index 6cbb0b9..8435f86 100644
> --- a/libavcodec/arm/Makefile
> +++ b/libavcodec/arm/Makefile
> @@ -35,6 +35,10 @@ OBJS-$(CONFIG_APE_DECODER)             += arm/apedsp_init_arm.o
>  OBJS-$(CONFIG_DCA_DECODER)             += arm/dcadsp_init_arm.o
>  OBJS-$(CONFIG_FLAC_DECODER)            += arm/flacdsp_init_arm.o        \
>                                            arm/flacdsp_arm.o
> +OBJS-$(CONFIG_ADPCM_G722_DECODER)      += arm/g722dsp_init_arm.o        \
> +                                          arm/g722dsp_neon.o
> +OBJS-$(CONFIG_ADPCM_G722_ENCODER)      += arm/g722dsp_init_arm.o        \
> +                                          arm/g722dsp_neon.o
>  OBJS-$(CONFIG_MLP_DECODER)             += arm/mlpdsp_init_arm.o
>  OBJS-$(CONFIG_VC1_DECODER)             += arm/vc1dsp_init_arm.o
>  OBJS-$(CONFIG_VORBIS_DECODER)          += arm/vorbisdsp_init_arm.o

...

> diff --git a/libavcodec/arm/g722dsp_neon.S b/libavcodec/arm/g722dsp_neon.S
> new file mode 100644
> index 0000000..64b812c
> --- /dev/null
> +++ b/libavcodec/arm/g722dsp_neon.S
> @@ -0,0 +1,66 @@
> +/*
> + * ARM NEON optimised DSP functions for G722 coding
> + * Copyright (c) 2015 Peter Meerwald <[email protected]>
> + *
> + * This file is part of Libav.
> + *
> + * Libav is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * Libav is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#include "libavutil/arm/asm.S"
> +
> +function ff_g722_apply_qmf_neon, export=1, align=4
> +        movrel          r3, qmf_coeffs
> +        vld1.s16        {d2,d3,d4}, [r0]! /* load prev_samples */

Is the input not guaranteed to be aligned?

> +        vld1.s16        {d16,d17,d18}, [r3,:64]! /* load qmf_coeffs */

It looks a little odd to load 3 64-bit registers twice. If you were to
load 2 64-bit registers three times (or first 4 64-bit registers and
then 2), you could use the 16-byte alignment of the constants.
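
As a rough illustration of the offsets involved (a sketch only, assuming
the coefficient table is 16-byte aligned and holds 24 s16 values, i.e.
48 bytes; guaranteed_alignment() is a made-up helper, not something from
the tree):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment of the start of the coefficient table, in bytes. */
enum { TABLE_ALIGN = 16 };

/* Largest power-of-two alignment (capped at TABLE_ALIGN) that is still
 * guaranteed for a load starting `offset` bytes into the table. */
static size_t guaranteed_alignment(size_t offset)
{
    size_t a = TABLE_ALIGN;

    /* The halving loop below only makes sense for a power of two. */
    assert((TABLE_ALIGN & (TABLE_ALIGN - 1)) == 0);

    while (a > 1 && offset % a != 0)
        a /= 2;
    return a;
}
```

With two 24-byte loads the second one starts at offset 24, which is only
8-byte aligned, hence the :64 hint. With three 16-byte loads (offsets 0,
16, 32) or a 32-byte load followed by a 16-byte one (offsets 0, 32),
every load starts on a 16-byte boundary and could carry a :128 hint.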

> +        vmull.s16       q0, d2, d16
> +        vmlal.s16       q0, d3, d17

It might be faster to accumulate in two registers and add the results at
the end.
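
A scalar sketch of what I mean, in plain C rather than NEON (function
names are made up for illustration, and whether it is actually faster
on a given core would have to be measured). The single-accumulator form
makes every multiply-accumulate wait for the previous one; with two
accumulators there are two independent chains that are only combined
once:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each step depends on the result of the previous one. */
static int32_t dot_one_acc(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Two accumulators: two independent dependency chains, added at the end. */
static int32_t dot_two_acc(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc0 = 0, acc1 = 0;

    assert(n % 2 == 0); /* this sketch only handles an even element count */

    for (size_t i = 0; i < n; i += 2) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
    }
    return acc0 + acc1;
}
```

Both variants give the same sum. In the NEON version the two
accumulators would be two q registers added lane by lane before the
final vadd.s32.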

> +        vmlal.s16       q0, d4, d18
> +
> +        vld1.s16        {d5,d6,d7}, [r0]! /* load prev_samples */
> +        vld1.s16        {d19,d20,d21}, [r3,:64]! /* load qmf_coeffs */
> +        vmlal.s16       q0, d5, d19
> +        vmlal.s16       q0, d6, d20
> +        vmlal.s16       q0, d7, d21
> +
> +        vadd.s32        d0, d1, d0
> +        vrev64.32       d0, d0
> +        vst1.s32        {d0}, [r1]

No alignment? It might then be faster to avoid the vrev64 and store each
s32 individually.
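
To show that the two variants are equivalent, here is a scalar model of
the final reduction in C (a sketch only; acc[] stands in for the four
32-bit lanes of q0, and both function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Models the posted sequence: vadd.s32, vrev64.32, then one 64-bit store. */
static void store_reversed(const int32_t acc[4], int32_t xout[2])
{
    assert(acc && xout);

    /* vadd.s32 d0, d1, d0: add the high half of q0 to the low half. */
    int32_t d0[2] = { acc[0] + acc[2], acc[1] + acc[3] };
    /* vrev64.32 d0, d0: swap the two 32-bit lanes. */
    int32_t rev[2] = { d0[1], d0[0] };

    /* vst1 {d0}, [r1]: store both lanes in order. */
    xout[0] = rev[0];
    xout[1] = rev[1];
}

/* Models the alternative: no reversal, each lane goes to its own slot. */
static void store_lanes(const int32_t acc[4], int32_t xout[2])
{
    assert(acc && xout);

    xout[1] = acc[0] + acc[2]; /* lane 0 of the sum */
    xout[0] = acc[1] + acc[3]; /* lane 1 of the sum */
}
```

Both functions leave the same values in xout[]; the second one just
drops the lane swap in exchange for two single-lane stores.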

Janne