On 2015-02-02 11:49:30 +0100, Peter Meerwald wrote:
> Signed-off-by: Peter Meerwald <[email protected]>
>
> ---
>
> v2:
>  drop unnecessary constants (yes, they were for as asm version of s_zero which
>  turned out not worthwhile) (Martin)
>  fix NEON register clobbering, use d16 instead of d8 as per AAPCS, §5.1.2.1
>  (Martin)
>  remove trailing whitespace (Martin)
> ---
>  libavcodec/arm/Makefile           |  4 +++
>  libavcodec/arm/g722dsp_init_arm.c | 35 +++++++++++++++++++++
>  libavcodec/arm/g722dsp_neon.S     | 66 +++++++++++++++++++++++++++++++++++++++
>  libavcodec/g722dsp.c              |  3 ++
>  libavcodec/g722dsp.h              |  1 +
>  5 files changed, 109 insertions(+)
>  create mode 100644 libavcodec/arm/g722dsp_init_arm.c
>  create mode 100644 libavcodec/arm/g722dsp_neon.S
>
> diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
> index 6cbb0b9..8435f86 100644
> --- a/libavcodec/arm/Makefile
> +++ b/libavcodec/arm/Makefile
> @@ -35,6 +35,10 @@ OBJS-$(CONFIG_APE_DECODER)             += arm/apedsp_init_arm.o
>  OBJS-$(CONFIG_DCA_DECODER)             += arm/dcadsp_init_arm.o
>  OBJS-$(CONFIG_FLAC_DECODER)            += arm/flacdsp_init_arm.o        \
>                                            arm/flacdsp_arm.o
> +OBJS-$(CONFIG_ADPCM_G722_DECODER)      += arm/g722dsp_init_arm.o        \
> +                                          arm/g722dsp_neon.o
> +OBJS-$(CONFIG_ADPCM_G722_ENCODER)      += arm/g722dsp_init_arm.o        \
> +                                          arm/g722dsp_neon.o
>  OBJS-$(CONFIG_MLP_DECODER)             += arm/mlpdsp_init_arm.o
>  OBJS-$(CONFIG_VC1_DECODER)             += arm/vc1dsp_init_arm.o
>  OBJS-$(CONFIG_VORBIS_DECODER)          += arm/vorbisdsp_init_arm.o

...

> diff --git a/libavcodec/arm/g722dsp_neon.S b/libavcodec/arm/g722dsp_neon.S
> new file mode 100644
> index 0000000..64b812c
> --- /dev/null
> +++ b/libavcodec/arm/g722dsp_neon.S
> @@ -0,0 +1,66 @@
> +/*
> + * ARM NEON optimised DSP functions for G722 coding
> + * Copyright (c) 2015 Peter Meerwald <[email protected]>
> + *
> + * This file is part of Libav.
> + *
> + * Libav is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * Libav is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + */
> +
> +#include "libavutil/arm/asm.S"
> +
> +function ff_g722_apply_qmf_neon, export=1, align=4
> +        movrel          r3,  qmf_coeffs
> +        vld1.s16        {d2,d3,d4}, [r0]! /* load prev_samples */

Is the input not guaranteed to be aligned?

> +        vld1.s16        {d16,d17,d18}, [r3,:64]! /* load qmf_coeffs */

It looks a little odd to load 2 times 3 64-bit registers. If you were to
load 3 times 2 64-bit registers (or first 4 64-bit registers and then 2),
you could use the 16-byte alignment of the constants.

> +        vmull.s16       q0,  d2,  d16
> +        vmlal.s16       q0,  d3,  d17

It might be faster to accumulate in two registers and add the results at
the end.

> +        vmlal.s16       q0,  d4,  d18
> +
> +        vld1.s16        {d5,d6,d7}, [r0]! /* load prev_samples */
> +        vld1.s16        {d19,d20,d21}, [r3,:64]! /* load qmf_coeffs */
> +        vmlal.s16       q0,  d5,  d19
> +        vmlal.s16       q0,  d6,  d20
> +        vmlal.s16       q0,  d7,  d21
> +
> +        vadd.s32        d0,  d1,  d0
> +        vrev64.32       d0,  d0
> +        vst1.s32        {d0}, [r1]

No alignment here either? In that case it might be faster to avoid the
vrev64 and store each s32 individually.
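
To illustrate the two-accumulator suggestion, here is a rough scalar C model, not the patch's code and not benchmarked. It assumes 24 taps, as implied by the 2 x 3 d-register loads above, and models q0 as four 32-bit lanes. The function names, TAPS and LANES are made up for this sketch, and the coefficients are passed in rather than taken from the real qmf_coeffs table. The single-accumulator version forms one dependency chain through all six multiply-accumulates; the split version keeps two independent chains and adds them once at the end, which gives the same result as long as nothing overflows.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS  24   /* assumed: 6 d registers x 4 int16 lanes */
#define LANES 4    /* 32-bit lanes in one q register */

/* Models "vadd.s32 d0, d1, d0" followed by "vrev64.32 d0, d0". */
static void fold_lanes(const int32_t acc[LANES], int32_t out[2])
{
    out[0] = acc[1] + acc[3];
    out[1] = acc[0] + acc[2];
}

/* One accumulator: each group of four products waits on the previous one. */
static void qmf_one_acc(const int16_t *s, const int16_t *c, int32_t out[2])
{
    int32_t acc[LANES] = { 0 };

    assert(s && c && out);
    for (int i = 0; i < TAPS; i++)
        acc[i % LANES] += (int32_t)s[i] * c[i];
    fold_lanes(acc, out);
}

/* Two accumulators: alternate groups go to a[] and b[], summed at the end. */
static void qmf_two_acc(const int16_t *s, const int16_t *c, int32_t out[2])
{
    int32_t a[LANES] = { 0 }, b[LANES] = { 0 }, sum[LANES];

    assert(s && c && out);
    for (int i = 0; i < TAPS; i += 2 * LANES) {
        for (int l = 0; l < LANES; l++) {
            a[l] += (int32_t)s[i + l]         * c[i + l];
            b[l] += (int32_t)s[i + LANES + l] * c[i + LANES + l];
        }
    }
    for (int l = 0; l < LANES; l++)
        sum[l] = a[l] + b[l];
    fold_lanes(sum, out);
}
```

In NEON terms this would presumably mean starting a second chain with its own vmull.s16 into another q register and finishing with one vadd.s32 on the q registers before the final fold. Whether that actually wins depends on the multiply-accumulate latency of the target core, so it would need measuring.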

Janne
