Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles <justin.rugg...@gmail.com> wrote:
> ---
>  libavresample/x86/audio_convert.asm    |   37 ++++++++++++++++++++++++++++++++
>  libavresample/x86/audio_convert_init.c |    9 +++++++
>  2 files changed, 46 insertions(+), 0 deletions(-)
>
> diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
> index ba6cb60..b241542 100644
> --- a/libavresample/x86/audio_convert.asm
> +++ b/libavresample/x86/audio_convert.asm
> @@ -463,6 +463,43 @@ INIT_XMM avx
>  CONV_S16P_TO_FLT_6CH
>  %endif
>
> +;------------------------------------------------------------------------------
> +; void ff_conv_fltp_to_s16_2ch(int16_t *dst, float *const *src, int len,
> +;                              int channels);
> +;------------------------------------------------------------------------------
> +
> +%macro CONV_FLTP_TO_S16_2CH 0
> +cglobal conv_fltp_to_s16_2ch, 3,4,3, dst, src0, len, src1
> +    lea      lenq, [4*lend]
> +    mov     src1q, [src0q+gprsize]
> +    mov     src0q, [src0q        ]
> +    add      dstq, lenq
> +    add     src0q, lenq
> +    add     src1q, lenq
> +    neg      lenq
> +    mova       m2, [pf_s16_scale]
> +    ALIGN 16
> +.loop:
> +    mulps      m0, m2, [src0q+lenq]
> +    mulps      m1, m2, [src1q+lenq]
> +    cvtps2dq   m0, m0
> +    cvtps2dq   m1, m1
> +    packssdw   m0, m1
> +    movhlps    m1, m0
> +    punpcklwd  m0, m1
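For reference, here is a rough scalar C model of the quoted packssdw / movhlps / punpcklwd tail and of the alternative proposed in this reply. It is only a sketch to check that both orderings produce the same interleaved L/R output. The types (vec_i32, vec_i16) and the helper names are hypothetical and exist only for this illustration, nothing here is libavresample code, and it says nothing about which version is faster.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for one XMM register viewed as dwords or words. */
typedef struct { int32_t d[4]; } vec_i32;
typedef struct { int16_t w[8]; } vec_i16;

/* Signed saturation from 32 to 16 bits, as packssdw applies per lane. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* packssdw a, b: low four words from a, high four words from b. */
static vec_i16 packssdw(vec_i32 a, vec_i32 b)
{
    vec_i16 r;
    for (int i = 0; i < 4; i++) {
        r.w[i]     = sat16(a.d[i]);
        r.w[i + 4] = sat16(b.d[i]);
    }
    return r;
}

/* movhlps dst, src: high 64 bits of src go to the low 64 bits of dst;
 * the high 64 bits of dst are left as they were. */
static vec_i16 movhlps(vec_i16 dst, vec_i16 src)
{
    memcpy(dst.w, src.w + 4, 4 * sizeof(int16_t));
    return dst;
}

/* punpcklwd a, b: interleave the low four words of a and b. */
static vec_i16 punpcklwd(vec_i16 a, vec_i16 b)
{
    vec_i16 r;
    for (int i = 0; i < 4; i++) {
        r.w[2 * i]     = a.w[i];
        r.w[2 * i + 1] = b.w[i];
    }
    return r;
}

/* Sequence from the patch: each step depends on the previous one. */
static vec_i16 interleave_patch(vec_i32 l, vec_i32 r)
{
    vec_i16 m0 = packssdw(l, r);
    vec_i16 m1 = {{0}};          /* stale m1 contents; high half is never read */
    m1 = movhlps(m1, m0);
    return punpcklwd(m0, m1);
}

/* Suggested sequence: the two packs do not depend on each other. */
static vec_i16 interleave_suggested(vec_i32 l, vec_i32 r)
{
    vec_i16 m0 = packssdw(l, l);
    vec_i16 m1 = packssdw(r, r);
    return punpcklwd(m0, m1);
}

/* Compare both orderings on one sample pair, including saturating values. */
static void check_equivalence(void)
{
    vec_i32 l = {{1, -2, 40000, -40000}};
    vec_i32 r = {{3, 4, -5, 6}};
    vec_i16 a = interleave_patch(l, r);
    vec_i16 b = interleave_suggested(l, r);
    assert(memcmp(a.w, b.w, sizeof(a.w)) == 0);
}
```

Calling check_equivalence() exercises both paths on four samples per channel. In both cases the result is L0 R0 L1 R1 L2 R2 L3 R3, with each sample saturated to the int16_t range.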
You should be able to get slightly better performance (because of a shorter dependency chain: the two packssdw are independent of each other and the movhlps goes away) by using:

    packssdw   m0, m0
    packssdw   m1, m1
    punpcklwd  m0, m1

Please switch to that if it is faster. Otherwise OK.

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel