Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles <justin.rugg...@gmail.com> wrote:
> ---
>  libavresample/x86/audio_convert.asm    |   38 ++++++++++++++++++++++++++++++++
>  libavresample/x86/audio_convert_init.c |   11 +++++++++
>  2 files changed, 49 insertions(+), 0 deletions(-)
>
> diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
> index 9ba7251..70519e1 100644
> --- a/libavresample/x86/audio_convert.asm
> +++ b/libavresample/x86/audio_convert.asm
> @@ -734,3 +734,41 @@ CONV_FLTP_TO_FLT_6CH
>  INIT_XMM avx
>  CONV_FLTP_TO_FLT_6CH
>  %endif
> +
> +;------------------------------------------------------------------------------
> +; void ff_conv_s16_to_s16p_2ch(int16_t *const *dst, int16_t *src, int len,
> +;                              int channels);
> +;------------------------------------------------------------------------------
> +
> +%macro CONV_S16_TO_S16P_2CH 0
> +cglobal conv_s16_to_s16p_2ch, 3,4,3, dst0, src, len, dst1
> +    lea       lenq, [2*lend]
> +    mov      dst1q, [dst0q+gprsize]
> +    mov      dst0q, [dst0q        ]
> +    lea       srcq, [srcq+2*lenq]
> +    add      dst0q, lenq
> +    add      dst1q, lenq
> +    neg       lenq
> +    ALIGN 16
> +.loop:
> +    mova        m0, [srcq+2*lenq       ]
> +    mova        m1, [srcq+2*lenq+mmsize]
> +    pshuflw     m0, m0, q3120
> +    pshufhw     m0, m0, q3120
> +    pshuflw     m1, m1, q3120
> +    pshufhw     m1, m1, q3120
> +    shufps      m2, m0, m1, q2020
> +    shufps      m0, m1, q3131
> +    mova  [dst0q+lenq], m2
> +    mova  [dst1q+lenq], m0

The more common way to do this (I believe) is to set up a mask reg:

    pcmpeqb     m4, m4
    psrlw       m4, 8 ; 0x00ff

Then mask/shift:

    mova        m0, [srcq+2*lenq+0*mmsize]
    mova        m1, [srcq+2*lenq+1*mmsize]
    psrlw       m2, m0, 8
    psrlw       m3, m1, 8
    pand        m0, m4
    pand        m1, m4
    packsswb    m0, m1
    packsswb    m2, m3
    mova  [dst1q+lenq], m0
    mova  [dst2q+lenq], m2

However, that's not fewer instructions; maybe worth checking anyway.

Alternatively, a pshufb version:

    mova        m3, [pb_02468ace13579bdf]
.loop:
    mova        m0, [srcq+2*lenq+0*mmsize]
    mova        m1, [srcq+2*lenq+1*mmsize]
    pshufb      m0, m3
    pshufb      m1, m3
    punpcklqdq  m2, m0, m1
    punpckhqdq  m0, m1
    mova  [dst1q+lenq], m2
    mova  [dst2q+lenq], m0

That's 2 instructions fewer, and only 2 unpacks as opposed to all the shuffles, so potentially faster (except on Atom, where pshufb is dog-slow).

Ronald
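For checking any of these variants against each other, a scalar C model may be handy. The sketch below is not part of the patch: the function names are hypothetical, and it only models the data movement (one shuffle per 16-byte vector, then a low-half and a high-half unpack), not the actual SIMD instructions. It works at 16-bit granularity since the samples are int16_t; the shuffle constant and shift counts in a real SIMD version would need to match that element size, which is an assumption about how the sketches above would be adapted rather than something stated in this mail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: split interleaved stereo s16 (L0 R0 L1 R1 ...) into
 * two planes. len is the number of samples per channel. */
static void deinterleave_s16_2ch_ref(int16_t *dst0, int16_t *dst1,
                                     const int16_t *src, int len)
{
    for (int i = 0; i < len; i++) {
        dst0[i] = src[2 * i];
        dst1[i] = src[2 * i + 1];
    }
}

/* Model of one shuffle over a 16-byte vector of eight 16-bit words:
 * even-indexed words go to the low half, odd-indexed to the high half. */
static void shuffle_words_even_odd(int16_t out[8], const int16_t in[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = in[2 * i];
        out[i + 4] = in[2 * i + 1];
    }
}

/* Model of the shuffle-then-unpack structure: each iteration consumes two
 * 16-byte input vectors and produces 8 samples for each output plane. */
static void deinterleave_s16_2ch_model(int16_t *dst0, int16_t *dst1,
                                       const int16_t *src, int len)
{
    assert(len % 8 == 0); /* the model handles whole vectors only */
    for (int i = 0; i < len; i += 8) {
        int16_t v0[8], v1[8];
        shuffle_words_even_odd(v0, src + 2 * i);
        shuffle_words_even_odd(v1, src + 2 * i + 8);
        /* low 64 bits of both vectors -> first plane */
        memcpy(dst0 + i,     v0,     4 * sizeof(int16_t));
        memcpy(dst0 + i + 4, v1,     4 * sizeof(int16_t));
        /* high 64 bits of both vectors -> second plane */
        memcpy(dst1 + i,     v0 + 4, 4 * sizeof(int16_t));
        memcpy(dst1 + i + 4, v1 + 4, 4 * sizeof(int16_t));
    }
}
```

Running both functions on the same input and comparing the output planes with memcmp gives a quick sanity check of the intended lane layout before benchmarking an asm version.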