Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles
<justin.rugg...@gmail.com> wrote:
> ---
>  libavresample/x86/audio_convert.asm    |   38 ++++++++++++++++++++++++++++++++
>  libavresample/x86/audio_convert_init.c |   11 +++++++++
>  2 files changed, 49 insertions(+), 0 deletions(-)
>
> diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
> index 9ba7251..70519e1 100644
> --- a/libavresample/x86/audio_convert.asm
> +++ b/libavresample/x86/audio_convert.asm
> @@ -734,3 +734,41 @@ CONV_FLTP_TO_FLT_6CH
>  INIT_XMM avx
>  CONV_FLTP_TO_FLT_6CH
>  %endif
> +
> +;------------------------------------------------------------------------------
> +; void ff_conv_s16_to_s16p_2ch(int16_t *const *dst, int16_t *src, int len,
> +;                              int channels);
> +;------------------------------------------------------------------------------
> +
> +%macro CONV_S16_TO_S16P_2CH 0
> +cglobal conv_s16_to_s16p_2ch, 3,4,3, dst0, src, len, dst1
> +    lea       lenq, [2*lend]
> +    mov      dst1q, [dst0q+gprsize]
> +    mov      dst0q, [dst0q        ]
> +    lea       srcq, [srcq+2*lenq]
> +    add      dst0q, lenq
> +    add      dst1q, lenq
> +    neg       lenq
> +    ALIGN 16
> +.loop:
> +    mova        m0, [srcq+2*lenq       ]
> +    mova        m1, [srcq+2*lenq+mmsize]
> +    pshuflw     m0, m0, q3120
> +    pshufhw     m0, m0, q3120
> +    pshuflw     m1, m1, q3120
> +    pshufhw     m1, m1, q3120
> +    shufps      m2, m0, m1, q2020
> +    shufps      m0, m1, q3131
> +    mova  [dst0q+lenq], m2
> +    mova  [dst1q+lenq], m0

The more common way to do this (I believe) is to set up a mask register:

pcmpeqb m4, m4
psrld m4, 16 ; 0x0000ffff in each dword

Then mask/shift:

mova m0, [srcq+2*lenq+0*mmsize]
mova m1, [srcq+2*lenq+1*mmsize]
psrld m2, m0, 16 ; high words (second channel)
psrld m3, m1, 16
pand m0, m4 ; low words (first channel)
pand m1, m4
packusdw m0, m1 ; unsigned pack, needs SSE4.1
packusdw m2, m3
mova [dst0q+lenq], m0
mova [dst1q+lenq], m2

However, that's not fewer instructions; it may be worth checking anyway.
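
For reference, here is a scalar C sketch of what the mask/shift split computes for 16-bit stereo samples. It is only a model for checking results, not the SIMD code itself. It assumes each 32-bit lane holds one frame, with the first channel in the low word and the second channel in the high word; the function name `deinterleave_mask_shift` is made up for this illustration.

```c
#include <stdint.h>

/* Scalar model of the mask/shift split for 16-bit stereo samples.
 * Assumption: one frame per 32-bit lane, first channel in the low word,
 * second channel in the high word. */
void deinterleave_mask_shift(const int16_t *src, int16_t *dst0,
                             int16_t *dst1, int len)
{
    /* all-ones mask shifted down so that it covers only the low word */
    const uint32_t mask = 0xffffffffu >> 16;

    for (int i = 0; i < len; i++) {
        /* build the lane explicitly so the model does not depend on
         * host endianness */
        uint32_t lane = (uint32_t)(uint16_t)src[2 * i] |
                        ((uint32_t)(uint16_t)src[2 * i + 1] << 16);

        dst0[i] = (int16_t)(lane & mask); /* mask: keep the low word   */
        dst1[i] = (int16_t)(lane >> 16);  /* shift: bring high word down */
    }
}
```

Running a SIMD candidate and this model over the same buffer and comparing both output planes is a cheap way to catch a wrong mask width or a saturating pack.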

Alternatively, a pshufb version:

mova m3, [pb_014589cd2367abef] ; word-wise split: bytes 0,1,4,5,8,9,12,13, 2,3,6,7,10,11,14,15
.loop:
mova m0, [srcq+2*lenq+0*mmsize]
mova m1, [srcq+2*lenq+1*mmsize]
pshufb m0, m3
pshufb m1, m3
punpcklqdq m2, m0, m1
punpckhqdq m0, m1
mova [dst0q+lenq], m2
mova [dst1q+lenq], m0

That is 2 fewer instructions, and only 2 unpacks as opposed to all the
shuffles, so potentially faster (except on Atom, where pshufb is
dog-slow).
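
To make the shuffle-then-unpack idea concrete, here is a portable C model of one loop iteration over two 16-byte registers. It assumes 16-bit samples, so the shuffle mask moves bytes in pairs. The names `shuffle_bytes`, `deinterleave_block_shuffle` and `word_split` are hypothetical, and the helper only models the plain index case of a byte shuffle, not the zeroing behaviour of the real instruction.

```c
#include <stdint.h>
#include <string.h>

/* Model of a 16-byte shuffle: result[i] = reg[mask[i]].
 * Only indices 0..15 are modelled. */
static void shuffle_bytes(uint8_t reg[16], const uint8_t mask[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = reg[mask[i] & 15];
    memcpy(reg, tmp, 16);
}

/* Deinterleave 8 stereo frames (16 samples) into two planes of 8 samples. */
void deinterleave_block_shuffle(const int16_t src[16],
                                int16_t dst0[8], int16_t dst1[8])
{
    /* even words go to the low half, odd words to the high half */
    static const uint8_t word_split[16] = {
        0, 1, 4, 5, 8, 9, 12, 13,
        2, 3, 6, 7, 10, 11, 14, 15
    };
    uint8_t m0[16], m1[16];

    memcpy(m0, src, 16);      /* first register: frames 0..3  */
    memcpy(m1, src + 8, 16);  /* second register: frames 4..7 */

    shuffle_bytes(m0, word_split);
    shuffle_bytes(m1, word_split);

    /* low-quadword unpack: low halves of m0 and m1 form the first plane */
    memcpy(dst0, m0, 8);
    memcpy(dst0 + 4, m1, 8);
    /* high-quadword unpack: high halves form the second plane */
    memcpy(dst1, m0 + 8, 8);
    memcpy(dst1 + 4, m1 + 8, 8);
}
```

Because every sample moves as an intact byte pair, the model gives the same result on little-endian and big-endian hosts.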

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
