Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles
<justin.rugg...@gmail.com> wrote:
> ---
>  libavresample/x86/audio_convert.asm    |   37 ++++++++++++++++++++++++++++++++
>  libavresample/x86/audio_convert_init.c |    9 +++++++
>  2 files changed, 46 insertions(+), 0 deletions(-)
>
> diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
> index ba6cb60..b241542 100644
> --- a/libavresample/x86/audio_convert.asm
> +++ b/libavresample/x86/audio_convert.asm
> @@ -463,6 +463,43 @@ INIT_XMM avx
>  CONV_S16P_TO_FLT_6CH
>  %endif
>
> +;------------------------------------------------------------------------------
> +; void ff_conv_fltp_to_s16_2ch(int16_t *dst, float *const *src, int len,
> +;                              int channels);
> +;------------------------------------------------------------------------------
> +
> +%macro CONV_FLTP_TO_S16_2CH 0
> +cglobal conv_fltp_to_s16_2ch, 3,4,3, dst, src0, len, src1
> +    lea      lenq, [4*lend]
> +    mov     src1q, [src0q+gprsize]
> +    mov     src0q, [src0q        ]
> +    add      dstq, lenq
> +    add     src0q, lenq
> +    add     src1q, lenq
> +    neg      lenq
> +    mova       m2, [pf_s16_scale]
> +    ALIGN 16
> +.loop:
> +    mulps      m0, m2, [src0q+lenq]
> +    mulps      m1, m2, [src1q+lenq]
> +    cvtps2dq   m0, m0
> +    cvtps2dq   m1, m1
> +    packssdw   m0, m1
> +    movhlps    m1, m0
> +    punpcklwd  m0, m1

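You should be able to get slightly better performance (because of a
shorter dependency chain) by replacing the last three instructions
above with:

packssdw m0, m0
packssdw m1, m1
punpcklwd m0, m1

The two packssdw instructions do not depend on each other, whereas
movhlps has to wait for the result of the combined packssdw.

Please switch to this version if it benchmarks faster. Otherwise OK.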
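
To double-check that both sequences give the same interleaved output,
here is a rough scalar model in C. It is only a sketch: the vec_w type
and the helper names are made up for illustration and are not part of
libavresample, and each helper models only the lanes that matter here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one XMM register viewed as 8 signed words. */
typedef struct { int16_t w[8]; } vec_w;

/* Signed saturation from 32 to 16 bits, as packssdw does per lane. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* packssdw dst, src: words 0..3 come from dst, words 4..7 from src. */
static vec_w packssdw(const int32_t dst[4], const int32_t src[4])
{
    vec_w r;
    for (int i = 0; i < 4; i++) {
        r.w[i]     = sat16(dst[i]);
        r.w[i + 4] = sat16(src[i]);
    }
    return r;
}

/* movhlps dst, src: high 64 bits of src replace the low 64 bits of dst. */
static vec_w movhlps(vec_w dst, vec_w src)
{
    memcpy(&dst.w[0], &src.w[4], 4 * sizeof(int16_t));
    return dst;
}

/* punpcklwd dst, src: interleave the low 4 words of dst and src. */
static vec_w punpcklwd(vec_w dst, vec_w src)
{
    vec_w r;
    for (int i = 0; i < 4; i++) {
        r.w[2 * i]     = dst.w[i];
        r.w[2 * i + 1] = src.w[i];
    }
    return r;
}

/* Sequence from the patch: packssdw -> movhlps -> punpcklwd (serial). */
vec_w original_sequence(const int32_t ch0[4], const int32_t ch1[4])
{
    vec_w m0 = packssdw(ch0, ch1);
    vec_w m1 = { {0} };  /* high half is never read by punpcklwd */
    m1 = movhlps(m1, m0);
    return punpcklwd(m0, m1);
}

/* Suggested sequence: two independent packssdw, then punpcklwd. */
vec_w suggested_sequence(const int32_t ch0[4], const int32_t ch1[4])
{
    vec_w m0 = packssdw(ch0, ch0);
    vec_w m1 = packssdw(ch1, ch1);
    return punpcklwd(m0, m1);
}

/* Asserts that both sequences agree for one set of converted samples. */
void check_sequences(const int32_t ch0[4], const int32_t ch1[4])
{
    vec_w a = original_sequence(ch0, ch1);
    vec_w b = suggested_sequence(ch0, ch1);
    assert(memcmp(a.w, b.w, sizeof(a.w)) == 0);
}
```

Calling check_sequences() with the cvtps2dq results for each channel
should pass for any input, including values that saturate. It says
nothing about speed, so the change still needs benchmarking.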

Ronald