Christophe Gisquet <[email protected]> writes:

> Hi,
>
> the length is always a multiple of 2, so some unrolling can be
> performed. Not all architectures would benefit as much as x86 w/ FP
> math on SSE regs, and this is less efficient than actual SIMD, but the
> results are still interesting nonetheless.
>
> Best regards,
> Christophe
>
> From 9df4e625e2a043b1fab811dfc10fcfa511e9b3dd Mon Sep 17 00:00:00 2001
> From: Christophe GISQUET <[email protected]>
> Date: Wed, 22 Feb 2012 17:48:59 +0100
> Subject: [PATCH 5/6] SBR DSP: unroll sum_square
>
> The length is even, so some unrolling can be performed. Timings are for x86:
> - 32bits: 102c -> 82c
> - 64bits:  82c -> 69c
> ---
>  libavcodec/sbrdsp.c |   15 +++++++++++----
>  1 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/libavcodec/sbrdsp.c b/libavcodec/sbrdsp.c
> index 84f92c5..616511e 100644
> --- a/libavcodec/sbrdsp.c
> +++ b/libavcodec/sbrdsp.c
> @@ -35,13 +35,18 @@ static void sbr_sum64x5_c(float *z)
>
>  static float sbr_sum_square_c(float (*x)[2], int n)
>  {
> -    float sum = 0.0f;
> +    float sum0 = 0.0f, sum1 = 0.0f;
>      int i;
>
> -    for (i = 0; i < n; i++)
> -        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
> +    for (i = 0; i < n; i+=2)

Spaces around +=, please.

> +    {
> +        sum0 += x[i+0][0] * x[i+0][0];
> +        sum1 += x[i+0][1] * x[i+0][1];
> +        sum0 += x[i+1][0] * x[i+1][0];
> +        sum1 += x[i+1][1] * x[i+1][1];
> +    }
>
> -    return sum;
> +    return sum0+sum1;

Spaces around +, please.
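
For reference, here is a rough sketch of how the unrolled loop might
look with both spacing comments applied. The assert() on n is my own
addition to make the "length is even" assumption from the commit message
explicit; it is not part of the posted patch.

```c
#include <assert.h>

/* Sketch only: unrolled sum of squares over n complex (re, im) pairs.
 * Assumes n is even, as stated in the patch description. */
static float sbr_sum_square_c(float (*x)[2], int n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    int i;

    /* Hypothetical guard, not in the original patch. */
    assert((n & 1) == 0);

    /* Two independent accumulators: sum0 collects the squared real
     * parts, sum1 the squared imaginary parts, two elements per pass. */
    for (i = 0; i < n; i += 2) {
        sum0 += x[i + 0][0] * x[i + 0][0];
        sum1 += x[i + 0][1] * x[i + 0][1];
        sum0 += x[i + 1][0] * x[i + 1][0];
        sum1 += x[i + 1][1] * x[i + 1][1];
    }

    return sum0 + sum1;
}
```

Note that splitting the sum over two accumulators changes the order of
the floating-point additions, so the result may differ from the original
loop in the last bits.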

-- 
Måns Rullgård
[email protected]