On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock <j...@itanimul.li> wrote:
> +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride
> +    pxor                m0, m0
> +    lea                 r3, [strideq * 3]
> +    RES_ADD_SSE_16_32_8  0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    mov r4d, 3
> +.loop:
> +    add            coeffsq, 128
> +    lea               dstq, [dstq + strideq * 4]
> +    RES_ADD_SSE_16_32_8  0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    dec r4d
> +    jnz .loop
> +    RET

You can do all four iterations inside the loop instead of peeling the
first one out in front of it, e.g. something like:

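    ; pxor m0 / lea r3 setup stays in front of the loop, as in the patch
    mov                r4d, 4
.loop:
    RES_ADD_SSE_16_32_8  0, dstq, dstq + strideq
    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
    add            coeffsq, 128                  ; 4 rows * 16 coeffs * 2 bytes
    lea               dstq, [dstq + strideq * 4] ; advance 4 rows
    dec                r4d
    jnz .loop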

(the same applies to all other similar functions)
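
To double-check that both loop shapes touch the same rows, here is a rough
scalar C model. It is only a sketch: add_rows4() is a hypothetical stand-in
for one pair of RES_ADD_SSE_16_32_8 invocations (4 rows of 16 pixels), the
function names and test data are made up for illustration, and I am assuming
the usual 8-bit add-residual behaviour (add, then clamp to 0..255).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical stand-in for two RES_ADD_SSE_16_32_8 invocations:
 * adds 4 rows of 16 residuals to dst, with saturation. */
static void add_rows4(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 16; x++)
            dst[y * stride + x] =
                clip_u8(dst[y * stride + x] + coeffs[y * 16 + x]);
}

/* Shape of the patch: first group peeled, then 3 loop iterations. */
static void add_residual_peeled(uint8_t *dst, const int16_t *coeffs,
                                ptrdiff_t stride)
{
    add_rows4(dst, coeffs, stride);
    for (int i = 3; i != 0; i--) {
        coeffs += 64;           /* 128 bytes of int16_t */
        dst    += 4 * stride;
        add_rows4(dst, coeffs, stride);
    }
}

/* Suggested shape: all 4 groups inside the loop. */
static void add_residual_rolled(uint8_t *dst, const int16_t *coeffs,
                                ptrdiff_t stride)
{
    for (int i = 4; i != 0; i--) {
        add_rows4(dst, coeffs, stride);
        coeffs += 64;
        dst    += 4 * stride;
    }
}

/* Runs both variants on identical made-up data; returns 1 if they match. */
int residual_loops_match(void)
{
    enum { STRIDE = 24, ROWS = 16 };
    uint8_t a[ROWS * STRIDE], b[ROWS * STRIDE];
    int16_t coeffs[ROWS * 16];

    for (int i = 0; i < ROWS * STRIDE; i++)
        a[i] = b[i] = (uint8_t)(i * 7);
    for (int i = 0; i < ROWS * 16; i++)
        coeffs[i] = (int16_t)((i * 37) % 1024 - 512);

    add_residual_peeled(a, coeffs, STRIDE);
    add_residual_rolled(b, coeffs, STRIDE);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The only difference is one extra pointer bump after the last group in the
rolled version, which is never dereferenced, so the output is the same and
the function body shrinks by two macro expansions.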