On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock <j...@itanimul.li> wrote:
> +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride
> +    pxor              m0, m0
> +    lea               r3, [strideq * 3]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    mov              r4d, 3
> +.loop:
> +    add          coeffsq, 128
> +    lea             dstq, [dstq + strideq * 4]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    dec              r4d
> +    jnz .loop
> +    RET
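For readers less familiar with the x86inc macros, the control flow of the quoted function can be modelled in scalar C. This is only a sketch under assumptions: it takes each RES_ADD_SSE_16_32_8 invocation to add two consecutive 16-coefficient rows of int16_t residuals (64 bytes of coeffs) to two rows of 8-bit pixels, with the result clipped to [0, 255]. The helper names (clip_u8, add_rows_2x16, add_residual_16_8_model) are hypothetical and are not libav's C reference code.

```c
#include <stddef.h>
#include <stdint.h>

/* Clip an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical stand-in for one RES_ADD_SSE_16_32_8 invocation:
 * adds two consecutive 16-coefficient rows to row0 and row1. */
static void add_rows_2x16(const int16_t *coeffs, uint8_t *row0, uint8_t *row1)
{
    for (int x = 0; x < 16; x++) {
        row0[x] = clip_u8(row0[x] + coeffs[x]);
        row1[x] = clip_u8(row1[x] + coeffs[16 + x]);
    }
}

/* Mirrors the quoted control flow for a 16x16 block: rows 0..3 are
 * handled before the loop, then 3 iterations each advance the pointers
 * first and process the next 4 rows. */
static void add_residual_16_8_model(uint8_t *dst, const int16_t *coeffs,
                                    ptrdiff_t stride)
{
    /* Byte offset 64 in the asm equals 32 int16_t coefficients (2 rows). */
    add_rows_2x16(coeffs,      dst,              dst + stride);
    add_rows_2x16(coeffs + 32, dst + 2 * stride, dst + 3 * stride);
    for (int i = 3; i != 0; i--) {        /* mov r4d, 3 ... dec/jnz */
        coeffs += 64;                     /* add coeffsq, 128 (bytes) */
        dst    += 4 * stride;             /* lea dstq, [dstq + strideq * 4] */
        add_rows_2x16(coeffs,      dst,              dst + stride);
        add_rows_2x16(coeffs + 32, dst + 2 * stride, dst + 3 * stride);
    }
}
```

Note that the first group of four rows is peeled out of the loop, so the per-row work appears twice in the function body.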
You can do all iterations within the loop instead, e.g. something like:

    mov r4d, 4
.loop:
    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
    add coeffsq, 128
    lea dstq, [dstq + strideq * 4]
    dec r4d
    jnz .loop

(the same applies to all other similar functions)

_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel