Re: [libav-devel] [PATCH 1/3] ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.

Loren Merritt Fri, 10 Jun 2011 21:57:40 -0700

On Fri, 10 Jun 2011, Justin Ruggles wrote:
> On 06/09/2011 02:33 PM, Loren Merritt wrote:
>> On Tue, 7 Jun 2011, Justin Ruggles wrote:
>>> +cglobal vector_clip_int32_%1, 5,5,7, dst, src, min, max, len
>>> +    movd      m4, mind
>>> +    movd      m5, maxd
>>> +    SPLATD    m4
>>> +    SPLATD    m5
>>> +%ifidn %1, sse2
>>> +    cvtdq2ps  m4, m4
>>> +    cvtdq2ps  m5, m5
>>> +%endif
>>
>> %ifidn %1, sse2
>>      cvtsi2ss  m4, minm
>>      cvtsi2ss  m5, maxm
>> %else
>>      movd      m4, minm
>>      movd      m5, maxm
>> %endif
>>      SPLATD    m4
>>      SPLATD    m5
>
> why minm/maxm? is loading from memory faster than loading from a
> register on x86-32?


mem->xmm is faster than mem->gpr->xmm,
even aside from the fact that mem->xmm is faster than gpr->xmm on AMD.

Ideally you'd also change the prologue to not load them into gpr, and not spill callee-saved regs:

cglobal vector_clip_int32_%1, 3,3,7, dst, src, len, min, max
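
For reference, whichever way min/max get loaded, the asm has to match a plain scalar clamp over the array. A minimal C sketch of that behaviour follows; the function name and the argument order (taken from the first cglobal line in the patch) are assumptions for illustration, not necessarily the actual DSPUtil code:

```c
#include <stdint.h>

/* Hypothetical scalar reference: clamp each of len int32_t values from
 * src into [min, max] and store the result in dst. The name and the
 * argument order are assumptions mirroring the cglobal line above. */
static void vector_clip_int32_ref(int32_t *dst, const int32_t *src,
                                  int32_t min, int32_t max, unsigned int len)
{
    for (unsigned int i = 0; i < len; i++) {
        int32_t v = src[i];
        if (v < min)
            v = min;        /* below the range: raise to min */
        else if (v > max)
            v = max;        /* above the range: lower to max */
        dst[i] = v;
    }
}
```

Note that min and max are each used only to build a broadcast constant, which is why the SIMD versions never need them in a gpr.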

--Loren Merritt
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
