On Fri, 10 Jun 2011, Justin Ruggles wrote:
> On 06/09/2011 02:33 PM, Loren Merritt wrote:
>> On Tue, 7 Jun 2011, Justin Ruggles wrote:
>>>
>>> +cglobal vector_clip_int32_%1, 5,5,7, dst, src, min, max, len
>>> +    movd      m4, mind
>>> +    movd      m5, maxd
>>> +    SPLATD    m4
>>> +    SPLATD    m5
>>> +%ifidn %1, sse2
>>> +    cvtdq2ps  m4, m4
>>> +    cvtdq2ps  m5, m5
>>> +%endif
>>
>> %ifidn %1, sse2
>>      cvtsi2ss  m4, minm
>>      cvtsi2ss  m5, maxm
>> %else
>>      movd      m4, minm
>>      movd      m5, maxm
>> %endif
>>      SPLATD    m4
>>      SPLATD    m5
>
> why minm/maxm? is loading from memory faster than loading from a
> register on x86-32?

mem->xmm is faster than mem->gpr->xmm, and that holds even aside from
the fact that mem->xmm is faster than gpr->xmm on AMD.
Ideally you'd also change the prologue so that it doesn't load them
into gprs and doesn't spill callee-saved regs:
cglobal vector_clip_int32_%1, 3,3,7, dst, src, len, min, max
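
For readers without the rest of the patch at hand, here is a scalar C
sketch of what the routine is presumably meant to compute: each 32-bit
element of src is clamped to [min, max] and stored to dst. The name
clip_int32_ref and the exact C signature are assumptions made for
illustration; only the argument order (dst, src, min, max, len) is taken
from the original cglobal line. It says nothing about the load-path
timing discussed above, which only shows up in the asm.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not the code from the patch:
 * clamp every element of src to the range [min, max] and write
 * the result to dst. dst and src may point to the same buffer. */
static void clip_int32_ref(int32_t *dst, const int32_t *src,
                           int32_t min, int32_t max, unsigned int len)
{
    for (unsigned int i = 0; i < len; i++) {
        int32_t v = src[i];
        if (v < min)
            v = min;        /* below the range: raise to the lower bound */
        else if (v > max)
            v = max;        /* above the range: lower to the upper bound */
        dst[i] = v;
    }
}
```

The asm versions splat min and max across a vector register once, before
the loop, which is why the cost of getting those two scalars into xmm
registers is paid in the prologue rather than per element.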

--Loren Merritt
