On Fri, 10 Jun 2011, Justin Ruggles wrote:
> On 06/09/2011 02:33 PM, Loren Merritt wrote:
>> On Tue, 7 Jun 2011, Justin Ruggles wrote:
>>> +cglobal vector_clip_int32_%1, 5,5,7, dst, src, min, max, len
>>> +    movd m4, mind
>>> +    movd m5, maxd
>>> +    SPLATD m4
>>> +    SPLATD m5
>>> +%ifidn %1, sse2
>>> +    cvtdq2ps m4, m4
>>> +    cvtdq2ps m5, m5
>>> +%endif
>>
>> %ifidn %1, sse2
>>     cvtsi2ss m4, minm
>>     cvtsi2ss m5, maxm
>> %else
>>     movd m4, minm
>>     movd m5, maxm
>> %endif
>>     SPLATD m4
>>     SPLATD m5
>
> why minm/maxm? is loading from memory faster than loading from a
> register on x86-32?

mem->xmm is faster than mem->gpr->xmm, even aside from the fact that
mem->xmm is faster than gpr->xmm on AMD.
Ideally you'd also change the prologue to not load them into gpr, and not
spill callee-saved regs:
cglobal vector_clip_int32_%1, 3,3,7, dst, src, len, min, max
--Loren Merritt
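
Putting the two suggestions from this message together, the start of the function might look like the sketch below. It assumes the x86inc.asm cglobal convention, in which the first number is how many arguments are loaded into general-purpose registers. It also assumes that the C prototype and its callers are reordered to (dst, src, len, min, max) to match the new argument names. Neither the prototype nor the loop body appears in this thread, so the loop is elided rather than guessed.

```nasm
; Sketch only, not the committed code. Assumes the C-side argument order
; is changed to (dst, src, len, min, max) to match the names below.
cglobal vector_clip_int32_%1, 3,3,7, dst, src, len, min, max
    ; Only dst, src and len are loaded into GPRs (the first "3"); min and
    ; max are read from wherever the caller passed them, via minm/maxm.
%ifidn %1, sse2
    cvtsi2ss  m4, minm      ; convert int32 min to float while loading it
    cvtsi2ss  m5, maxm      ; convert int32 max to float while loading it
%else
    movd      m4, minm      ; load int32 min straight into the vector register
    movd      m5, maxm      ; load int32 max straight into the vector register
%endif
    SPLATD    m4            ; broadcast min to every dword lane
    SPLATD    m5            ; broadcast max to every dword lane
    ; ... clipping loop unchanged from the patch (not shown in this thread) ...
```

On x86-32 the arguments live on the stack, so minm/maxm are memory operands and each value reaches the vector register in a single load. On ABIs that pass these arguments in registers, minm/maxm should name those registers instead, so the same source still assembles.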