On Thu, 16 Jun 2011, Justin Ruggles wrote:

On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:

Hi,

On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
<[email protected]> wrote:
---
 libavcodec/dsputil.c            |   17 +++++++
 libavcodec/dsputil.h            |   14 ++++++
 libavcodec/x86/dsputil_mmx.c    |   15 +++++++
 libavcodec/x86/dsputil_yasm.asm |   88 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+), 0 deletions(-)
[..]
+    CLIPD  m0, m4, m5, m6
+    CLIPD  m1, m4, m5, m6
+    CLIPD  m2, m4, m5, m6
+    CLIPD  m3, m4, m5, m6

For something like Atom (or basically anything without out-of-order
execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
m6). With that changed, looks good to me, feel free to apply.


I tested that on Atom and it doesn't improve speed. But it doesn't hurt
speed either. Should we do it anyway?

Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
an updated patch to do that.

On Atom, you mean? Penryn is indifferent to the amount of unrolling here.
Can you unroll with %rep instead of copy/paste?

Also, document the limitations on min/max values due to the float implementation.
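
To illustrate the kind of limitation meant here, below is a minimal scalar
sketch in C. It assumes IEEE-754 single precision (24-bit significand) and
the default round-to-nearest mode. The helper names clip_via_float and
clip_exact are hypothetical and are not taken from the patch; the sketch
only models "convert to float, clamp, convert back", it is not the SIMD code
under review.

```c
#include <stdint.h>

/* Hypothetical scalar model of a float-based clip: convert the value and
 * both bounds to float, clamp, then convert back to int32_t.
 * Assumption: float is IEEE-754 single precision, so integers with a
 * magnitude above 1<<24 are not all exactly representable. */
static int32_t clip_via_float(int32_t v, int32_t min, int32_t max)
{
    float f  = (float)v;
    float lo = (float)min;   /* may be rounded if |min| > 1<<24 */
    float hi = (float)max;   /* may be rounded if |max| > 1<<24 */

    if (f < lo)
        f = lo;
    if (f > hi)
        f = hi;
    return (int32_t)f;
}

/* Plain integer clip, used as the exact reference. */
static int32_t clip_exact(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}
```

With max = (1 << 24) + 1 the bound itself is rounded to 1 << 24 when
converted to float, so clip_via_float(20000000, 0, 16777217) yields 16777216
while clip_exact yields 16777217. Bounds within [-(1 << 24), 1 << 24] convert
exactly, which is the sort of constraint the documentation could state.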

--Loren Merritt

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
