On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
> <[email protected]> wrote:
>> ---
>>  libavcodec/dsputil.c            |   17 +++++++
>>  libavcodec/dsputil.h            |   14 ++++++
>>  libavcodec/x86/dsputil_mmx.c    |   15 +++++++
>>  libavcodec/x86/dsputil_yasm.asm |   88
>> +++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 134 insertions(+), 0 deletions(-)
> [..]
>> +    CLIPD m0, m4, m5, m6
>> +    CLIPD m1, m4, m5, m6
>> +    CLIPD m2, m4, m5, m6
>> +    CLIPD m3, m4, m5, m6
>
> For something like Atom (or basically anything with out-of-order
> execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
> m6). With that changed, looks good to me, feel free to apply.

I tested that on Atom and it doesn't improve speed. But it doesn't hurt
speed either. Should we do it anyway?

Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
an updated patch to do that.

I'm dropping the other 2 patches for now and will send a new patch set
to clip coefficients right after the mdct so that any processing before
exponent extraction will use clipped coefficients. This is particularly
important for stereo rematrixing since the mid/side output of a stereo
pair with an out-of-bounds coeff might be in-bounds, but it will be
out-of-bounds again when reconstructed on the decoder side.

-Justin
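
To illustrate the rematrixing point, here is a minimal scalar sketch, not the code from the patch. The bound values, the (L+R)/2 and (L-R)/2 mid/side form, and every function name below are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed coefficient bounds for illustration only (24-bit signed range). */
#define COEF_MIN (-16777215)
#define COEF_MAX   16777215

/* Clamp one value to [lo, hi]. */
static int32_t clip_int32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Scalar stand-in for a vector clip: clamp every coefficient in place. */
static void clip_coefs(int32_t *coef, int len)
{
    for (int i = 0; i < len; i++)
        coef[i] = clip_int32(coef[i], COEF_MIN, COEF_MAX);
}

static int in_bounds(int32_t v)
{
    return v >= COEF_MIN && v <= COEF_MAX;
}

/* Encoder side: left/right to mid/side (assumed form). */
static void rematrix(int32_t l, int32_t r, int32_t *m, int32_t *s)
{
    *m = (l + r) / 2;
    *s = (l - r) / 2;
}

/* Decoder side: mid/side back to left/right. */
static void reconstruct(int32_t m, int32_t s, int32_t *l, int32_t *r)
{
    *l = m + s;
    *r = m - s;
}
```

With l = 20000000 (out of bounds) and r = -4000000, rematrix() gives m = 8000000 and s = 12000000, both in bounds, so clipping after rematrixing would not catch anything, yet reconstruct() returns l = 20000000 again. Calling clip_coefs() on the pair before rematrix() keeps the reconstructed values in range.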
