On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:

> Hi,
> 
> On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
> <[email protected]> wrote:
>> ---
>>  libavcodec/dsputil.c            |   17 +++++++
>>  libavcodec/dsputil.h            |   14 ++++++
>>  libavcodec/x86/dsputil_mmx.c    |   15 +++++++
>>  libavcodec/x86/dsputil_yasm.asm |   88 +++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 134 insertions(+), 0 deletions(-)
> [..]
>> +    CLIPD  m0, m4, m5, m6
>> +    CLIPD  m1, m4, m5, m6
>> +    CLIPD  m2, m4, m5, m6
>> +    CLIPD  m3, m4, m5, m6
> 
> For something like Atom (or basically anything with out-of-order
> execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
> m6). With that changed, looks good to me, feel free to apply.
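For reference, each CLIPD line in the quoted hunk clamps the 32-bit lanes of one register (m0..m3) into a fixed range. The hunk does not show the macro's operand roles, so this assumes m4 and m5 hold the broadcast lower and upper bounds and m6 is scratch. A scalar C equivalent of that operation, using a hypothetical function name, could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: clamp every int32 in src to [min, max]
 * and write the result to dst. Each CLIPD line is assumed to do the same
 * thing for every 32-bit lane of a single SIMD register. */
static void clip_int32_ref(int32_t *dst, const int32_t *src,
                           int32_t min, int32_t max, size_t len)
{
    assert(min <= max);
    for (size_t i = 0; i < len; i++) {
        int32_t v = src[i];
        if (v < min)
            v = min;
        else if (v > max)
            v = max;
        dst[i] = v;
    }
}
```

The four CLIPD lines operate on four different data registers (m0..m3), and that independence is what makes the interleaving suggested above possible at all.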


I tested that on Atom and it doesn't improve speed. But it doesn't hurt
speed either. Should we do it anyway?

Also, unrolling to 32 values per loop iteration on x86-64 does help, so
I'll send an updated patch that does that.
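The loop structure of a 32-values-per-iteration version can be sketched in C. This is only an illustration: the block size comes from the message above, the function names are hypothetical, and the scalar tail loop is an assumption (the asm version may instead require the length to be a multiple of the block size). One plausible reason the wider unroll only fits on x86-64 is register count: it has 16 XMM registers, while x86-32 has 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLIP_BLOCK 32 /* values per main-loop iteration */

/* Clamp one value to [min, max]. */
static int32_t clip1(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}

/* Hypothetical blocked clip: the main loop handles 32 values per
 * iteration; the fixed trip count of the inner loop lets a compiler
 * fully unroll it. Leftover values go through a scalar tail. */
static void clip_int32_blocked(int32_t *dst, const int32_t *src,
                               int32_t min, int32_t max, size_t len)
{
    size_t i = 0;

    assert(min <= max);
    for (; i + CLIP_BLOCK <= len; i += CLIP_BLOCK)
        for (size_t j = 0; j < CLIP_BLOCK; j++)
            dst[i + j] = clip1(src[i + j], min, max);
    for (; i < len; i++)
        dst[i] = clip1(src[i], min, max);
}
```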

I'm dropping the other 2 patches for now and will send a new patch set
that clips coefficients right after the MDCT, so that any processing
before exponent extraction uses clipped coefficients. This is
particularly important for stereo rematrixing: the mid/side output of a
stereo pair containing an out-of-bounds coefficient might be in bounds,
but the left/right coefficients will be out of bounds again when the
decoder reconstructs them.
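A small worked example makes the rematrixing argument concrete. This sketch assumes the usual AC-3 form of the transform (the encoder sends M = (L+R)/2 and S = (L-R)/2, and the decoder rebuilds L = M+S and R = M-S). It also uses a made-up bound of +/-1000 in place of the encoder's real coefficient limit and plain integers instead of the encoder's actual coefficient format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bound for illustration only. */
#define COEF_MAX 1000

static int in_bounds(int32_t v)
{
    return v >= -COEF_MAX && v <= COEF_MAX;
}

/* Encoder side: mid/side transform of one left/right coefficient pair.
 * Integer halving is exact only when l+r (and hence l-r) is even, so the
 * sketch asserts that to keep the round trip lossless. */
static void to_mid_side(int32_t l, int32_t r, int32_t *m, int32_t *s)
{
    assert((l + r) % 2 == 0);
    *m = (l + r) / 2;
    *s = (l - r) / 2;
}

/* Decoder side: rebuild left/right from mid/side. */
static void from_mid_side(int32_t m, int32_t s, int32_t *l, int32_t *r)
{
    *l = m + s;
    *r = m - s;
}
```

With L = 1500 and R = -500, both M = 500 and S = 1000 pass the bound check, yet the decoder rebuilds L = 1500. Clipping L to 1000 before the transform gives M = 250 and S = 750, which reconstruct to exactly 1000 and -500.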

-Justin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
