Re: [libav-devel] [PATCH 1/3] ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.

Justin Ruggles Sat, 18 Jun 2011 15:02:05 -0700

On 06/17/2011 09:44 PM, Loren Merritt wrote:

> On Thu, 16 Jun 2011, Justin Ruggles wrote:
> 
>> On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:
>>
>>> Hi,
>>>
>>> On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
>>> <[email protected]> wrote:
>>>> ---
>>>>  libavcodec/dsputil.c            |   17 +++++++
>>>>  libavcodec/dsputil.h            |   14 ++++++
>>>>  libavcodec/x86/dsputil_mmx.c    |   15 +++++++
>>>>  libavcodec/x86/dsputil_yasm.asm |   88 +++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 134 insertions(+), 0 deletions(-)
>>> [..]
>>>> +    CLIPD  m0, m4, m5, m6
>>>> +    CLIPD  m1, m4, m5, m6
>>>> +    CLIPD  m2, m4, m5, m6
>>>> +    CLIPD  m3, m4, m5, m6
>>>
>>> For something like Atom (or basically anything with out-of-order
>>> execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
>>> m6). With that changed, looks good to me, feel free to apply.
>>
>>
>> I tested that on Atom and it doesn't improve speed. But it doesn't hurt
>> speed either. Should we do it anyway?
>>
>> Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
>> an updated patch to do that.
> 
> On Atom, you mean? Penryn is indifferent to amount of unrolling here.
> Can you unroll with %rep instead of copy/paste?


Maybe we're thinking of different things.  I was referring to unrolling
by using more xmm registers for x86-64.  This helps on Atom and Sandy
Bridge, but doesn't seem to have a significant effect on Athlon 64.  I
also don't see how I could do that cleanly with %rep.

The other thing, I guess, would be running the load/clip/store twice
before looping, which can of course be done simply with %rep.  That
actually does seem to improve speed slightly on Athlon 64, but I haven't
tested it yet on other systems.
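
To make the second kind of unrolling concrete, here is a rough scalar
sketch in C.  It is only an illustration: clip_i32, clip_int32_ref and
clip_int32_x2 are made-up names, not the DSPUtil function from the
patch, and the asm would express the repeated body with %rep 2 /
%endrep rather than by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp one value to [min, max]. */
static inline int32_t clip_i32(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}

/* Plain loop: one load/clip/store per iteration. */
static void clip_int32_ref(int32_t *dst, const int32_t *src,
                           int32_t min, int32_t max, unsigned len)
{
    for (unsigned i = 0; i < len; i++)
        dst[i] = clip_i32(src[i], min, max);
}

/* Same work with the body repeated twice per loop branch.
 * Assumes len is a multiple of 2 (an assumption of this sketch). */
static void clip_int32_x2(int32_t *dst, const int32_t *src,
                          int32_t min, int32_t max, unsigned len)
{
    assert(len % 2 == 0);
    for (unsigned i = 0; i < len; i += 2) {
        dst[i]     = clip_i32(src[i],     min, max);
        dst[i + 1] = clip_i32(src[i + 1], min, max);
    }
}
```

Both functions produce the same output; the only difference is how many
elements are handled per loop branch.  Using more xmm registers per
iteration is a separate change that this sketch does not model.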

> Also, document the limitations on min/max values due to the float 
> implementation.


Indeed. It's accurate for +/- 1<<24, right? Or is it 1<<25?
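
A quick way to check the float side of that question, as a sketch: it
assumes the limitation comes from converting int32_t to single-precision
float and back, and float_roundtrip_exact is a made-up name, not part
of the patch.

```c
#include <stdint.h>

/* Returns 1 if v survives an int32 -> float -> int32 round trip.
 * Only meant for |v| well below 1<<31, so the cast back is defined. */
static int float_roundtrip_exact(int32_t v)
{
    /* volatile forces rounding to true single precision */
    volatile float f = (float)v;
    return (int32_t)f == v;
}
```

IEEE 754 single precision has a 24-bit significand, so every integer
with magnitude up to 1<<24 converts exactly, and (1<<24)+1 is the first
one that does not.  Powers of two such as 1<<25 still convert exactly,
but their odd neighbours do not.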

-Justin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
