Re: [libav-devel] [PATCH 1/3] ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.

Loren Merritt Sun, 19 Jun 2011 01:47:38 -0700

On Sat, 18 Jun 2011, Justin Ruggles wrote:

On 06/17/2011 09:44 PM, Loren Merritt wrote:

On Thu, 16 Jun 2011, Justin Ruggles wrote:

Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
an updated patch to do that.


On Atom, you mean? Penryn is indifferent to amount of unrolling here.
Can you unroll with %rep instead of copy/paste?


Maybe we're thinking of different things.  I was referring to unrolling
by using more xmm registers for x86-64.  This helps on atom and sandy
bridge, but doesn't seem to have a significant effect on athlon64.  I
also don't see how I could do that cleanly with %rep.

I included both "interleaved copies" and "consecutive copies" in
"unrolling", under the prediction that they'd have the same effect. I was
wrong, but I still don't know why. 4x should be plenty to max out ILP
(regardless of whether it gets done by manual interleaving or instruction
reordering), so I don't see what 8x unroll could possibly do other than
have a tiny effect on loop overhead and increase code size.


That said, I can reproduce your result on sandybridge.
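
To make the two kinds of unrolling concrete, here is a rough C sketch of
one reading of the terms. It is not the asm from the patch: the function
names are made up for illustration, len is assumed to be a multiple of 4,
and a C compiler is free to reschedule either loop, so this shows only the
instruction ordering being discussed, not the performance difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clip one value to [min, max]. */
static inline int32_t clip1(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : v > max ? max : v;
}

/* "Consecutive copies": the loop body is repeated 4 times, and each
 * element is fully loaded, clipped and stored before the next starts. */
static void clip_consecutive4(int32_t *dst, const int32_t *src,
                              int32_t min, int32_t max, unsigned len)
{
    assert(min <= max && len % 4 == 0);
    for (unsigned i = 0; i < len; i += 4) {
        dst[i + 0] = clip1(src[i + 0], min, max);
        dst[i + 1] = clip1(src[i + 1], min, max);
        dst[i + 2] = clip1(src[i + 2], min, max);
        dst[i + 3] = clip1(src[i + 3], min, max);
    }
}

/* "Interleaved copies": the same work, but grouped by stage across four
 * independent temporaries (all loads, all lower clamps, all upper clamps,
 * all stores), which is what using more registers looks like in asm. */
static void clip_interleaved4(int32_t *dst, const int32_t *src,
                              int32_t min, int32_t max, unsigned len)
{
    assert(min <= max && len % 4 == 0);
    for (unsigned i = 0; i < len; i += 4) {
        int32_t a = src[i + 0], b = src[i + 1];
        int32_t c = src[i + 2], d = src[i + 3];

        a = a < min ? min : a;  b = b < min ? min : b;
        c = c < min ? min : c;  d = d < min ? min : d;

        a = a > max ? max : a;  b = b > max ? max : b;
        c = c > max ? max : c;  d = d > max ? max : d;

        dst[i + 0] = a;  dst[i + 1] = b;
        dst[i + 2] = c;  dst[i + 3] = d;
    }
}
```

Both functions compute the same result; only the order in which the
independent operations appear differs.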

Also, document the limitations on min/max values due to the float
implementation.


Indeed. It's accurate for +/- 1<<24 right?


yes
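
For the documentation, the limit can be illustrated with a small C model.
This is only a sketch under assumptions: it models a float-based clip as
convert to float, clamp, convert back, and the function names are
hypothetical, not the ones in the patch. A float has a 24-bit significand,
so every integer with magnitude up to 1<<24 converts exactly, while
(1<<24)+1 does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: clip each element to [min, max]. */
static void clip_int32_ref(int32_t *dst, const int32_t *src,
                           int32_t min, int32_t max, unsigned len)
{
    assert(min <= max);
    for (unsigned i = 0; i < len; i++) {
        int32_t v = src[i];
        dst[i] = v < min ? min : v > max ? max : v;
    }
}

/* Hypothetical model of a float-based clip. Inputs and bounds are
 * rounded to the nearest float, so results are exact only while all
 * values involved fit in 24 bits of magnitude. Bounds that round to
 * 2^31 (e.g. INT32_MAX) would make the cast back undefined in C, so
 * callers of this model must keep min/max well inside int32_t range. */
static void clip_int32_via_float(int32_t *dst, const int32_t *src,
                                 int32_t min, int32_t max, unsigned len)
{
    assert(min <= max);
    const float fmin = (float)min, fmax = (float)max;
    for (unsigned i = 0; i < len; i++) {
        float f = (float)src[i];
        f = f < fmin ? fmin : f > fmax ? fmax : f;
        dst[i] = (int32_t)f;
    }
}
```

With bounds of +/- 1<<25, the two functions agree for inputs of -(1<<24),
12345 and 1<<24, but for an input of (1<<24)+1 the float model returns
1<<24 while the integer reference returns the input unchanged.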

--Loren Merritt
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
