Re: [libav-devel] [PATCH 1/3] ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.

Loren Merritt Mon, 20 Jun 2011 04:08:23 -0700

On Sun, 19 Jun 2011, Justin Ruggles wrote:

On 06/19/2011 04:46 AM, Loren Merritt wrote:

I included both "interleaved copies" and "consecutive copies" in
"unrolling", under the prediction that they'd have the same effect. I was
wrong, but I still don't know why. 4x should be plenty to max out ILP
(regardless of whether it gets done by manual interleaving or instruction
reordering), so I don't see what 8x unroll could possibly do other than
have a tiny effect on loop overhead and increase code size.

That said, I can reproduce your result on sandybridge.


sandybridge:

[...]

947 - SSE4.1
907 - SSE4.1 (unroll 8x consecutive)
1094 - SSE4.1 (unroll 8x interleaved)


Did you get that backwards? Interleaved is the one that uses more xmmregs,
consecutive is the one that can be done with %rep.
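
To make the two unrolling styles concrete, here is a minimal scalar C sketch, not the asm from the patch. The function names are hypothetical, and plain int32_t temporaries stand in for xmm registers. "Consecutive" repeats the whole load/min/max/store body (what %rep would produce) and can reuse one temporary. "Interleaved" performs each step for all elements before moving to the next step, so it keeps one temporary live per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-ins for the SIMD min/max steps (hypothetical helpers). */
static inline int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Reference: one element per iteration. */
void clip_ref(int32_t *dst, const int32_t *src,
              int32_t lo, int32_t hi, size_t len)
{
    assert(lo <= hi);
    for (size_t i = 0; i < len; i++)
        dst[i] = max_i32(min_i32(src[i], hi), lo);
}

/* "Consecutive" 4x unroll: the complete body is repeated four times,
 * and every copy reuses the same temporary. len must be a multiple of 4. */
void clip_consecutive4(int32_t *dst, const int32_t *src,
                       int32_t lo, int32_t hi, size_t len)
{
    assert(lo <= hi && len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        int32_t t;
        t = src[i + 0]; t = min_i32(t, hi); t = max_i32(t, lo); dst[i + 0] = t;
        t = src[i + 1]; t = min_i32(t, hi); t = max_i32(t, lo); dst[i + 1] = t;
        t = src[i + 2]; t = min_i32(t, hi); t = max_i32(t, lo); dst[i + 2] = t;
        t = src[i + 3]; t = min_i32(t, hi); t = max_i32(t, lo); dst[i + 3] = t;
    }
}

/* "Interleaved" 4x unroll: all loads, then all mins, then all maxes,
 * then all stores. Four temporaries are live at once.
 * len must be a multiple of 4. */
void clip_interleaved4(int32_t *dst, const int32_t *src,
                       int32_t lo, int32_t hi, size_t len)
{
    assert(lo <= hi && len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        int32_t a = src[i + 0], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        a = min_i32(a, hi); b = min_i32(b, hi);
        c = min_i32(c, hi); d = min_i32(d, hi);
        a = max_i32(a, lo); b = max_i32(b, lo);
        c = max_i32(c, lo); d = max_i32(d, lo);
        dst[i + 0] = a; dst[i + 1] = b; dst[i + 2] = c; dst[i + 3] = d;
    }
}
```

All three functions produce identical output; only the instruction order and the number of live temporaries differ. A C compiler is free to reschedule this code, so the sketch shows the structure only and says nothing about the timings quoted in this thread.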

- Clipping by doing int2float/clip/float2int only benefits athlon64. The
integer-only version is insanely faster on Atom. Are there other CPUs
that might benefit from the float version?


conroe:
6252 mmx
2855 sse2 float, 4x
2844 sse2 float, 8x interleaved
2804 sse2 float, 8x consecutive
3230 sse2 int, 4x
3044 sse2 int, 8x interleaved
3200 sse2 int, 8x consecutive
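
For reference, the "float" and "int" variants being timed differ as in the scalar C sketch below. It is an illustration with hypothetical function names, not the SSE2 code from the patch. It assumes the clip bounds lie within ±2^24, so that they and every in-range input are exactly representable as a float; outside that range the float round trip would not be exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer-only clip: compare and select directly on int32_t values. */
void clip_int(int32_t *dst, const int32_t *src,
              int32_t lo, int32_t hi, size_t len)
{
    assert(lo <= hi);
    for (size_t i = 0; i < len; i++) {
        int32_t x = src[i];
        if (x > hi) x = hi;
        if (x < lo) x = lo;
        dst[i] = x;
    }
}

/* Float round trip: int -> float, clip in float, float -> int.
 * Assumption: |lo| and |hi| are at most 2^24. Inputs of larger magnitude
 * may round during conversion, but rounding is monotonic, so they still
 * land outside the bounds and are clipped to lo or hi. */
void clip_via_float(int32_t *dst, const int32_t *src,
                    int32_t lo, int32_t hi, size_t len)
{
    assert(lo <= hi && lo >= -(1 << 24) && hi <= (1 << 24));
    const float flo = (float)lo;
    const float fhi = (float)hi;
    for (size_t i = 0; i < len; i++) {
        float f = (float)src[i];   /* int2float */
        if (f > fhi) f = fhi;      /* clip high */
        if (f < flo) f = flo;      /* clip low  */
        dst[i] = (int32_t)f;       /* float2int; value is integral and in range */
    }
}
```

Under that bound assumption both functions return the same result for every int32_t input, so the choice between them is purely a question of speed on a given CPU.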

--Loren Merritt
