On Sun, Aug 14, 2011 at 8:29 PM, Loren Merritt <lor...@u.washington.edu> wrote:
> On Sun, 14 Aug 2011, Jason Garrett-Glaser wrote:
>
>> On Sun, Aug 14, 2011 at 3:41 AM, Vitor Sessak <vitor1...@gmail.com> wrote:
>> > On Sun, Aug 14, 2011 at 6:03 AM, Alex Converse <alex.conve...@gmail.com> wrote:
>> >> When the 3DNOW version of vector_fmul_add was preferred over SSE, the code
>> >> was substantially more complex than it is now. Would someone with an AMD
>> >> chip that supports both SSE and 3DNOW be willing to benchmark them and see
>> >> which is currently faster?
>> >
>> > According to /proc/cpuinfo:
>> > model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
>> >
>> > Using the best result for each of 1000 runs:
>> > 1334000 dezicycles in 3DNOW, 1 runs, 0 skips
>> > 1336460 dezicycles in SSE, 1 runs, 0 skips
>>
>> Are we sure this isn't memory-bound?
>
> Of course it's memory-bound. So the SSE version should be faster on k10.

Changing Alex's test prog to use len = 32 and iters = 128*1024, I have

41944200 dezicycles in SSE, 1 runs, 0 skips
43254770 dezicycles in 3DNOW, 1 runs, 0 skips

which shows a much larger difference than my previous result: SSE is
about 3% faster here, whereas the two were within 0.2% of each other before.
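
For reference, a minimal sketch of why len = 32 takes memory bandwidth out of the picture. Alex's test program is not reproduced in this thread, so the scalar kernel below is only a stand-in for the SIMD versions, and the helper names (vector_fmul_add_ref, working_set_bytes, run_iters) are hypothetical. Timing is left out on purpose; the point is the size of the working set.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEN 32

/* Scalar stand-in for the kernel: dst[i] = src0[i] * src1[i] + src2[i]. */
static void vector_fmul_add_ref(float *dst, const float *src0,
                                const float *src1, const float *src2,
                                int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}

/* Bytes touched per call: three input arrays plus one output array. */
static size_t working_set_bytes(int len)
{
    return 4 * (size_t)len * sizeof(float);
}

/* Hypothetical driver: run the kernel iters times over the same small
 * buffers, as in the len = 32, iters = 128*1024 setup, and return the
 * last output element so the work is observable. */
static float run_iters(int len, int iters)
{
    float dst[MAX_LEN], src0[MAX_LEN], src1[MAX_LEN], src2[MAX_LEN];

    assert(len > 0 && len <= MAX_LEN);
    for (int i = 0; i < len; i++) {
        src0[i] = (float)i;
        src1[i] = 2.0f;
        src2[i] = 1.0f;
    }
    for (int n = 0; n < iters; n++)
        vector_fmul_add_ref(dst, src0, src1, src2, len);
    return dst[len - 1];
}
```

With 4-byte floats, working_set_bytes(32) is 512 bytes, so every iteration after the first should hit L1 and the measurement reflects the arithmetic rather than memory traffic.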

-Vitor
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
