On Sun, Aug 14, 2011 at 8:29 PM, Loren Merritt <lor...@u.washington.edu> wrote:
> On Sun, 14 Aug 2011, Jason Garrett-Glaser wrote:
>
>> On Sun, Aug 14, 2011 at 3:41 AM, Vitor Sessak <vitor1...@gmail.com> wrote:
>> > On Sun, Aug 14, 2011 at 6:03 AM, Alex Converse <alex.conve...@gmail.com>
>> > wrote:
>> >> When the 3DNOW version of vector_fmul_add was preferred over SSE the code
>> >> was substantially more complex than it is now. Would someone with an AMD
>> >> chip that supports both SSE and 3DNOW be willing to benchmark them and
>> >> see which is currently faster?
>> >
>> > According to /proc/cpuinfo:
>> > model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
>> >
>> > Using the best result for each of 1000 runs:
>> > 1334000 dezicycles in 3DNOW, 1 runs, 0 skips
>> > 1336460 dezicycles in SSE, 1 runs, 0 skips
>>
>> Are we sure this isn't memory-bound?
>
> Of course it's memory-bound. So the SSE version should be faster on k10.

Changing Alex's test prog to use len = 32 and iters = 128*1024, I have

41944200 dezicycles in SSE, 1 runs, 0 skips
43254770 dezicycles in 3DNOW, 1 runs, 0 skips

which shows a much more significant difference than my previous result.

-Vitor
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
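
For anyone who wants to repeat this kind of measurement outside the tree, here is a minimal sketch of a "best of N runs" harness. It is not Alex's test prog: the names vector_fmul_add_ref, fmul_add_fn, now_ns and bench_best_ns are hypothetical, it assumes a POSIX system with clock_gettime, and it reports wall-clock nanoseconds rather than the dezicycle counts quoted above. The reference loop assumes vector_fmul_add computes dst[i] = src0[i] * src1[i] + src2[i]; an SSE or 3DNOW implementation with the same signature could be passed in its place.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Plain C reference (assumed semantics): dst[i] = src0[i] * src1[i] + src2[i]. */
static void vector_fmul_add_ref(float *dst, const float *src0,
                                const float *src1, const float *src2,
                                int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}

/* Hypothetical function-pointer type so different versions can be swapped in. */
typedef void (*fmul_add_fn)(float *dst, const float *src0,
                            const float *src1, const float *src2, int len);

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

enum { MAX_LEN = 4096 };

/* Time `iters` calls of fn per run and return the fastest of `runs` runs. */
static uint64_t bench_best_ns(fmul_add_fn fn, int len, int iters, int runs)
{
    static float dst[MAX_LEN], src0[MAX_LEN], src1[MAX_LEN], src2[MAX_LEN];
    static volatile float sink; /* keeps the result observable */
    uint64_t best = UINT64_MAX;

    assert(len > 0 && len <= MAX_LEN && iters > 0 && runs > 0);

    for (int i = 0; i < len; i++) {
        src0[i] = (float)i;
        src1[i] = 0.5f;
        src2[i] = 1.0f;
    }

    for (int r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        for (int n = 0; n < iters; n++)
            fn(dst, src0, src1, src2, len);
        uint64_t t1 = now_ns();

        sink = dst[len - 1];
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

A call such as bench_best_ns(vector_fmul_add_ref, 32, 128 * 1024, 1000) mirrors the len = 32, iters = 128*1024 setup. With len = 32 the four float buffers total 512 bytes, so the working set is small and memory bandwidth should matter much less than with long vectors.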