[Issue 4393] Very good dotProduct

d-bugmail Thu, 28 Apr 2011 21:40:18 -0700

http://d.puremagic.com/issues/show_bug.cgi?id=4393



Don <clugd...@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugd...@yahoo.com.au


--- Comment #4 from Don <clugd...@yahoo.com.au> 2011-04-28 21:36:38 PDT ---
Did you test this on Intel or AMD? BLAS Level 1 code is generally limited by
memory access, and AMD has two load ports, so it has different bottlenecks and
always does better than Intel in these operations.

See also a discussion at:
http://www.bealto.com/mp-dot_sse.html
(I've talked to Eric Bealto before; he's happy for Phobos to use any of his
stuff if we see anything we want.) That discussion is a bit misleading, though,
because above a certain length you become dominated by cache effects, so I
don't know if unrolling by 4 is actually worthwhile in practice.
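
For reference, "unrolling by 4" here means roughly the following. This is only
a minimal scalar sketch in C, not the code from the linked page or from Phobos;
the name dot_unrolled4 is hypothetical and the SSE variants are not shown. The
four independent accumulators are what let the multiply-adds overlap.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not Phobos or Bealto code. */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    /* Four independent partial sums, so no iteration waits on one chain. */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: four element pairs per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Tail: the 0 to 3 elements left over when n is not a multiple of 4. */
    for (; i < n; ++i)
        s0 += a[i] * b[i];

    /* Combine the partial sums pairwise. */
    return (s0 + s1) + (s2 + s3);
}
```

Note that the summation order differs from a plain loop, so floating-point
results can differ in the last bits. Whether this beats the simple loop on long
arrays is exactly the open question, since both end up waiting on memory.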

I also have some optimized x87 dot product code (32-bit AMD CPUs don't have
SSE2, so they still need x87 for doubles).

