Until the Phis came along, we were purchasing 1RU, 4-socket nodes with 6276s and 256GB RAM. On all our codes, we found the throughput to be greater than any equivalent-density Sandy Bridge system (usually 2 x dual-socket in 1RU), at about 10-15% less energy and about 1/3 the price for the actual CPUs (saving a couple of thousand dollars per 1RU).
The other interesting aspect is that their performance is MUCH better than the Sandy Bridges' when you over-allocate the cores (i.e. run >n CPU threads on n cores). We found the Sandy Bridges' performance completely tanked when we did this... the AMDs maintained the same performance (as what you get with n threads).

Consequently, we have about 5 racks of these systems (120 nodes).

Of course, we are now purchasing Phis. The first 2 racks are meant to turn up this week.

On Fri, Jan 11, 2013 at 1:03 PM, Bill Broadley <[email protected]> wrote:

> Over the last few months I've been hearing quite a few negative comments
> about AMD. Seems like most of them are extrapolating from desktop
> performance.
>
> Keep in mind that it's quite a stretch going from a desktop (single
> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
> memory channels).
>
> Also keep in mind that compilers and kernels can make quite a
> difference. The vector units have changed significantly (a factor of 2)
> and the scheduler needs tweaks to account for the various latencies and
> NUMA related values. Using old kernels/compilers may well significantly
> impact AMD and/or Intel.
>
> I've found the bandwidth and latency mostly controlled by the socket and
> specifically the number of memory channels. 2, 3, and 4 channel per
> socket systems have very similar bandwidth and latency for AMD and Intel
> systems.
>
> When taking a pragmatic approach to best price performance I find AMD
> competitive. Normally I figure out how much ram per CPU is needed, disk
> needs, then figure out which Intel chip has the best system price/system
> perf on the relevant applications. Then do similar for AMD. Then buy
> whichever is better. Often the result is a 15% improvement in one
> direction or another (HIGHLY application dependent).
>
> Of course sometimes a user asks for the "better" system for running a
> wide variety of floating point codes. In such cases I often use CPU2006
> FP rate.
>
> In a recent comparison I compared (both perf numbers from HP systems)
> * AMD 6344, 64GB ram, SpecFPRateBase=333 $2,915, $8.75 per spec
> * Intel E5-2620, 64GB ram, SpecFPRateBase=322 $2,990, $9.22 per spec
>
> Whenever possible I try to use actual applications justifying the
> purchase of a cluster.
>
> When using actual end user applications it's about a 50/50 chance that
> AMD or Intel will win.
>
> I figured I'd add a few comments:
> * Latency for a quad socket AMD is around 64ns to a random piece
>   of memory (not 600ns as recently mentioned).
> * AMD quad sockets with 512GB ram start around $9k ($USA)
> * With OpenMP, pthreads, MPI or other parallel friendly code a quad
>   socket amd can look up random cache line approximately every 2.25ns.
>   (64 threads banging on 16 memory channels at once).
> * I've seen no problems with the AMD memory system, in general
>   the 2k pin/4 memory bus amd sockets seem to perform similarly
>   to Intel.
>
> An example of AMD's bandwidth scaling on a quad socket with 64 cores:
> http://cse.ucdavis.edu/bill/pstream/bm3-all.png
>
> I don't have a similar Intel, but I do have a dual socket e5:
> http://cse.ucdavis.edu/bill/pstream/e5-2609.png
>
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

--
Dr Stuart Midgley
[email protected]
