Over the last few months I've been hearing quite a few negative comments about AMD. Seems like most of them are extrapolating from desktop performance.
Keep in mind that it's quite a stretch going from a desktop (single socket, 2 memory channels) to a server (dual socket, 4x the cores, 8 memory channels).

Also keep in mind that compilers and kernels can make quite a difference. The vector units have changed significantly (a factor of 2), and the scheduler needs tweaks to account for the various latencies and NUMA-related values. Using old kernels/compilers may well significantly impact AMD and/or Intel.

I've found that bandwidth and latency are mostly controlled by the socket, and specifically by the number of memory channels. At the same channel count (2, 3, or 4 channels per socket), AMD and Intel systems have very similar bandwidth and latency.

When taking a pragmatic approach to best price/performance I find AMD competitive. Normally I figure out how much RAM per CPU and how much disk are needed, then work out which Intel chip gives the best system price/system performance on the relevant applications. Then I do the same for AMD and buy whichever is better. Often the result is a 15% improvement in one direction or the other (HIGHLY application dependent).

Of course, sometimes a user asks for the "better" system for running a wide variety of floating point codes. In such cases I often use CPU2006 FP rate. In a recent comparison (both perf numbers from HP systems) I had:
* AMD 6344, 64GB RAM, SpecFPRateBase=333, $2,915, $8.75 per spec
* Intel E5-2620, 64GB RAM, SpecFPRateBase=322, $2,990, $9.29 per spec

Whenever possible I try to use the actual applications that justify the purchase of a cluster. With actual end user applications it's about a 50/50 chance whether AMD or Intel will win.

I figured I'd add a few comments:
* Latency for a quad socket AMD is around 64ns to a random piece of memory (not 600ns as recently mentioned).
* AMD quad sockets with 512GB RAM start around $9k (USD).
* With OpenMP, pthreads, MPI or other parallel friendly code, a quad socket AMD can look up a random cache line approximately every 2.25ns.
  (64 threads banging on 16 memory channels at once).
* I've seen no problems with the AMD memory system; in general the 2k pin/4 memory bus AMD sockets seem to perform similarly to Intel.

An example of AMD's bandwidth scaling on a quad socket with 64 cores:
http://cse.ucdavis.edu/bill/pstream/bm3-all.png

I don't have a similar Intel system, but I do have a dual socket E5:
http://cse.ucdavis.edu/bill/pstream/e5-2609.png
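
For anyone who wants to repeat the price/performance comparison with their own quotes, here is a minimal sketch of the arithmetic. It only divides system price by the SpecFPRateBase score and picks the lowest dollars per point. The `Quote` class and `best_value` function are names made up for this illustration, and the two entries are the HP figures quoted in the post. For a real purchase you would want to swap the SPEC score for measured throughput on the applications that justify the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """One candidate system: hypothetical container for this sketch."""
    name: str
    price_usd: float
    spec_fp_rate_base: float

    @property
    def usd_per_spec(self) -> float:
        # Lower is better: dollars paid per SpecFPRateBase point.
        return self.price_usd / self.spec_fp_rate_base


def best_value(quotes):
    """Return the quote with the lowest dollars per SPEC point."""
    return min(quotes, key=lambda q: q.usd_per_spec)


# Figures taken from the comparison in the post.
quotes = [
    Quote("AMD 6344, 64GB RAM", 2915, 333),
    Quote("Intel E5-2620, 64GB RAM", 2990, 322),
]

for q in quotes:
    print(f"{q.name}: ${q.usd_per_spec:.2f} per spec")
print("Best value:", best_value(quotes).name)
```

The same structure works if you replace `spec_fp_rate_base` with jobs per hour from your own benchmark runs; the ranking can easily flip depending on the application.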
