On Jan 11, 2013, at 6:03 AM, Bill Broadley wrote:

> Over the last few months I've been hearing quite a few negative comments
> about AMD. Seems like most of them are extrapolating from desktop
> performance.
>
> Keep in mind that it's quite a stretch going from a desktop (single
> socket, 2 memory channels) to a server (dual socket, 4x the cores, 8
> memory channels).
Bill - a 2 socket system doesn't deliver 512GB of RAM. Comparing 2 socket systems makes no sense for someone who needs 512GB of RAM; the performance of 4 socket systems is totally different from that of 2 socket systems.

[snip]

> I figured I'd add a few comments:
> * Latency for a quad socket AMD is around 64ns to a random piece
>   of memory (not 600ns as recently mentioned).

I wrote a test program for this in 2003. You obviously have no idea what TLB thrashing accesses look like in the hundreds-of-gigabytes range.

There are no cheap systems on the planet where you can get a bunch of random bytes in 64 ns from a random spot in 500GB of RAM, from a memory line you hadn't opened before and which is certainly not in the cache yet.

You will be looking at 400+ ns latencies best case. You won't get it faster on any affordable platform (of course 512GB of SRAM would be faster, but let's not go into theoretical discussions here, as you can't afford 512GB of SRAM).

> * AMD quad sockets with 512GB ram start around $9k ($USA)

You can easily build one with new components from eBay for $2k. Then add the price of the 512GB of RAM to that.

New from a shop the AMD stuff is dirt cheap as well, since a single core of the new Bulldozer line isn't fast, of course. Offers for a fully assembled, ready-to-run system are around the $6k mark, excluding the 512GB of RAM of course.

Yet it has better latency to a 512GB block of RAM than Intel's 4 socket systems. And that will still be many hundreds of nanoseconds, of course.

> * With OpenMP, pthreads, MPI or other parallel friendly code a quad
>   socket amd can look up random cache line approximately every 2.25ns.
>   (64 threads banging on 16 memory channels at once).

You still didn't get the picture of TLB thrashing software, huh? It reads from a random memory location each time. Only at the end of the calculation does the search space converge a tad, but it is still random.

A measurement I have from a somewhat older 8 socket Intel box here is 700 ns for similar TLB thrashing behaviour.
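
To put the numbers in this exchange side by side, here is a rough back-of-the-envelope sketch in Python. It is only arithmetic, not a benchmark. The function names are made up for illustration, and the inputs are assumptions rather than measured properties of any machine discussed here: 4 KiB pages, a hypothetical 1024-entry TLB, and one outstanding memory access per thread.

```python
KIB = 1024
GIB = 1024 ** 3


def pages_needed(mem_bytes, page_bytes):
    """Number of pages needed to map mem_bytes, rounded up."""
    return -(-mem_bytes // page_bytes)


def tlb_reach_bytes(entries, page_bytes):
    """Amount of memory a TLB with `entries` slots can map at once."""
    return entries * page_bytes


def per_thread_latency_ns(aggregate_ns_per_access, threads, outstanding_per_thread=1):
    """Per-access latency implied by an aggregate rate (Little's law).

    If the whole machine completes one access every `aggregate_ns_per_access`
    and `threads * outstanding_per_thread` accesses are in flight at any time,
    each individual access takes this long.
    """
    return aggregate_ns_per_access * threads * outstanding_per_thread


if __name__ == "__main__":
    mem = 512 * GIB    # working set size from the thread
    page = 4 * KIB     # ASSUMPTION: small pages, no huge pages
    entries = 1024     # ASSUMPTION: hypothetical TLB size

    print("pages to map 512 GiB:", pages_needed(mem, page))
    print("TLB reach in bytes:", tlb_reach_bytes(entries, page))
    print("share of memory covered by the TLB:", tlb_reach_bytes(entries, page) / mem)

    # Aggregate figure quoted above: one cache line every 2.25 ns with 64 threads.
    print("implied per-thread latency (ns):", per_thread_latency_ns(2.25, 64))
```

Under those assumptions, an aggregate rate of one cache line per 2.25 ns across 64 threads corresponds to 64 x 2.25 = 144 ns per access as seen by a single thread, so a throughput figure and a per-access latency figure are not the same quantity. Likewise, a TLB of that hypothetical size maps only a tiny share of a 512 GiB working set, which is why nearly every truly random access would also pay for a page table walk.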
> * I've seen no problems with the AMD memory system, in general
>   the 2k pin/4 memory bus amd sockets seem to performance similarly
>   to Intel.

For random accesses on a single socket or on 2 sockets there are huge differences (all cores busy). In my benchmark a single Intel socket is around 90 ns and a single Bulldozer socket around 150-170 ns (8 cores busy).

You really have no idea what 'random' reads are.

> And example of AMD's bandwidth scaling on a quad socket with 64 cores:
> http://cse.ucdavis.edu/bill/pstream/bm3-all.png
>
> I don't have a similar Intel, but I do have a dual socket e5:
> http://cse.ucdavis.edu/bill/pstream/e5-2609.png

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
