Eugen Leitl wrote:
> On Fri, Apr 03, 2009 at 01:32:13PM -0700, Greg Lindahl wrote:
>
>>> Will have to do with embedded memory or stacked 3d memory a la
>>> http://www.cc.gatech.edu/~loh/Papers/isca2008-3Ddram.pdf
>> We've been building bigger and bigger SMPs for a long time, making
>> changes to improve the memory system as needed. How is multicore any
>
> Off-die memory bandwidth and latency are limited, so many codes
> start running into memory bottlenecks even at moderate number
> of cores (quad-cores seem to be a sweet spot).

Limited? It seems pretty constant to me. Ten years ago, 1 GB/sec per
core/CPU was available on the high end (where the i7 sits today). Over
the last three doublings of threads per socket, memory bandwidth per
thread/core has stayed in the 1 to 3 GB/sec range. Today's i7 is just
south of 3 GB/sec per thread (8 threads and 22 GB/sec per socket).

It is difficult to use more than 1-3 GB/sec with a single thread,
because of branch prediction failures, memory latency, and related
issues. As a result, the market isn't willing to pay for more
bandwidth: the number of applications that benefit shrinks as the
bandwidth/thread ratio grows.

In other markets where more bandwidth has proven its value, like GPUs,
consumers can buy, in quantity 1, a video card with 1 GB of RAM, a GPU,
motherboard, and video out with a 160 GB/sec memory system for $375.

So for the next few doublings in cores/threads per socket I'd expect
CPUs to follow the GPUs. The low-hanging fruit would seem to be GDDR5,
which has double the bandwidth per pin. Fortunately GPUs are leading
the way; they already have the next three doublings in bandwidth
mapped out for us. Certainly at some point it will require the CPU and
RAM to share the same die so that you can have an 8 kbit wide memory
bus.

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
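
As a rough check on the per-thread arithmetic in the message above, here is a minimal Python sketch. The function names are made up for illustration. The inputs (22 GB/sec and 8 threads for the i7, 160 GB/sec for the video card, and the 1 to 3 GB/sec per-thread range) are the figures quoted in the message, and the results are simple peak-rate division, not measurements.

```python
def bandwidth_per_thread(socket_gb_per_s, threads):
    """Peak memory bandwidth available to each thread, in GB/sec."""
    return socket_gb_per_s / threads


def threads_fed(total_gb_per_s, per_thread_gb_per_s):
    """Whole number of threads a memory system could feed at a given
    per-thread rate (hypothetical helper, ignores latency and contention)."""
    return int(total_gb_per_s // per_thread_gb_per_s)


if __name__ == "__main__":
    # i7 figures from the message: 22 GB/sec per socket, 8 threads.
    print("i7 GB/sec per thread:", bandwidth_per_thread(22, 8))

    # How many threads a 160 GB/sec GPU-style memory system could feed
    # at the 1 to 3 GB/sec per-thread range mentioned in the message.
    for rate in (1, 3):
        print(f"threads at {rate} GB/sec each:", threads_fed(160, rate))
```

On these numbers the i7 works out to 2.75 GB/sec per thread, and a 160 GB/sec memory system would cover roughly 53 to 160 threads at the same per-thread rate, which is the headroom the GDDR5 argument relies on.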
