>
> Even if it does fit in memory I suspect memory bandwidth is more important
> than clock cycles.


http://people.freebsd.org/~lstewart/articles/cpumemory.pdf

This paper is old but the ratios should still be pretty accurate.  Main
memory is 240 clock cycles away and L1d is only 3.  If the experiments in
this paper still hold true loading the 8K block into L1d is far more
expensive than the CPU processing done once the block is in cache.

When one adds in NUMA to the contention on this shared resource, its not
that hard for a 40 core machine to starve for memory bandwidth, and for
cores to sit idle waiting for main memory.  Eliminating wasted space seems
far more important even when everything could fit in memory already.

Reply via email to