On Wed, 22 Dec 2010, Yongjun Chen wrote:

> On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > > Processors: 4 CPUS * 4Cores/CPU, with each core 2500MHz
> > >
> > > Memories: 16 *2 GB DDR2 333 MHz, dual channel, data width 64 bit, so the
> > > memory Bandwidth for 2 memories is 64/8*166*2*2=5.4GB/s.
> >
> > Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not
> > enough for iterative solvers, in fact this is absolutely terrible for
> > iterative solvers. You really want 5.4 GB/s PER core! This machine is
> > absolutely inappropriate for iterative solvers. No package can give you good
> > speedups on this machine.
>
> Barry, there are 16 memories, every 2 memories make up one dual channel,
> thus in this machine there are 8 dual channel, each dual channel has the
> memory bandwidth 5.4GB/s.

What hardware is this? [processor/chipset?]

From what you say, it looks like each chip has 4 cores and 2
dual-channel memory controllers.

The question is: does the hardware provide scalable memory bandwidth
per core? Most machines don't, i.e. the same 5.4*2 GB/s is available to
a 1-core run as to a 4-core run. So if the algorithm can use 5.4 GB/s
[or more] with 1 thread and 10.8 GB/s [or more] with 2 threads, you
would see scalable performance only from 1 to 2 cores; runs on 3 and 4
cores would perhaps be only slightly faster than the 2-core run.

Satish
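
To make the arithmetic above concrete, here is a rough Python sketch of the peak-bandwidth formula from the quoted post and of the scaling argument. The function names are made up for illustration, and the model rests on assumptions rather than measurements: each chip has 2 dual-channel controllers shared by its 4 cores, and a single thread can consume one dual channel's worth of bandwidth. The formula as written evaluates to about 5.3 GB/s, which the thread quotes as 5.4 GB/s.

```python
def peak_bandwidth_gb_s(bus_bits=64, clock_mhz=166, transfers_per_clock=2, channels=2):
    """Theoretical peak in GB/s: bytes per transfer * clock * DDR factor * channels."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock * channels / 1000.0


def bandwidth_bound_speedup(threads, per_thread_demand_gb_s, available_gb_s):
    """Speedup over 1 thread for a run limited only by memory bandwidth.

    Idealized assumption: throughput is proportional to the bandwidth
    actually obtained, capped at what the chip can deliver.
    """
    achieved = min(threads * per_thread_demand_gb_s, available_gb_s)
    baseline = min(per_thread_demand_gb_s, available_gb_s)
    return achieved / baseline


if __name__ == "__main__":
    dual_channel = peak_bandwidth_gb_s()   # 64/8*166*2*2 MB/s, about 5.3 GB/s
    per_chip = 2 * dual_channel            # assumed: 2 dual-channel controllers per chip
    for n in (1, 2, 3, 4):
        speedup = bandwidth_bound_speedup(n, dual_channel, per_chip)
        print(f"{n} thread(s): speedup {speedup:.1f}")
```

In this model the speedup stops growing at 2 threads, because the third and fourth threads find no spare bandwidth. A real run may still gain a little at 3 and 4 cores, so treat the flat cap as a limiting case, not a prediction.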
