> On Aug 13, 2015, at 10:34 AM, Justin Chang <[email protected]> wrote:
>
> Hi all,
>
> According to the online specifications for our University's HPC cluster (Intel
> Xeon E5-2680v2), I should have a maximum memory BW of 59.7 GB/s. I am guessing
> this number is computed as 1866 MHz * 8 Bytes * 4 memory channels.
>
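> (Working that out: 1866 * 8 * 4 = 59,712 MB/s, i.e. about 59.7 GB/s, so the
> spec number does match that formula; presumably it is a per-socket figure,
> since the channel count is a per-CPU property.)
>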
> Now, when I run the STREAM Triad benchmark on a single compute node (two
> sockets, 10 cores each, 64 GB total memory) with up to 20 MPICH processes,
> I get the following:
>
> $ mpiexec -n 1 ./MPIVersion
> Triad: 13448.6701 Rate (MB/s)
>
> $ mpiexec -n 2 ./MPIVersion
> Triad: 24409.1406 Rate (MB/s)
>
> $ mpiexec -n 4 ./MPIVersion
> Triad: 31914.8087 Rate (MB/s)
>
> $ mpiexec -n 6 ./MPIVersion
> Triad: 33290.2676 Rate (MB/s)
>
> $ mpiexec -n 8 ./MPIVersion
> Triad: 33618.2542 Rate (MB/s)
>
> $ mpiexec -n 10 ./MPIVersion
> Triad: 33730.1662 Rate (MB/s)
>
> $ mpiexec -n 12 ./MPIVersion
> Triad: 40835.9440 Rate (MB/s)
>
> $ mpiexec -n 14 ./MPIVersion
> Triad: 44396.0042 Rate (MB/s)
>
> $ mpiexec -n 16 ./MPIVersion
> Triad: 54647.5214 Rate (MB/s) *
>
> $ mpiexec -n 18 ./MPIVersion
> Triad: 57530.8125 Rate (MB/s) *
>
> $ mpiexec -n 20 ./MPIVersion
> Triad: 42388.0739 Rate (MB/s) *
>
> The * numbers fluctuate greatly each time I run this.
Yeah, MPICH's default behavior of not binding processes to cores is super
annoying; the ranks can wander between cores and sockets from run to run,
which is why those numbers fluctuate. I think they need better defaults.
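If you want to see exactly where the ranks end up with and without binding, a
minimal sketch (a hypothetical helper, not part of the benchmark) is to have
each rank print the CPU it is currently running on:

    #define _GNU_SOURCE            /* for sched_getcpu() on glibc */
    #include <sched.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
      int rank, cpu;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      cpu = sched_getcpu();                    /* CPU this rank is on right now */
      printf("rank %d on cpu %d\n", rank, cpu);
      MPI_Finalize();
      return 0;
    }

Run it a few times with and without -bind-to and you can see whether the ranks
stay put.
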
> However, if I use hydra's processor binding options:
>
> $ mpiexec.hydra -n 2 -bind-to socket ./MPIVersion
> Triad: 26879.3853 Rate (MB/s)
>
> $ mpiexec.hydra -n 4 -bind-to socket ./MPIVersion
> Triad: 48363.8441 Rate (MB/s)
>
> $ mpiexec.hydra -n 8 -bind-to socket ./MPIVersion
> Triad: 63479.9284 Rate (MB/s)
>
> $ mpiexec.hydra -n 10 -bind-to socket ./MPIVersion
> Triad: 66160.5627 Rate (MB/s)
>
> $ mpiexec.hydra -n 16 -bind-to socket ./MPIVersion
> Triad: 65975.5959 Rate (MB/s)
>
> $ mpiexec.hydra -n 20 -bind-to socket ./MPIVersion
> Triad: 64738.9336 Rate (MB/s)
>
> I get similar metrics when I use the binding options "-bind-to hwthread
> -map-by socket".
>
> Now my question is, is 13.5 GB/s on one processor "good"?
You mean one core.
Yes, that is a good number. These systems are not designed so that a
single core can "saturate" (that is, use) all the memory bandwidth of the node.
Note that after about 8 cores you don't see any more improvement, because those
8 cores have already saturated the memory bandwidth. What this means is that for
PETSc simulations, any cores beyond 8 (or so) on the node are just unnecessary
eye-candy.
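For reference, the Triad kernel is trivially simple; a minimal standalone
sketch of it (not the actual benchmark code, N is arbitrary) looks like the
following. The point is that each element costs one multiply-add but 24 bytes
of memory traffic, so the core's floating point units sit idle waiting on
memory long before they run out of flops:

    /* STREAM Triad sketch: a[j] = b[j] + scalar*c[j]
       Per element: 2 flops, but two 8-byte loads and one 8-byte store,
       so the loop is limited by memory bandwidth, not by the FPU.      */
    #include <stdlib.h>
    #define N 20000000          /* arbitrary, just much larger than the caches */
    int main(void)
    {
      double *a = malloc(N * sizeof(double));
      double *b = malloc(N * sizeof(double));
      double *c = malloc(N * sizeof(double));
      double scalar = 3.0;
      for (long j = 0; j < N; j++) { a[j] = 0.0; b[j] = 2.0; c[j] = 1.0; }
      for (long j = 0; j < N; j++)
        a[j] = b[j] + scalar * c[j];   /* the Triad loop that gets timed */
      free(a); free(b); free(c);
      return 0;
    }
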
> Because when I compare this to the 59.7 GB/s, it seems really inefficient. Is
> there a way to browse through my system files to confirm this?
>
> Also, when I use multiple cores with proper binding, the STREAM BW
> exceeds the reported max BW. Is this expected?
I cannot explain this; look at the exact number of loads and stores needed
for the Triad benchmark. Perhaps the online specs are out of date.
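As a reminder of that accounting: for Triad, STREAM counts only the two loads
and one store per element, so it reports 1.0e-6 * 24 * N bytes divided by the
time of the fastest trial, in MB/s. One thing to keep in mind when checking is
that a write-allocating cache also reads a[] before the store, so the traffic
the hardware actually moves is roughly 4/3 of what gets counted.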
Barry
>
> Thanks,
> Justin
>