Xiangdong <[email protected]> writes:

>> VecAXPY  1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 207057
>> VecMAXPY  613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 414499
>> MatSOR    818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 247472
>>
>
> The result above is from a run with 256 cores (16 nodes * 16 cores/node). I
> did another run with 64 nodes * 4 cores/node. Now these functions are much
> better balanced (a factor of 1.2-1.3, instead of 1.5-2.1).
>
> VecAXPY   987215 1.0 6.8469e+00 1.3 1.82e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 647096
> VecMAXPY  592329 1.0 6.0866e+00 1.3 2.19e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 873511
> MatSOR    790717 1.0 1.2933e+02 1.2 1.93e+11 1.1 0.0e+00 0.0e+00 0.0e+00 24 40  0  0  0  24 40  0  0  0 362525

So it's significantly faster in addition to being more balanced. I
would attribute that to memory bandwidth.

> For the functions that require communication, the time ratio is about 1.4-1.6:
>
> VecDot    789772 1.0 8.4804e+01 1.4 1.46e+10 1.1 0.0e+00 0.0e+00 7.9e+05 14  3  0  0 40  14  3  0  0 40  41794
> VecNorm   597914 1.0 7.6259e+01 1.6 1.10e+10 1.1 0.0e+00 0.0e+00 6.0e+05 12  2  0  0 30  12  2  0  0 30  34996
>
> The full logsummary for this new run is here:
> https://googledrive.com/host/0BxEfb1tasJxhVkZ2NHJkSmF4LUU
>
> Can we say now the load imbalance is from the network communication,
> instead of memory bandwidth?

It is expected that synchronizing functions like these show higher
"load imbalance", but that doesn't necessarily mean the network is
running at different speeds for different nodes or some such. Rather,
you've accumulated load imbalance over the previous operations, and now
everyone has to wait for the slowest process before anyone can
continue. So the process that was fastest before logs the longest
time for the Norm or Dot.

I see roughly 100µs per VecDot above (84.8 s over 789772 calls), which
is reasonable. If you get more exact load balance in the local
computation, you might be able to improve it a bit.
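
To make the waiting argument concrete, here is a minimal Python sketch. It is a toy model, not PETSc code: the function names, the four local compute times, and the 20 µs latency are all made-up assumptions for illustration. The only real numbers are the VecDot and VecNorm totals and call counts taken from the log rows quoted above.

```python
def per_call_microseconds(total_seconds, calls):
    """Average time per call, in microseconds, from one log row."""
    return total_seconds / calls * 1e6


def reduction_times(local_compute, latency):
    """Time each rank would log inside a blocking reduction (toy model).

    A rank enters the reduction when its own local work ends and leaves
    once the slowest rank has arrived and the (assumed uniform) network
    latency has elapsed.
    """
    finish = max(local_compute) + latency
    return [finish - t for t in local_compute]


if __name__ == "__main__":
    # Totals and call counts from the 64-node run quoted above.
    vecdot = per_call_microseconds(8.4804e+01, 789772)
    vecnorm = per_call_microseconds(7.6259e+01, 597914)
    print(f"VecDot:  {vecdot:.0f} us per call")
    print(f"VecNorm: {vecnorm:.0f} us per call")

    # Hypothetical local compute times (us) on four ranks, plus an
    # assumed 20 us latency that is identical for every rank.
    local = [80.0, 90.0, 100.0, 110.0]
    logged = reduction_times(local, latency=20.0)
    for rank, (work, wait) in enumerate(zip(local, logged)):
        print(f"rank {rank}: local {work:.0f} us, reduction {wait:.0f} us")
    print(f"max/min ratio in the reduction: {max(logged) / min(logged):.1f}")
```

In this model the network behaves identically for every rank, yet the reduction still shows a large max/min ratio, and the rank with the least local work logs the longest time in it.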
