On Fri, Feb 5, 2016 at 11:32 AM, Xiangdong <[email protected]> wrote:
> > > On Fri, Feb 5, 2016 at 12:18 PM, Matthew Knepley <[email protected]> > wrote: > >> On Fri, Feb 5, 2016 at 11:12 AM, Xiangdong <[email protected]> wrote: >> >>> Hello everyone, >>> >>> When I looked at the log_summary, I found that flops ratio of VecDot is >>> 1.1, but the time ratio is 1.8. Also the flops ratio of MatMult is 1.2, but >>> the time ratio is 1.6. I am using the simple SOR preconditioner on 256 >>> procs. Since each of them consumes about 20% of total time, is it worth >>> looking into these imbalance issue? If yes, how do I know which procs are >>> causing the issue? What tools should I start with? >>> >> >> 1) Always send all the output >> >> 2) You have a large number of dot products, and I am guessing you have a >> slow reduce on this machine (we could see the data at the end of >> log_summary). >> The waiting time comes from slow collective communication, load >> imbalance, system noise, etc. You can see this by looking at the difference >> between >> VecDot() and VecMDot(). For the latter there is enough work to cover >> up the latency, and its imbalance is only 1.1, not 1.8. Thus I think it is >> clear that your >> problem is due to a slow communication network rather than load >> imbalance. >> > > I attached the full summary. At the end, it has > > Average time to get PetscTime(): 0 > Average time for MPI_Barrier(): 8.3971e-05 > Average time for zero size MPI_Send(): 7.16746e-06 > > Is it an indication of slow network? > I think so. It takes nearly 100 microseconds to synchronize processes. Thanks, Matt > >> 3) Thus you would be better served with a stronger preconditioner (more >> flops) which resulted in fewer iterations (less synchronization). >> > > Yes, the preconditioner I am using is simple (just sor). I am also trying > other advanced preconditioners. > >> >> Thanks, >> >> Matt >> >> >>> Thanks. >>> >>> Xiangdong >>> >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flops >>> --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len >>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecView 44 1.0 3.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 4.8e+01 1 0 0 0 0 1 0 0 0 0 0 >>> VecDot 817452 1.0 1.5972e+02 1.8 1.51e+10 1.1 0.0e+00 0.0e+00 >>> 8.2e+05 18 3 0 0 40 18 3 0 0 40 22969 >>> VecMDot 613089 1.0 5.8918e+01 1.1 2.27e+10 1.1 0.0e+00 0.0e+00 >>> 6.1e+05 8 4 0 0 30 8 4 0 0 30 93397 >>> VecNorm 618619 1.0 1.4426e+02 2.2 1.14e+10 1.1 0.0e+00 0.0e+00 >>> 6.2e+05 15 2 0 0 30 15 2 0 0 30 19145 >>> VecScale 938 1.0 8.3907e-03 1.2 8.67e+06 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 250859 >>> VecCopy 8159 1.0 1.0394e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 1884 1.0 3.9701e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAXPY 1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 2 4 0 0 0 2 4 0 0 0 207057 >>> VecAYPX 613089 1.0 6.5360e+00 1.9 1.13e+10 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 1 2 0 0 0 1 2 0 0 0 420336 >>> VecAXPBYCZ 191 1.0 3.7365e-03 1.4 7.06e+06 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 458830 >>> VecWAXPY 938 1.0 3.8502e-02 1.4 8.67e+06 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 54669 >>> VecMAXPY 613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 1 4 0 0 0 1 4 0 0 0 414499 >>> VecLoad 5 1.0 2.0913e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 820695 1.0 1.8959e+01 2.8 0.00e+00 0.0 1.1e+09 4.6e+03 >>> 0.0e+00 2 0100100 0 2 0100100 0 0 >>> VecScatterEnd 820695 1.0 5.2002e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> VecReduceArith 2814 1.0 1.8517e-02 1.2 5.20e+07 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 681995 >>> VecReduceComm 938 1.0 2.6041e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 9.4e+02 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 817452 1.0 1.9379e+02 1.6 2.04e+11 1.2 1.0e+09 4.6e+03 >>> 0.0e+00 23 40100100 0 23 40100100 0 253083 >>> MatSOR 818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 22 40 0 0 0 22 40 0 0 0 247472 >>> MatAssemblyBegin 939 1.0 7.9671e+00 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 1.9e+03 1 0 0 0 0 1 0 0 0 0 0 >>> MatAssemblyEnd 939 1.0 2.0570e-01 1.2 0.00e+00 0.0 2.6e+03 1.2e+03 >>> 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatZeroEntries 938 1.0 1.8202e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> SNESSolve 210 1.0 6.8585e+02 1.0 5.06e+11 1.2 1.1e+09 4.6e+03 >>> 2.1e+06 98100100100100 98100100100100 178411 >>> SNESFunctionEval 2296 1.0 2.0512e+01 1.3 0.00e+00 0.0 2.9e+06 3.5e+03 >>> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> SNESJacobianEval 938 1.0 2.7680e+01 1.0 0.00e+00 0.0 1.2e+06 3.5e+03 >>> 1.9e+03 4 0 0 0 0 4 0 0 0 0 0 >>> SNESLineSearch 938 1.0 9.2491e+00 1.0 6.07e+07 1.1 1.2e+06 3.5e+03 >>> 9.4e+02 1 0 0 0 0 1 0 0 0 0 1593 >>> KSPSetUp 938 1.0 2.7661e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 2.0e+01 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 938 1.0 6.3433e+02 1.0 5.06e+11 1.2 1.0e+09 4.6e+03 >>> 2.0e+06 91100100100 99 91100100100 99 192852 >>> PCSetUp 938 1.0 2.1005e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> PCApply 818390 1.0 1.9675e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 >>> 0.0e+00 22 40 0 0 0 22 40 0 0 0 246626 >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
