Based on what you suggested, I have done the following:

i) Reran the same problem with no output. The ratios are still roughly the same, so it is not an I/O problem.

ii) Reran the program on a supercomputer (Stampede) instead of our group cluster. The MPI_Barrier time got better:

Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 3.94508e-06

The full PETSc log summary is here:
https://googledrive.com/host/0BxEfb1tasJxhTjNTVXh4bmJmWlk

iii) Since the time ratios of VecDot (2.5) and MatMult (1.5) are still high, I reran the program with the IPM module. The IPM summary is here:
https://drive.google.com/file/d/0BxEfb1tasJxhYXI0VkV0cjlLWUU/view?usp=sharing

From the IPM results, MPI_Allreduce takes 74% of the MPI time. The communication-by-task figure (the first figure on page 4 of that summary) shows that the communication is not well balanced. Is this related to the hardware and network (which users cannot control), or can I do something in my code to improve it?
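To try to separate the two effects myself, I am thinking of timing a barrier immediately before each MPI_Allreduce: time spent in the barrier is waiting caused by load imbalance, while time spent in the reduction itself is closer to the true network cost. A rough standalone sketch of what I have in mind (a toy probe, not from my application; the reduced value and iteration count are dummies):

/* imbalance_probe.c: rough sketch to separate load imbalance from
 * reduction cost.  Build with: mpicc imbalance_probe.c -o probe */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  double local, global, t0, t1, t2, t_wait = 0.0, t_reduce = 0.0;
  double w_max, r_max;
  int    rank, i, iters = 1000;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  local = (double)rank;                       /* dummy value to reduce */

  for (i = 0; i < iters; i++) {
    t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);              /* absorbs load imbalance */
    t1 = MPI_Wtime();
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);            /* network-dominated part */
    t2 = MPI_Wtime();
    t_wait   += t1 - t0;
    t_reduce += t2 - t1;
  }

  /* A large wait/reduce ratio points at imbalance in the
   * application rather than at the interconnect. */
  MPI_Reduce(&t_wait,   &w_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  MPI_Reduce(&t_reduce, &r_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  if (!rank)
    printf("max wait %g s, max reduce %g s over %d reductions\n",
           w_max, r_max, iters);
  MPI_Finalize();
  return 0;
}

(I am also wondering whether, if the reduction latency itself is the bottleneck, a pipelined Krylov method such as -ksp_type pipecg, which overlaps the dot-product MPI_Allreduce with the MatMult using MPI-3 nonblocking collectives, would help here, assuming my solver is CG-like.)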
Thank you.

Best,
Xiangdong

On Fri, Feb 5, 2016 at 10:34 PM, Barry Smith <[email protected]> wrote:
>
>    Make the same run with no IO and see if the numbers are much better and
> if the load balance is better.
>
> > On Feb 5, 2016, at 8:59 PM, Xiangdong <[email protected]> wrote:
> >
> > If I want to know whether only rank 0 is slow (since it may have more
> > I/O) or whether a portion of the cores is actually slow, what tools can
> > I start with?
> >
> > Thanks.
> >
> > Xiangdong
> >
> > On Fri, Feb 5, 2016 at 5:27 PM, Jed Brown <[email protected]> wrote:
> > Matthew Knepley <[email protected]> writes:
> > >> I attached the full summary. At the end, it has
> > >>
> > >> Average time to get PetscTime(): 0
> > >> Average time for MPI_Barrier(): 8.3971e-05
> > >> Average time for zero size MPI_Send(): 7.16746e-06
> > >>
> > >> Is it an indication of slow network?
> > >
> > > I think so. It takes nearly 100 microseconds to synchronize processes.
> >
> > Edison with 65536 processes:
> > Average time for MPI_Barrier(): 4.23908e-05
> > Average time for zero size MPI_Send(): 2.46466e-06
> >
> > Mira with 16384 processes:
> > Average time for MPI_Barrier(): 5.7075e-06
> > Average time for zero size MPI_Send(): 1.33179e-05
> >
> > Titan with 131072 processes:
> > Average time for MPI_Barrier(): 0.000368595
> > Average time for zero size MPI_Send(): 1.71567e-05
