Based on what you suggested, I have done the following:

i) Reran the same problem with no output. The ratios are still roughly the same, so it is not an I/O problem.

ii) Reran the program on a supercomputer (Stampede) instead of our group cluster. The MPI_Barrier time got better:

Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 3.94508e-06

The full PETSc log summary is here:
https://googledrive.com/host/0BxEfb1tasJxhTjNTVXh4bmJmWlk

iii) Since the time ratios of VecDot (2.5) and MatMult (1.5) are still high, I reran the program with the IPM module. The IPM summary is here:
https://drive.google.com/file/d/0BxEfb1tasJxhYXI0VkV0cjlLWUU/view?usp=sharing

From the IPM results, MPI_Allreduce takes 74% of the MPI time. The communication-by-task figure (the first figure on page 4 of that summary) shows that the communication is not well balanced. Is this related to the hardware and network (which users cannot control), or can I do something in my code to improve it?
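To try to separate the two effects myself, I am thinking of timing a barrier immediately before each MPI_Allreduce: time spent in the barrier is waiting caused by load imbalance, while time spent in the reduction itself is closer to the true network cost. A rough standalone sketch of what I have in mind (a toy probe, not from my application; the reduced value and iteration count are dummies):

/* imbalance_probe.c: rough sketch to separate load imbalance from
 * reduction cost.  Build with: mpicc imbalance_probe.c -o probe */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  double local, global, t0, t1, t2, t_wait = 0.0, t_reduce = 0.0;
  double w_max, r_max;
  int    rank, i, iters = 1000;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  local = (double)rank;                       /* dummy value to reduce */

  for (i = 0; i < iters; i++) {
    t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);              /* absorbs load imbalance */
    t1 = MPI_Wtime();
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);            /* network-dominated part */
    t2 = MPI_Wtime();
    t_wait   += t1 - t0;
    t_reduce += t2 - t1;
  }

  /* A large wait/reduce ratio points at imbalance in the
   * application rather than at the interconnect. */
  MPI_Reduce(&t_wait,   &w_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  MPI_Reduce(&t_reduce, &r_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  if (!rank)
    printf("max wait %g s, max reduce %g s over %d reductions\n",
           w_max, r_max, iters);
  MPI_Finalize();
  return 0;
}

(I am also wondering whether, if the reduction latency itself is the bottleneck, a pipelined Krylov method such as -ksp_type pipecg, which overlaps the dot-product MPI_Allreduce with the MatMult using MPI-3 nonblocking collectives, would help here, assuming my solver is CG-like.)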
Thank you.

Best,
Xiangdong

On Fri, Feb 5, 2016 at 10:34 PM, Barry Smith <[email protected]> wrote:
>
>    Make the same run with no IO and see if the numbers are much better and
> if the load balance is better.
>
> > On Feb 5, 2016, at 8:59 PM, Xiangdong <[email protected]> wrote:
> >
> > If I want to know whether only rank 0 is slow (since it may have more
> > I/O) or whether a portion of the cores is actually slow, what tools can
> > I start with?
> >
> > Thanks.
> >
> > Xiangdong
> >
> > On Fri, Feb 5, 2016 at 5:27 PM, Jed Brown <[email protected]> wrote:
> > Matthew Knepley <[email protected]> writes:
> > >> I attached the full summary. At the end, it has
> > >>
> > >> Average time to get PetscTime(): 0
> > >> Average time for MPI_Barrier(): 8.3971e-05
> > >> Average time for zero size MPI_Send(): 7.16746e-06
> > >>
> > >> Is it an indication of slow network?
> > >
> > > I think so. It takes nearly 100 microseconds to synchronize processes.
> >
> > Edison with 65536 processes:
> > Average time for MPI_Barrier(): 4.23908e-05
> > Average time for zero size MPI_Send(): 2.46466e-06
> >
> > Mira with 16384 processes:
> > Average time for MPI_Barrier(): 5.7075e-06
> > Average time for zero size MPI_Send(): 1.33179e-05
> >
> > Titan with 131072 processes:
> > Average time for MPI_Barrier(): 0.000368595
> > Average time for zero size MPI_Send(): 1.71567e-05
