On 30/01/2011 10:26 AM, martyn.w...@stfc.ac.uk wrote:

-----Original Message-----
From: gmx-users-boun...@gromacs.org [mailto:gmx-users-
boun...@gromacs.org] On Behalf Of Mark Abraham
Sent: 29 January 2011 08:24
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Simulation time losses with REMD

On 28/01/2011 4:46 PM, Mark Abraham wrote:
Hi,

I compared the .log file time accounting for the same .tpr file run alone
in serial or as part of an REMD simulation (with each replica on a
single processor). It ran about 5-10% slower in the latter. The effect
was a bit larger when comparing the same .tpr on 8 processors against
REMD with 8 processors per replica. The effect seems fairly
independent of whether I compare the lowest or highest replica.
OK, I found the issue by binary-searching the code for the offending
line. It's in compute_globals() in src/kernel/md.c; the call to
gmx_sum_sim consumes all the extra time. This code handles the
synchronization needed for possible checkpointing.

                  if (MULTISIM(cr) && bInterSimGS)
                  {
                      if (MASTER(cr))
                      {
                          /* Communicate the signals between the simulations */
                          gmx_sum_sim(eglsNR, gs_buf, cr->ms);
                      }
                      /* Communicate the signals from the master to the others */
                      gmx_bcast(eglsNR*sizeof(gs_buf[0]), gs_buf, cr);
                  }

This eventually calls

void gmx_sumf_comm(int nr, float r[], MPI_Comm mpi_comm)
{
#if defined(MPI_IN_PLACE_EXISTS) || defined(GMX_THREADS)
    MPI_Allreduce(MPI_IN_PLACE, r, nr, MPI_FLOAT, MPI_SUM, mpi_comm);
#else
    /* this function is only used in code that is not performance critical
       (during setup, when comm_rec is not the appropriate communication
       structure), so this isn't as bad as it looks. */
    float *buf;
    int i;

    snew(buf, nr);
    MPI_Allreduce(r, buf, nr, MPI_FLOAT, MPI_SUM, mpi_comm);
    for (i = 0; i < nr; i++)
        r[i] = buf[i];
    sfree(buf);
#endif
}

Clearly the comment is out of date. My nstlist=5, repl_ex_nst=2500 and
nstcalcenergy=-1, which triggers gs.nstms=5, so bInterSimGS is TRUE
every 5 steps. I'm not sure whether the problem lies with nstlist, with
the multi-simulation checkpointing machinery, or something else.

Mark
So are you saying that this code itself is slow (and called frequently), or 
that this is showing the latency of synchronising the replicas? If the latter, 
then presumably if you comment it out (or adjust nstlist, or whatever), the 
latency will just shift to the REMD call itself?
(I'll check my own example in due course, but our systems happen to be down 
this weekend.)

I've already controlled for the REMD cost and latency. The issue is what is causing the extra delay.

I've worked out what the issue is, and I'll move this thread to a Redmine issue - http://redmine.gromacs.org/issues/691

Mark
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
