Hi Szilárd,

Thanks for the suggestion on removing that separate PME rank: 113 ns/day instead of 90 ns/day. ;) This is running on pretty much a piece of garbage, and that performance is against 320 ns/day on a much more powerful box with four GPUs.

I am fine with the general concept of ranks as units of execution; what I am not comfortable with is how one selects, say, the number of threads per rank depending on the system size, or whether to use a separate PME rank at all. For example, if I make a system that is 3-4 times larger in XY, do I keep all the mdrun scripts as they are, or do I go through tuning for every new system?

Some sort of guideline with examples would be nice, or some automation on the mdrun side, or maybe a web form that asks about system size and the number of GPUs/CPU cores and spits out a starting point for the "optimal" set of mdrun options. I am mostly learning this by varying things (e.g. offloading or not offloading the PME FFT, or trying your suggestions). A deep understanding, however, is lacking.
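For what it's worth, my "varying things" currently amounts to crude side-by-side comparisons with short runs, roughly along these lines (the .tpr is from my current run; the step count and thread splits are just placeholders):

# everything on one rank, whole GPU shared by nonbondeds and PME
gmx mdrun -ntmpi 1 -ntomp 4 -nb gpu -pme gpu -s run_unstretch.tpr \
          -nsteps 10000 -resethway -noconfout -g bench_1rank.log
# two ranks, one of them a dedicated PME rank
gmx mdrun -ntmpi 2 -ntomp 2 -npme 1 -nb gpu -pme gpu -s run_unstretch.tpr \
          -nsteps 10000 -resethway -noconfout -g bench_2ranks.log
# compare the ns/day numbers at the end of the logs
grep Performance bench_*.log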

Alex



On 6/21/2018 10:02 AM, Szilárd Páll wrote:
On Mon, Jun 18, 2018 at 11:35 PM Alex <nedoma...@gmail.com> wrote:

> Persistence is enabled so I don't have to overclock again.

Sure, makes sense. Note that strictly speaking this is not an "overclock" but a
manual "boost clock" (to borrow the terminology CPU vendors use). Consumer GPUs
automatically scale their clock speeds above their nominal/base clock (just as
CPUs do), but Tesla GPUs don't; instead they leave the choice to the user (or
put the burden on the user, if we want to look at it differently).
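For reference, that manual clock setting is typically done through nvidia-smi application clocks, roughly like this (the clock pair below is only an example; the supported values are device-specific and should be queried first):

nvidia-smi -pm 1                    # persistence mode, so the driver stays loaded and the clocks stick
nvidia-smi -q -d SUPPORTED_CLOCKS   # list the memory,graphics clock pairs the card supports
nvidia-smi -ac 3004,875             # set application clocks to one of the supported pairs (example values)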


> To be honest, I
> am still not entirely comfortable with the notion of ranks, after reading
> the acceleration document a bunch of times.

Feel free to ask if you need clarification.
Briefly: ranks are the execution units, typically MPI processes, that tasks
get assigned to when work is decomposed across multiple compute units (nodes,
processors). In general, either data or tasks can be decomposed (data- vs.
task-parallelization), and GROMACS employs both: the former for the spatial
domain decomposition, the latter for offloading PME work to a subset of the
ranks.
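To make that concrete with mdrun's own flags (the numbers are only illustrative, not a recommendation for your single-GPU box), a launch like

gmx mdrun -ntmpi 4 -ntomp 2 -npme 1

starts 4 thread-MPI ranks with 2 OpenMP threads each; one rank is dedicated to PME (task decomposition), while the remaining 3 ranks split the short-range work between them via the spatial domain decomposition (data decomposition).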


> Parts of log file below and I
> will obviously appreciate suggestions/clarifications:

In the future, please share the full log by uploading it somewhere.


> Command line:
>    gmx mdrun -nt 4 -ntmpi 2 -npme 1 -pme gpu -nb gpu -s run_unstretch.tpr -o
> traj_unstretch.trr -g md.log -c unstretched.gro

As noted before, I doubt that you benefit from using a separate PME rank
with a single GPU.

I suggest that instead you simply run:
gmx mdrun -ntmpi 1 -pme gpu -nb gpu
Optionally, you can pass -ntomp 4, but that is the default, so it's not needed.
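
With the file options from your command line left as they are, that would look like:

gmx mdrun -ntmpi 1 -pme gpu -nb gpu -s run_unstretch.tpr -o traj_unstretch.trr -g md.log -c unstretched.gro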


