Dear Szilard, thanks for the very clear answer. Following your suggestion I tried running without DD; for the same system I ran two simulations on two GPUs:
gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 11 -pin on -pinoffset 28 -pinstride 1

but again the system crashed; by this I mean that after a few minutes the machine goes off (powers off) without any error message, even though not all the threads were in use.

I then tried running the two simulations on the same GPU without DD:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 28 -pinstride 1

and I obtained better performance (about 70 ns/day) with heavy use of the GPU (around 90%), compared to the two runs on two GPUs I reported in the previous post:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 0000000 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1

As for pinning, the CPU topology according to the log file is:

Hardware topology: Basic
  Sockets, cores, and logical processors:
    Socket 0:
      [ 0 32] [ 1 33] [ 2 34] [ 3 35] [ 4 36] [ 5 37] [ 6 38] [ 7 39]
      [ 16 48] [ 17 49] [ 18 50] [ 19 51] [ 20 52] [ 21 53] [ 22 54] [ 23 55]
      [ 8 40] [ 9 41] [ 10 42] [ 11 43] [ 12 44] [ 13 45] [ 14 46] [ 15 47]
      [ 24 56] [ 25 57] [ 26 58] [ 27 59] [ 28 60] [ 29 61] [ 30 62] [ 31 63]

If I understand correctly (I am absolutely not sure), pinning to consecutive hardware threads should not be that convenient, and indeed I found a slight degradation in performance for a single simulation when switching from:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on

to:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1

Thanks again
Stefano

On Fri, 16 Aug 2019 at 17:48, Szilárd Páll <pall.szil...@gmail.com> wrote:

> On Mon, Aug 5, 2019 at 5:00 PM Stefano Guglielmo
> <stefano.guglie...@unito.it> wrote:
> >
> > Dear Paul,
> > thanks for the suggestions. Following them I managed to run 91 ns/day for
> > the system I referred to in my previous post with the configuration:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 0000111 -pin on (still 28 threads seems to be the best choice)
> >
> > and 56 ns/day for two independent runs:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 0000000 -pin on -pinoffset 0 -pinstride 1
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1
> > which is a fairly good result.
>
> Use no DD in single-GPU runs, i.e. for the latter, just simply:
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> -gputasks 00 -pin on -pinoffset 0 -pinstride 1
>
> You can also have mdrun's multidir functionality manage an ensemble of
> jobs (related or not) so you don't have to manually start, calculate
> pinning, etc.
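> A minimal sketch of what that could look like (this assumes an
> MPI-enabled build providing gmx_mpi, one tpr per directory, and the
> placeholder directory names sim1 and sim2 -- a starting point, not a
> tuned command):
>
> mpirun -np 2 gmx_mpi mdrun -multidir sim1 sim2 -deffnm run -nb gpu -pme gpu -ntomp 28 -npme 0 -gpu_id 01 -pin on
>
> With one rank per directory, each simulation should get its own GPU,
> and -pin on lets mdrun work out non-overlapping pinning for both runs
> by itself.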
> > I am still wondering if somehow I should pin the threads in some
> > different way in order to reflect the CPU topology, and if this can
> > influence performance (if I remember correctly, NAMD allows the user to
> > indicate explicitly the CPU cores/threads to use in a computation).
>
> Your pinning does reflect the CPU topology -- the 4x7=28 threads are
> pinned to consecutive hardware threads (because of -pinstride 1, i.e.
> don't skip the second hardware thread of each core). The mapping of
> software to hardware threads happens based on the topology-based
> hardware thread indexing; see the hardware detection report in the log
> file.
>
> > When I tried to run two simulations with the following configuration:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > -gputasks 00001111 -pin on -pinoffset 0 -pinstride 1
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > -gputasks 00001111 -pin on -pinoffset 0 -pinstride 32
> > the system crashed down. Probably this is normal and I am missing
> > something quite obvious.
>
> Not really. What do you mean by "crashed down"? Neither the machine nor
> the simulation should crash. Even though your machine has 32 cores / 64
> threads, using all of them may not always be beneficial, as using more
> threads where there is too little work to scale will add overhead. Have
> you tried using all cores but only 1 thread per core (i.e. 32 threads in
> total, with -pinstride 2)?
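> For a single run that would be something like the following (a sketch
> mirroring the flags used above, not a tuned command -- one thread per
> core across all 32 cores):
>
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 32 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 2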
> Cheers,
> --
> Szilárd
>
> > Thanks again for the valuable advice
> > Stefano
> >
> > On Sun, 4 Aug 2019 at 01:40, paul buscemi <pbusc...@q.com> wrote:
> >
> > > Stefano,
> > >
> > > A recent run with 140000 atoms, including 10000 isopropanol molecules
> > > on top of an end-restrained PDMS surface of 74000 atoms in a
> > > 20 x 20 x 30 nm box, ran at 67 ns/d NVT with the mdrun conditions I
> > > posted. It took 120 ns for 100 molecules of an adsorbate to go from
> > > solution to the surface. I don't think this will set the world ablaze
> > > with any benchmarks, but it is acceptable to get some work done.
> > >
> > > Linux Mint Mate 18, AMD Threadripper 32-core 2990WX 4.2 GHz, 32 GB
> > > DDR4, 2x RTX 2080 Ti, gmx 2019 in the simplest gmx configuration for
> > > GPUs, CUDA version 10, Nvidia 410.7p loaded from the repository
> > >
> > > Paul
> > >
> > > > On Aug 3, 2019, at 12:58 PM, paul buscemi <pbusc...@q.com> wrote:
> > > >
> > > > Stefano,
> > > >
> > > > Here is a typical run.
> > > >
> > > > For minimization: gmx mdrun -deffnm grofile -nb gpu
> > > >
> > > > and for other runs, on a 32-core machine:
> > > >
> > > > gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8
> > > > -npme 1 -gputasks 0000000011111111 -pin on
> > > >
> > > > Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may be
> > > > faster - of course adjusting -gputasks.
> > > >
> > > > Rarely do I find that not using -ntomp and -ntmpi is faster, but it
> > > > is never bad.
> > > >
> > > > Let me know how it goes.
> > > >
> > > > Paul
> > > >
> > > >> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo
> > > >> <stefano.guglie...@unito.it> wrote:
> > > >>
> > > >> Hi Paul,
> > > >> thanks for the reply. Would you mind posting the command you used,
> > > >> or telling me how you balanced the work between CPU and GPU?
> > > >>
> > > >> What about pinning? Does anyone know how to deal with a CPU
> > > >> topology like the one reported in my previous post, and whether it
> > > >> is relevant for performance?
> > > >> Thanks
> > > >> Stefano
> > > >>
> > > >> On Saturday, 3 August 2019, Paul Buscemi <pbusc...@q.com> wrote:
> > > >>
> > > >>> I run the same system and setup but no NVLink. Maestro runs both
> > > >>> GPUs at 100 percent, Gromacs typically 50-60 percent; it can do
> > > >>> 600 ns/d on 20000 atoms.
> > > >>>
> > > >>> PB
> > > >>>
> > > >>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.b...@uconn.edu>
> > > >>>> wrote:
> > > >>>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> I've done a lot of research/experimentation on this, so I can
> > > >>>> maybe get you started - if anyone has any questions about the
> > > >>>> essay to follow, feel free to email me personally, and I'll link
> > > >>>> it to the email thread if it ends up being pertinent.
> > > >>>>
> > > >>>> First, there are some more internet resources to check out. See
> > > >>>> Mark's talk at
> > > >>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> > > >>>> Gromacs development moves fast, but a lot of it is still
> > > >>>> relevant.
> > > >>>>
> > > >>>> I'll expand a bit here, with the caveat that Gromacs GPU
> > > >>>> development is moving very fast, and so the correct commands for
> > > >>>> optimal performance are both system-dependent and a moving
> > > >>>> target between versions. This is a good thing - GPUs have
> > > >>>> revolutionized the field, and with each iteration we make better
> > > >>>> use of them. The downside is that it's unclear exactly what sort
> > > >>>> of CPU-GPU balance you should look to purchase to take advantage
> > > >>>> of future developments, though the trend is certainly that more
> > > >>>> and more computation is being offloaded to the GPUs.
> > > >>>>
> > > >>>> The most important consideration is that to get maximum total
> > > >>>> throughput performance, you should be running not one but
> > > >>>> multiple simulations simultaneously. You can do this through the
> > > >>>> -multidir option, but I don't recommend that in this case, as it
> > > >>>> requires compiling with MPI and limits some of your options. My
> > > >>>> run scripts usually use "gmx mdrun ... &" to initiate
> > > >>>> subprocesses, with combinations of -ntomp, -ntmpi, -pin,
> > > >>>> -pinoffset, and -gputasks. I can give specific examples if
> > > >>>> you're interested.
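> > > >>>> For instance, here is a rough sketch of that pattern (the names
> > > >>>> sim1/sim2 and the thread counts are illustrative placeholders,
> > > >>>> not tuned values):
> > > >>>>
> > > >>>> gmx mdrun -deffnm sim1 -nb gpu -pme gpu -ntmpi 1 -ntomp 14 -gputasks 00 -pin on -pinoffset 0 -pinstride 1 &
> > > >>>> gmx mdrun -deffnm sim2 -nb gpu -pme gpu -ntmpi 1 -ntomp 14 -gputasks 11 -pin on -pinoffset 14 -pinstride 1 &
> > > >>>> wait
> > > >>>>
> > > >>>> The offsets keep the two runs on separate cores, and "wait"
> > > >>>> keeps the script alive until both simulations finish.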
> > > >>>> Another important point is that you can run more simulations
> > > >>>> than the number of GPUs you have. Depending on CPU-GPU balance
> > > >>>> and quality, you won't double your throughput by e.g. putting 4
> > > >>>> simulations on 2 GPUs, but you might increase it up to 1.5x.
> > > >>>> This would involve targeting the same GPU with -gputasks.
> > > >>>>
> > > >>>> Within a simulation, you should set up a benchmarking script to
> > > >>>> figure out the best combination of thread-mpi ranks and open-mp
> > > >>>> threads - this can have pretty drastic effects on performance.
> > > >>>> For example, if you want to use your entire machine for one
> > > >>>> simulation (not recommended for maximal

--
Stefano GUGLIELMO PhD
Assistant Professor of Medicinal Chemistry
Department of Drug Science and Technology
Via P. Giuria 9
10125 Turin, ITALY
ph. +39 (0)11 6707178
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.