Hi,

I think Szilard meant more like

gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gputask 0022446

for the assignment of the 7 GPU tasks.

Mark

On Fri., 6 Sep. 2019, 22:07 sunyeping, <sunyep...@aliyun.com> wrote:
> Hello Szilárd Páll,
>
> Thank you for your reply. I tried your command:
>
> gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpuid 0,2,4,6
> -gputask 001122334
>
> but got the following error:
>
> Using 7 MPI threads
> Using 10 OpenMP threads per tMPI thread
>
> Program:     gmx mdrun, version 2019.3
> Source file: src/gromacs/taskassignment/taskassignment.cpp (line 255)
> Function:    std::vector<std::vector<gmx::GpuTaskMapping> >::value_type
>              gmx::runTaskAssignment(const std::vector<int>&, const std::vector<int>&,
>              const gmx_hw_info_t&, const gmx::MDLogger&, const t_commrec*, const
>              gmx_multisim_t*, const gmx::PhysicalNodeCommunicator&, const
>              std::vector<gmx::GpuTask>&, bool, PmeRunMode)
> MPI rank:    0 (out of 7)
>
> Inconsistency in user input:
> There were 7 GPU tasks found on node localhost.localdomain, but 4 GPUs were
> available. If the GPUs are equivalent, then it is usually best to have a
> number of tasks that is a multiple of the number of GPUs. You should
> reconsider your GPU task assignment, number of ranks, or your use of the -nb,
> -pme, and -npme options, perhaps after measuring the performance you can get.
>
> Could you tell me how to correct this?
>
> Best regards,
> Yeping
>
> ------------------------------------------------------------------
> Hi,
>
> You have 2x Xeon Gold 6150, which is 2x 18 = 36 cores; Intel CPUs
> support 2 threads/core (HyperThreading), hence the 72.
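The arithmetic behind Mark's corrected map: with -ntmpi 7 -npme 1 -nb gpu -pme gpu there are 6 PP ranks plus 1 separate PME rank, i.e. 7 GPU tasks, so -gputask needs exactly 7 digits, each naming a usable GPU. A hypothetical checker sketching this (the helper name and the one-task-per-rank assumption are illustrative, not part of GROMACS):

```python
def check_gputask(gputask, ntmpi, available_gpus):
    """Validate a -gputask digit string: one digit per rank (assuming
    each rank owns exactly one GPU task), each digit a usable GPU id."""
    if len(gputask) != ntmpi:
        print("%d GPU tasks mapped, but %d ranks requested"
              % (len(gputask), ntmpi))
        return False
    unknown = sorted(set(gputask) - set(available_gpus))
    if unknown:
        print("tasks mapped to unavailable GPUs: %s" % ",".join(unknown))
        return False
    return True

# The failing command mapped 9 tasks, two of them to GPUs 1 and 3,
# which -gpuid 0,2,4,6 had excluded:
check_gputask("001122334", ntmpi=7, available_gpus="0246")  # False
# Mark's map: ranks 0-5 (PP) share GPUs 0, 2, 4 pairwise; rank 6 (PME) gets GPU 6:
check_gputask("0022446", ntmpi=7, available_gpus="0246")    # True
```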
>
> https://ark.intel.com/content/www/us/en/ark/products/120490/intel-xeon-gold-6150-processor-24-75m-cache-2-70-ghz.html
>
> You will not be able to scale efficiently over 8 GPUs in a single
> simulation with the current code; while performance will likely
> improve in the next release, due to PCI bus and PME scaling
> limitations, even with GROMACS 2020 it is unlikely you will see much
> benefit beyond 4 GPUs.
>
> Try running on 3-4 GPUs with at least 2 ranks on each, and one
> separate PME rank. You might also want to use every second GPU rather
> than the first four to avoid overloading the PCI bus; e.g.
>
> gmx mdrun -ntmpi 7 -npme 1 -nb gpu -pme gpu -bonded gpu -gpuid 0,2,4,6
> -gputask 001122334
>
> Cheers,
> --
> Szilárd
>
> On Thu, Sep 5, 2019 at 1:12 AM 孙业平 <sunyeping at aliyun.com> wrote:
> >> Hello Mark Abraham,
> >>
> >> Thank you very much for your reply. I will definitely check the webinar
> >> and the GROMACS documentation, but right now I am confused and hoping for
> >> a direct solution. The workstation should have 18 cores each with 4
> >> hyperthreads. The output of "lscpu" reads:
> >>
> >> Architecture:        x86_64
> >> CPU op-mode(s):      32-bit, 64-bit
> >> Byte Order:          Little Endian
> >> CPU(s):              72
> >> On-line CPU(s) list: 0-71
> >> Thread(s) per core:  2
> >> Core(s) per socket:  18
> >> Socket(s):           2
> >> NUMA node(s):        2
> >> Vendor ID:           GenuineIntel
> >> CPU family:          6
> >> Model:               85
> >> Model name:          Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> >> Stepping:            4
> >> CPU MHz:             2701.000
> >> CPU max MHz:         2701.0000
> >> CPU min MHz:         1200.0000
> >> BogoMIPS:            5400.00
> >> Virtualization:      VT-x
> >> L1d cache:           32K
> >> L1i cache:           32K
> >> L2 cache:            1024K
> >> L3 cache:            25344K
> >> NUMA node0 CPU(s):   0-17,36-53
> >> NUMA node1 CPU(s):   18-35,54-71
> >>
> >> Now I don't want to do multiple simulations; I just want to run a
> >> single simulation. When assigning the simulation to only one GPU (gmx mdrun
> >> -v -gpu_id 0 -deffnm md), the simulation performance is 90 ns/day.
> >> However, when I don't assign the GPU but let all GPUs work by:
> >>
> >> gmx mdrun -v -deffnm md
> >>
> >> the simulation performance is only 2 ns/day.
> >>
> >> So what is the correct command to make full use of all GPUs and achieve
> >> the best performance (which I expect should be much higher than the
> >> 90 ns/day with only one GPU)? Could you give me further suggestions
> >> and help?
> >>
> >> Best regards,
> >> Yeping
>
> ------------------------------------------------------------------
> From: 孙业平 <sunyep...@aliyun.com>
> Sent At: 2019 Sep. 5 (Thu.) 07:12
> To: gromacs <gmx-us...@gromacs.org>; Mark Abraham <mark.j.abra...@gmail.com>
> Cc: gromacs.org_gmx-users <gromacs.org_gmx-users@maillist.sys.kth.se>
> Subject: Re: [gmx-users] The problem of utilizing multiple GPU
>
> [Body identical to the message quoted above.]
>
> ------------------------------------------------------------------
> From: Mark Abraham <mark.j.abra...@gmail.com>
> Sent At: 2019 Sep. 4 (Wed.) 19:10
> To: gromacs <gmx-us...@gromacs.org>; 孙业平 <sunyep...@aliyun.com>
> Cc: gromacs.org_gmx-users <gromacs.org_gmx-users@maillist.sys.kth.se>
> Subject: Re: [gmx-users] The problem of utilizing multiple GPU
>
> Hi,
>
> On Wed, 4 Sep 2019 at 12:54, sunyeping <sunyep...@aliyun.com> wrote:
> > Dear everyone,
> >
> > I am trying to do a simulation on a workstation with 72 cores and 8
> > GeForce 1080 GPUs.
>
> 72 cores, or just 36 cores each with two hyperthreads? (It matters because
> you might not want to share cores between simulations, which is what you'd
> get if you just assigned 9 hyperthreads per GPU and 1 GPU per simulation.)
>
> > When I do not assign a certain GPU with the command:
> > gmx mdrun -v -deffnm md
> > all GPUs are used, but the utilization of each GPU is extremely low
> > (only 1-2%), and the simulation will only finish after several months.
>
> Yep. Too many workers for not enough work means everyone spends more time
> coordinating than working. This is likely to improve in GROMACS 2020
> (beta out shortly).
>
> > In contrast, when I assign the simulation task to only one GPU:
> > gmx mdrun -v -gpu_id 0 -deffnm md
> > the GPU utilization can reach 60-70%, and the simulation can be finished
> > within a week. Even when I use only two GPUs:
>
> Utilization is only a proxy - what you actually want to measure is the
> rate of simulation, i.e. ns/day.
>
> > gmx mdrun -v -gpu_id 0,2 -deffnm md
> >
> > the GPU utilizations are very low and the simulation is very slow.
> That could be for a variety of reasons, which you could diagnose by
> looking at the performance report at the end of the log file and comparing
> different runs.
>
> > I think I may misuse the GPUs for the GROMACS simulation. Could you tell
> > me what is the correct way to use multiple GPUs?
>
> If you're happy running multiple simulations, then the easiest thing to do
> is to use the existing multi-simulation support to do
>
> mpirun -np 8 gmx_mpi mdrun -multidir dir0 dir1 dir2 ... dir7
>
> and let mdrun handle the details. Otherwise you have to get involved in
> assigning a subset of the CPU cores and GPUs to each job so that it both
> runs fast and does not conflict. See the GROMACS documentation for the
> version you're running, e.g.
> http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node
>
> You probably want to check out this webinar tomorrow:
> https://bioexcel.eu/webinar-more-bang-for-your-buck-improved-use-of-gpu-nodes-for-gromacs-2018-2019-09-05/
>
> Mark
>
> > Best regards

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
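And a sketch of the manual alternative Mark mentions (a subset of cores and GPUs per job). All values here are assumptions for this 2x 18-core HyperThreaded box, chosen to give each run one socket, one GPU, and one thread per core; verify the offsets against the mdrun-performance guide, since GROMACS's hardware-thread numbering determines which -pinoffset values keep the two jobs off each other's cores:

```shell
# Illustrative only; not verified on this machine.
gmx mdrun -deffnm md0 -gpu_id 0 -ntomp 18 -pin on -pinoffset 0  -pinstride 2 &
gmx mdrun -deffnm md1 -gpu_id 4 -ntomp 18 -pin on -pinoffset 36 -pinstride 2 &
wait
```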
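Mark's -multidir suggestion, sketched end to end. The directory names and the topol.tpr filename are assumptions for illustration; the launch line is printed rather than executed, since gmx_mpi may not be installed on the machine running the sketch:

```shell
set -e
# Placeholder input; in a real setup each dirN would hold the .tpr
# produced by gmx grompp (possibly identical copies of one system).
touch topol.tpr
dirs=""
for i in 0 1 2 3 4 5 6 7; do
    mkdir -p "dir$i"
    cp topol.tpr "dir$i/"
    dirs="$dirs dir$i"
done
# One MPI rank per member simulation; mdrun pairs them with the 8 GPUs.
echo "mpirun -np 8 gmx_mpi mdrun -multidir$dirs"
```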