Just to quickly jump in, because Mark suggested taking a look at the latest doc, and unfortunately I must admit that I didn't understand what I read. I seem to be especially struggling with the idea of gputasks.

Can you please explain what is happening in this line?

-pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000001

I am seriously confused here. Also, the number of ranks is 8, while the number
of threads is 6? Is -ntomp now specifying the _per-rank_ number of threads,
i.e. would the actual total number of threads for this job be 48?

Thank you,

Alex


On 2/9/2018 8:25 AM, Szilárd Páll wrote:
Hi,

First of all, have you read the docs (admittedly somewhat brief):
http://manual.gromacs.org/documentation/2018/user-guide/mdrun-performance.html#types-of-gpu-tasks

The current PME GPU implementation was optimized for single-GPU runs. Using
multiple GPUs with PME offloaded works, but this mode hasn't been an
optimization target and it will often not give very good performance. Using
multiple GPUs requires a separate PME rank (as you have realized), only one
such rank can be used (as we don't support PME decomposition on GPUs), and
this comes with some inherent scaling drawbacks. For this reason, unless you
_need_ your single run to be as fast as possible, you'll be better off
running multiple simulations side by side.
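
As a rough sketch (run names, thread counts and pinning offsets below are
placeholders, adjust them to your machine), two independent runs, one per
GPU, could look like:

gmx mdrun -deffnm run1 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 0 -pin on -pinoffset 0 -pinstride 1 &
gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 1 -pin on -pinoffset 24 -pinstride 1 &
# one rank per run, each run pinned to its own half of the hardware threads
# and assigned its own GPU for both nonbonded and PME work

(With an MPI build you could also manage such an ensemble with -multidir.)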

A few tips for tuning the performance of a multi-GPU run with PME offload:
* expect at best ~1.5x scaling when going to 2 GPUs (rarely scaling to 3 if
the tasks allow it)
* generally it's best to use about the same decomposition that you'd use
with nonbonded-only offload, e.g. in your case 6-8 ranks
* map the PME GPU task to a GPU either alone or together with at most 1 PP
rank, i.e. use the new -gputasks option (more on how the string is read
below)
e.g. for your case I'd expect the following to work ~best:
gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000001
or
gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000011
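
Roughly, the -gputasks string has one digit per GPU task on the node, and
each digit is the device ID of the GPU that task should run on (see the page
linked above for the details). With -ntmpi 8 -npme 1 there are 7 PP
(nonbonded) tasks followed by 1 PME task, so the second example reads as:

gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000011
# digits 1-6: nonbonded tasks of PP ranks 1-6 -> GPU 0
# digit 7:    nonbonded task of PP rank 7     -> GPU 1
# digit 8:    the PME task                    -> GPU 1

and -gputasks 00000001 puts all 7 PP nonbonded tasks on GPU 0 and the PME
task alone on GPU 1.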


Let me know if that gave some improvement.

Cheers,

--
Szilárd

On Fri, Feb 9, 2018 at 8:51 AM, Gmx QA <gmxquesti...@gmail.com> wrote:

Hi list,

I am trying out the new GROMACS 2018 (really nice so far), but have a few
questions about what command line options I should specify, specifically
with the new GPU PME implementation.

My computer has two CPUs (with 12 cores each, 24 with hyper threading) and
two GPUs, and I currently (with 2018) start simulations like this:

$ gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 2 -npme 1 -ntomp 24 -gpu_id 01

This works, but GROMACS prints a message saying that 24 OpenMP threads per MPI
rank is likely inefficient. However, when I try to reduce the number of OpenMP
threads I see a reduction in performance. Is this message no longer relevant
with GPU PME, or am I overlooking something?

Thanks
/PK
