Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-03-09 Thread Szilárd Páll
Hi Andreas,

Sorry for the delay.

I can confirm the regression. It affects the energy calculation steps,
where the GPU bonded computation did get significantly slower (as a
side-effect of optimizations that mainly targeted the force-only kernels).

Can you please file an issue on redmine.gromacs.org and upload the data you
shared with me?

As a workaround, consider using nstcalcenergy > 1; bumping it to just ~10
would eliminate most of the regression and would improve the performance of
other computations too (the nonbonded force-only kernels are also at least
1.5x faster than the force+energy kernels).
Alternatively, I recall you have a decent CPU, so you could run the bonded
interactions on the CPU.

Side-note: you are using an overly fine PME grid that you did not scale
along with the (overly accurate) rather long cut-offs (see
http://manual.gromacs.org/documentation/current/user-guide/mdp-options.html#mdp-fourierspacing
).
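
For illustration, here is a minimal sketch of both suggestions; the file names
and the exact values are placeholders, not taken from the benchmark inputs:

  # In the .mdp, before regenerating the .tpr (placeholder values):
  #   nstcalcenergy  = 10     ; instead of 1; most steps then use the faster F-only kernels
  #   fourierspacing = 0.15   ; scale the PME grid spacing along with the longer cut-offs
  gmx grompp -f bench.mdp -c conf.gro -p topol.top -o bench.tpr

  # Or keep nstcalcenergy = 1 and move the bonded work back to the CPU:
  gmx mdrun -s bench.tpr -deffnm bench -nb gpu -pme gpu -bonded cpu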

Cheers,
--
Szilárd


On Fri, Feb 28, 2020 at 11:10 AM Andreas Baer  wrote:

> Hi,
>
> sorry for it!
>
> https://faubox.rrze.uni-erlangen.de/getlink/fiUpELsXokQr3a7vyeDSKdY3/benchmarks_2019-2020_all
>
> Cheers,
> Andreas
>
> On 27.02.20 17:59, Szilárd Páll wrote:
>
> On Thu, Feb 27, 2020 at 1:08 PM Andreas Baer  wrote:
>
>> Hi,
>>
>> On 27.02.20 12:34, Szilárd Páll wrote:
>> > Hi
>> >
>> > On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer 
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> with the link below, additional log files for runs with 1 GPU should be
>> >> accessible now.
>> >>
>> > I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
>> >
>> > It would also help if you could share some input files in case if
>> further
>> > testing is needed.
>> Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
>> -bonded gpu -update gpu` as parameters. However, it is run on the same
>> machine with smt disabled.
>> With the following link, I provide all the tests on this machine, I did
>> by now, along with a summary of the performance for the several input
>> parameters (both in `logfiles`), as well as input files (`C60xh.7z`) and
>> the scripts to run these.
>>
>
> Links seems to be missing.
> --
> Szilárd
>
>
>> I hope, this helps. If there is anything else, I can do to help, please
>> let me know!
>> >
>> >
>> >> Thank you for the comment with the rlist, I did not know, that this
>> will
>> >> affect the performance negatively.
>> >
>> > It does in multiple ways. First, you are using a rather long list buffer
>> > which will make the nonbonded pair-interaction calculation more
>> > computational expensive than it could be if you just used a tolerance
>> and
>> > let the buffer be calculated. Secondly, as setting a manual rlist
>> disables
>> > the automated verlet buffer calculation, it prevents mdrun from using a
>> > dual pairl-list setup (see
>> >
>> http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning
>> )
>> > which has additional performance benefits.
>> Ok, thank you for the explanation!
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> Cheers,
>> Andreas
>> >
>> >
>> >
>> >> I know, about the nstcalcenergy, but
>> >> I need it for several of my simulations.
>> > Cheers,
>> >> Andreas
>> >>
>> >> On 26.02.20 16:50, Szilárd Páll wrote:
>> >>> Hi,
>> >>>
>> >>> Can you please check the performance when running on a single GPU
>> 2019 vs
>> >>> 2020 with your inputs?
>> >>>
>> >>> Also note that you are using some peculiar settings that will have an
>> >>> adverse effect on performance (like manually set rlist disallowing the
>> >> dual
>> >>> pair-list setup, and nstcalcenergy=1).
>> >>>
>> >>> Cheers,
>> >>>
>> >>> --
>> >>> Szilárd
>> >>>
>> >>>
>> >>> On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer 
>> >> wrote:
>>  Hello,
>> 
>>  here is a link to the logfiles.
>> 
>> 
>> >>
>> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>>  If necessary, I can also provide some more log or tpr/gro/... files.
>> 
>>  Cheers,
>>  Andreas
>> 
>> 
>>  On 26.02.20 16:09, Paul bauer wrote:
>> > Hello,
>> >
>> > you can't add attachments to the list, please upload the files
>> > somewhere to share them.
>> > This might be quite important to us, because the performance
>> > regression is not expected by us.
>> >
>> > Cheers
>> >
>> > Paul
>> >
>> > On 26/02/2020 15:54, Andreas Baer wrote:
>> >> Hello,
>> >>
>> >> from a set of benchmark tests with large systems using Gromacs
>> >> versions 2019.5 and 2020, I obtained some unexpected results:
>> >> With the same set of parameters and the 2020 version, I obtain a
>> >> performance that is about 2/3 of the 2019.5 version. Interestingly,
>> >> according to nvidia-smi, the GPU usage is about 20% higher for the
>> >> 2020 version.
>> >> Also from the log files it seems, that the 2020 version does the
>> >> computations more efficiently, but spends so much more time waiting,
>> >> that the overall performance drops.

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-28 Thread Andreas Baer

Hi,

sorry about that!
https://faubox.rrze.uni-erlangen.de/getlink/fiUpELsXokQr3a7vyeDSKdY3/benchmarks_2019-2020_all

Cheers,
Andreas

On 27.02.20 17:59, Szilárd Páll wrote:
On Thu, Feb 27, 2020 at 1:08 PM Andreas Baer > wrote:


Hi,

On 27.02.20 12:34, Szilárd Páll wrote:
> Hi
>
> On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer
mailto:andreas.b...@fau.de>> wrote:
>
>> Hi,
>>
>> with the link below, additional log files for runs with 1 GPU
should be
>> accessible now.
>>
> I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun
-ntmpi 1.
>
> It would also help if you could share some input files in case
if further
> testing is needed.
Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
-bonded gpu -update gpu` as parameters. However, it is run on the
same
machine with smt disabled.
With the following link, I provide all the tests on this machine,
I did
by now, along with a summary of the performance for the several input
parameters (both in `logfiles`), as well as input files
(`C60xh.7z`) and
the scripts to run these.


Links seems to be missing.
--
Szilárd

I hope, this helps. If there is anything else, I can do to help,
please
let me know!
>
>
>> Thank you for the comment with the rlist, I did not know, that
this will
>> affect the performance negatively.
>
> It does in multiple ways. First, you are using a rather long
list buffer
> which will make the nonbonded pair-interaction calculation more
> computational expensive than it could be if you just used a
tolerance and
> let the buffer be calculated. Secondly, as setting a manual
rlist disables
> the automated verlet buffer calculation, it prevents mdrun from
using a
> dual pairl-list setup (see
>

http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
> which has additional performance benefits.
Ok, thank you for the explanation!
>
> Cheers,
> --
> Szilárd
Cheers,
Andreas
>
>
>
>> I know, about the nstcalcenergy, but
>> I need it for several of my simulations.
> Cheers,
>> Andreas
>>
>> On 26.02.20 16:50, Szilárd Páll wrote:
>>> Hi,
>>>
>>> Can you please check the performance when running on a single
GPU 2019 vs
>>> 2020 with your inputs?
>>>
>>> Also note that you are using some peculiar settings that will
have an
>>> adverse effect on performance (like manually set rlist
disallowing the
>> dual
>>> pair-list setup, and nstcalcenergy=1).
>>>
>>> Cheers,
>>>
>>> --
>>> Szilárd
>>>
>>>
>>> On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer
mailto:andreas.b...@fau.de>>
>> wrote:
 Hello,

 here is a link to the logfiles.


>>

https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
 If necessary, I can also provide some more log or tpr/gro/...
files.

 Cheers,
 Andreas


 On 26.02.20 16:09, Paul bauer wrote:
> Hello,
>
> you can't add attachments to the list, please upload the files
> somewhere to share them.
> This might be quite important to us, because the performance
> regression is not expected by us.
>
> Cheers
>
> Paul
>
> On 26/02/2020 15:54, Andreas Baer wrote:
>> Hello,
>>
>> from a set of benchmark tests with large systems using Gromacs
>> versions 2019.5 and 2020, I obtained some unexpected results:
>> With the same set of parameters and the 2020 version, I
obtain a
>> performance that is about 2/3 of the 2019.5 version.
Interestingly,
>> according to nvidia-smi, the GPU usage is about 20% higher
for the
>> 2020 version.
>> Also from the log files it seems, that the 2020 version
does the
>> computations more efficiently, but spends so much more time
waiting,
>> that the overall performance drops.
>>
>> Some background info on the benchmarks:
>> - System contains about 2.1 million atoms.
>> - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz =
16 cores +
>> SMT; 4x NVIDIA Tesla V100
>>     (similar results with less significant performance drop
(~15%) on a
>> different machine: 2 or 4 nodes with each [2x Intel Xeon
2660v2 („Ivy
>> Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
>> - Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
>> the optimal set. However the performance drop seems to be persistent
>> for all such options.

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Szilárd Páll
On Thu, Feb 27, 2020 at 1:08 PM Andreas Baer  wrote:

> Hi,
>
> On 27.02.20 12:34, Szilárd Páll wrote:
> > Hi
> >
> > On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer 
> wrote:
> >
> >> Hi,
> >>
> >> with the link below, additional log files for runs with 1 GPU should be
> >> accessible now.
> >>
> > I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
> >
> > It would also help if you could share some input files in case if further
> > testing is needed.
> Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
> -bonded gpu -update gpu` as parameters. However, it is run on the same
> machine with smt disabled.
> With the following link, I provide all the tests on this machine, I did
> by now, along with a summary of the performance for the several input
> parameters (both in `logfiles`), as well as input files (`C60xh.7z`) and
> the scripts to run these.
>

The link seems to be missing.
--
Szilárd


> I hope, this helps. If there is anything else, I can do to help, please
> let me know!
> >
> >
> >> Thank you for the comment with the rlist, I did not know, that this will
> >> affect the performance negatively.
> >
> > It does in multiple ways. First, you are using a rather long list buffer
> > which will make the nonbonded pair-interaction calculation more
> > computational expensive than it could be if you just used a tolerance and
> > let the buffer be calculated. Secondly, as setting a manual rlist
> disables
> > the automated verlet buffer calculation, it prevents mdrun from using a
> > dual pairl-list setup (see
> >
> http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning
> )
> > which has additional performance benefits.
> Ok, thank you for the explanation!
> >
> > Cheers,
> > --
> > Szilárd
> Cheers,
> Andreas
> >
> >
> >
> >> I know, about the nstcalcenergy, but
> >> I need it for several of my simulations.
> > Cheers,
> >> Andreas
> >>
> >> On 26.02.20 16:50, Szilárd Páll wrote:
> >>> Hi,
> >>>
> >>> Can you please check the performance when running on a single GPU 2019
> vs
> >>> 2020 with your inputs?
> >>>
> >>> Also note that you are using some peculiar settings that will have an
> >>> adverse effect on performance (like manually set rlist disallowing the
> >> dual
> >>> pair-list setup, and nstcalcenergy=1).
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Szilárd
> >>>
> >>>
> >>> On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer 
> >> wrote:
>  Hello,
> 
>  here is a link to the logfiles.
> 
> 
> >>
> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>  If necessary, I can also provide some more log or tpr/gro/... files.
> 
>  Cheers,
>  Andreas
> 
> 
>  On 26.02.20 16:09, Paul bauer wrote:
> > Hello,
> >
> > you can't add attachments to the list, please upload the files
> > somewhere to share them.
> > This might be quite important to us, because the performance
> > regression is not expected by us.
> >
> > Cheers
> >
> > Paul
> >
> > On 26/02/2020 15:54, Andreas Baer wrote:
> >> Hello,
> >>
> >> from a set of benchmark tests with large systems using Gromacs
> >> versions 2019.5 and 2020, I obtained some unexpected results:
> >> With the same set of parameters and the 2020 version, I obtain a
> >> performance that is about 2/3 of the 2019.5 version. Interestingly,
> >> according to nvidia-smi, the GPU usage is about 20% higher for the
> >> 2020 version.
> >> Also from the log files it seems, that the 2020 version does the
> >> computations more efficiently, but spends so much more time waiting,
> >> that the overall performance drops.
> >>
> >> Some background info on the benchmarks:
> >> - System contains about 2.1 million atoms.
> >> - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores
> +
> >> SMT; 4x NVIDIA Tesla V100
> >> (similar results with less significant performance drop (~15%)
> on a
> >> different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2
> („Ivy
> >> Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
> >> - Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
> >> the optimal set. However the performance drop seems to be persistent
> >> for all such options.
> >>
> >> Two representative log files are attached.
> >> Does anyone have an idea, where this drop comes from, and how to
> >> choose the parameters for the 2020 version to circumvent this?
> >>
> >> Regards,
> >> Andreas
> >>

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Andreas Baer

Hi,

On 27.02.20 12:34, Szilárd Páll wrote:

Hi

On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer  wrote:


Hi,

with the link below, additional log files for runs with 1 GPU should be
accessible now.


I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.

It would also help if you could share some input files in case if further
testing is needed.
Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
-bonded gpu -update gpu` as parameters. However, it was run on the same
machine with SMT disabled.
With the following link I provide all the tests I have run on this machine so
far, along with a summary of the performance for the various input parameters
(both in `logfiles`), as well as the input files (`C60xh.7z`) and the scripts
to run these.
I hope this helps. If there is anything else I can do to help, please let me
know!
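
For completeness, a sketch of how such a single-rank, fully offloaded run is
typically launched (the .tpr name and output prefix below are placeholders,
not copied from the shared scripts):

  gmx mdrun -s bench.tpr -deffnm bench_1gpu \
            -ntmpi 1 -ntomp 4 \
            -nb gpu -pme gpu -bonded gpu -update gpu
  # -nb gpu and -pme gpu are written out for clarity; with a single rank and a
  # compatible GPU they are typically what the defaults resolve to anyway.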




Thank you for the comment with the rlist, I did not know, that this will
affect the performance negatively.


It does in multiple ways. First, you are using a rather long list buffer
which will make the nonbonded pair-interaction calculation more
computational expensive than it could be if you just used a tolerance and
let the buffer be calculated. Secondly, as setting a manual rlist disables
the automated verlet buffer calculation, it prevents mdrun from using a
dual pairl-list setup (see
http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
which has additional performance benefits.

Ok, thank you for the explanation!


Cheers,
--
Szilárd

Cheers,
Andreas





I know, about the nstcalcenergy, but
I need it for several of my simulations.

Cheers,

Andreas

On 26.02.20 16:50, Szilárd Páll wrote:

Hi,

Can you please check the performance when running on a single GPU 2019 vs
2020 with your inputs?

Also note that you are using some peculiar settings that will have an
adverse effect on performance (like manually set rlist disallowing the

dual

pair-list setup, and nstcalcenergy=1).

Cheers,

--
Szilárd


On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer 

wrote:

Hello,

here is a link to the logfiles.



https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020

If necessary, I can also provide some more log or tpr/gro/... files.

Cheers,
Andreas


On 26.02.20 16:09, Paul bauer wrote:

Hello,

you can't add attachments to the list, please upload the files
somewhere to share them.
This might be quite important to us, because the performance
regression is not expected by us.

Cheers

Paul

On 26/02/2020 15:54, Andreas Baer wrote:

Hello,

from a set of benchmark tests with large systems using Gromacs
versions 2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters and the 2020 version, I obtain a
performance that is about 2/3 of the 2019.5 version. Interestingly,
according to nvidia-smi, the GPU usage is about 20% higher for the
2020 version.
Also from the log files it seems, that the 2020 version does the
computations more efficiently, but spends so much more time waiting,
that the overall performance drops.

Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores +
SMT; 4x NVIDIA Tesla V100
(similar results with less significant performance drop (~15%) on a
different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy
Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
the optimal set. However the performance drop seems to be persistent
for all such options.

Two representative log files are attached.
Does anyone have an idea, where this drop comes from, and how to
choose the parameters for the 2020 version to circumvent this?

Regards,
Andreas



Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Szilárd Páll
Hi

On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer  wrote:

> Hi,
>
> with the link below, additional log files for runs with 1 GPU should be
> accessible now.
>

I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.

It would also help if you could share some input files in case further
testing is needed.


> Thank you for the comment with the rlist, I did not know, that this will
> affect the performance negatively.


It does in multiple ways. First, you are using a rather long list buffer,
which makes the nonbonded pair-interaction calculation more computationally
expensive than it would be if you just used a tolerance and let the buffer be
calculated. Secondly, as setting a manual rlist disables the automated Verlet
buffer calculation, it prevents mdrun from using a dual pair-list setup (see
http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
which has additional performance benefits.
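
A minimal sketch of that alternative (illustrative values; the file names are
placeholders):

  # In the .mdp: drop the manual rlist and set a tolerance instead, which lets
  # mdrun compute the buffer and re-enables the dual pair-list path:
  #   cutoff-scheme           = Verlet
  #   verlet-buffer-tolerance = 0.005   ; kJ/mol/ps, the default; leave rlist unset
  gmx grompp -f bench.mdp -c conf.gro -p topol.top -o bench.tpr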

Cheers,
--
Szilárd



> I know, about the nstcalcenergy, but
> I need it for several of my simulations.

Cheers,
> Andreas
>
> On 26.02.20 16:50, Szilárd Páll wrote:
> > Hi,
> >
> > Can you please check the performance when running on a single GPU 2019 vs
> > 2020 with your inputs?
> >
> > Also note that you are using some peculiar settings that will have an
> > adverse effect on performance (like manually set rlist disallowing the
> dual
> > pair-list setup, and nstcalcenergy=1).
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> >
> > On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer 
> wrote:
> >
> >> Hello,
> >>
> >> here is a link to the logfiles.
> >>
> >>
> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
> >>
> >> If necessary, I can also provide some more log or tpr/gro/... files.
> >>
> >> Cheers,
> >> Andreas
> >>
> >>
> >> On 26.02.20 16:09, Paul bauer wrote:
> >>> Hello,
> >>>
> >>> you can't add attachments to the list, please upload the files
> >>> somewhere to share them.
> >>> This might be quite important to us, because the performance
> >>> regression is not expected by us.
> >>>
> >>> Cheers
> >>>
> >>> Paul
> >>>
> >>> On 26/02/2020 15:54, Andreas Baer wrote:
>  Hello,
> 
>  from a set of benchmark tests with large systems using Gromacs
>  versions 2019.5 and 2020, I obtained some unexpected results:
>  With the same set of parameters and the 2020 version, I obtain a
>  performance that is about 2/3 of the 2019.5 version. Interestingly,
>  according to nvidia-smi, the GPU usage is about 20% higher for the
>  2020 version.
>  Also from the log files it seems, that the 2020 version does the
>  computations more efficiently, but spends so much more time waiting,
>  that the overall performance drops.
> 
>  Some background info on the benchmarks:
>  - System contains about 2.1 million atoms.
>  - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores +
>  SMT; 4x NVIDIA Tesla V100
> (similar results with less significant performance drop (~15%) on a
>  different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy
>  Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
>  - Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
>  the optimal set. However the performance drop seems to be persistent
>  for all such options.
> 
>  Two representative log files are attached.
>  Does anyone have an idea, where this drop comes from, and how to
>  choose the parameters for the 2020 version to circumvent this?
> 
>  Regards,
>  Andreas
> 

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Andreas Baer

Hi,

with the link below, additional log files for runs with 1 GPU should be 
accessible now.


Thank you for the comment about rlist; I did not know that this will
affect the performance negatively. I know about nstcalcenergy, but
I need it for several of my simulations.


Cheers,
Andreas

On 26.02.20 16:50, Szilárd Páll wrote:

Hi,

Can you please check the performance when running on a single GPU 2019 vs
2020 with your inputs?

Also note that you are using some peculiar settings that will have an
adverse effect on performance (like manually set rlist disallowing the dual
pair-list setup, and nstcalcenergy=1).

Cheers,

--
Szilárd


On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer  wrote:


Hello,

here is a link to the logfiles.

https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020

If necessary, I can also provide some more log or tpr/gro/... files.

Cheers,
Andreas


On 26.02.20 16:09, Paul bauer wrote:

Hello,

you can't add attachments to the list, please upload the files
somewhere to share them.
This might be quite important to us, because the performance
regression is not expected by us.

Cheers

Paul

On 26/02/2020 15:54, Andreas Baer wrote:

Hello,

from a set of benchmark tests with large systems using Gromacs
versions 2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters and the 2020 version, I obtain a
performance that is about 2/3 of the 2019.5 version. Interestingly,
according to nvidia-smi, the GPU usage is about 20% higher for the
2020 version.
Also from the log files it seems, that the 2020 version does the
computations more efficiently, but spends so much more time waiting,
that the overall performance drops.

Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores +
SMT; 4x NVIDIA Tesla V100
   (similar results with less significant performance drop (~15%) on a
different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy
Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
the optimal set. However the performance drop seems to be persistent
for all such options.

Two representative log files are attached.
Does anyone have an idea, where this drop comes from, and how to
choose the parameters for the 2020 version to circumvent this?

Regards,
Andreas



Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Szilárd Páll
Hi,

Can you please check the performance of 2019 vs 2020 with your inputs when
running on a single GPU?

Also note that you are using some peculiar settings that will have an
adverse effect on performance (like a manually set rlist, which disallows the
dual pair-list setup, and nstcalcenergy=1).

Cheers,

--
Szilárd


On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer  wrote:

> Hello,
>
> here is a link to the logfiles.
>
> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>
> If necessary, I can also provide some more log or tpr/gro/... files.
>
> Cheers,
> Andreas
>
>
> On 26.02.20 16:09, Paul bauer wrote:
> > Hello,
> >
> > you can't add attachments to the list, please upload the files
> > somewhere to share them.
> > This might be quite important to us, because the performance
> > regression is not expected by us.
> >
> > Cheers
> >
> > Paul
> >
> > On 26/02/2020 15:54, Andreas Baer wrote:
> >> Hello,
> >>
> >> from a set of benchmark tests with large systems using Gromacs
> >> versions 2019.5 and 2020, I obtained some unexpected results:
> >> With the same set of parameters and the 2020 version, I obtain a
> >> performance that is about 2/3 of the 2019.5 version. Interestingly,
> >> according to nvidia-smi, the GPU usage is about 20% higher for the
> >> 2020 version.
> >> Also from the log files it seems, that the 2020 version does the
> >> computations more efficiently, but spends so much more time waiting,
> >> that the overall performance drops.
> >>
> >> Some background info on the benchmarks:
> >> - System contains about 2.1 million atoms.
> >> - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores +
> >> SMT; 4x NVIDIA Tesla V100
> >>   (similar results with less significant performance drop (~15%) on a
> >> different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy
> >> Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
> >> - Several options for -ntmpi, -ntomp, -bonded, -pme are used to find
> >> the optimal set. However the performance drop seems to be persistent
> >> for all such options.
> >>
> >> Two representative log files are attached.
> >> Does anyone have an idea, where this drop comes from, and how to
> >> choose the parameters for the 2020 version to circumvent this?
> >>
> >> Regards,
> >> Andreas
> >>
> >
>

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Andreas Baer

Hello,

here is a link to the logfiles.
https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020

If necessary, I can also provide some more log or tpr/gro/... files.

Cheers,
Andreas


On 26.02.20 16:09, Paul bauer wrote:

Hello,

you can't add attachments to the list, please upload the files 
somewhere to share them.
This might be quite important to us, because the performance 
regression is not expected by us.


Cheers

Paul

On 26/02/2020 15:54, Andreas Baer wrote:

Hello,

from a set of benchmark tests with large systems using Gromacs 
versions 2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters and the 2020 version, I obtain a 
performance that is about 2/3 of the 2019.5 version. Interestingly, 
according to nvidia-smi, the GPU usage is about 20% higher for the 
2020 version.
Also from the log files it seems, that the 2020 version does the 
computations more efficiently, but spends so much more time waiting, 
that the overall performance drops.


Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores + 
SMT; 4x NVIDIA Tesla V100
  (similar results with less significant performance drop (~15%) on a 
different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy 
Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme are used to find 
the optimal set. However the performance drop seems to be persistent 
for all such options.


Two representative log files are attached.
Does anyone have an idea, where this drop comes from, and how to 
choose the parameters for the 2020 version to circumvent this?


Regards,
Andreas






Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Paul bauer

Hello,

you can't send attachments to the list; please upload the files somewhere
and share a link.
This might be quite important to us, because we did not expect this
performance regression.


Cheers

Paul

On 26/02/2020 15:54, Andreas Baer wrote:

Hello,

from a set of benchmark tests with large systems using Gromacs 
versions 2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters and the 2020 version, I obtain a 
performance that is about 2/3 of the 2019.5 version. Interestingly, 
according to nvidia-smi, the GPU usage is about 20% higher for the 
2020 version.
Also from the log files it seems, that the 2020 version does the 
computations more efficiently, but spends so much more time waiting, 
that the overall performance drops.


Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores + 
SMT; 4x NVIDIA Tesla V100
  (similar results with less significant performance drop (~15%) on a 
different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy 
Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme are used to find 
the optimal set. However the performance drop seems to be persistent 
for all such options.


Two representative log files are attached.
Does anyone have an idea, where this drop comes from, and how to 
choose the parameters for the 2020 version to circumvent this?


Regards,
Andreas



--
Paul Bauer, PhD
GROMACS Development Manager
KTH Stockholm, SciLifeLab
0046737308594


[gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Andreas Baer

Hello,

from a set of benchmark tests with large systems using Gromacs versions
2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters, the 2020 version reaches only about 2/3 of
the performance of 2019.5. Interestingly, according to nvidia-smi, the GPU
usage is about 20% higher for the 2020 version.
Also, from the log files it seems that the 2020 version does the
computations more efficiently but spends so much more time waiting that the
overall performance drops.


Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores + 
SMT; 4x NVIDIA Tesla V100
  (similar results with less significant performance drop (~15%) on a 
different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy 
Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme are used to find the
optimal set. However, the performance drop seems to persist for all such
options.


Two representative log files are attached.
Does anyone have an idea where this drop comes from, and how to choose the
parameters for the 2020 version to circumvent it?


Regards,
Andreas

[gmx-users] Performance GROMACS on GPU

2019-12-09 Thread Talarico Carmine
Hi,
I ran three simulations on 1 node with 3 GPUs, using an increasing number of
GPUs.

This is my system:
___
1 node with total 36 cores, 72 logical cores, 3 compatible GPUs

GROMACS version:2018.2
Precision:  single
Memory model:   64 bit
MPI library:thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:CUDA
SIMD instructions:  AVX2_256
FFT library:fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage:   enabled
TNG support:enabled
Hwloc support:  hwloc-1.11.0
Tracing support:disabled
Built on:   2018-08-01 09:03:03

GPU info:
Number of GPUs detected: 3
#0: NVIDIA Tesla V100-PCIE-32GB, compute cap.: 7.0, ECC: yes, stat: 
compatible
#1: NVIDIA Tesla V100-PCIE-32GB, compute cap.: 7.0, ECC: yes, stat: 
compatible
#2: NVIDIA Tesla V100-PCIE-32GB, compute cap.: 7.0, ECC: yes, stat: 
compatible
___

These are the commands launched on two systems of different size, and the
related performance:

Alcohol Dehydrogenase system (95561 atoms)                                        ns/day    h/ns
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 1 -ntomp 12                     53.355    0.45
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 2 -ntomp 12 -npme 1 -gputasks 01     53.176    0.451
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 3 -ntomp 12 -npme 1 -gputasks 012    50.024    0.48

Villin system (4723 atoms)                                                        ns/day    h/ns
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 1 -ntomp 12                    589.635    0.041
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 2 -ntomp 12 -npme 1 -gputasks 01    727.139    0.033
  gmx mdrun -deffnm topol -nb gpu -pme gpu -ntmpi 3 -ntomp 12 -npme 1 -gputasks 012   664.695    0.036


The performance seems very strange: increasing the number of GPUs gives no
speedup for the big system, while for the small system the performance peaks
with 2 GPUs. Can I ask all of you whether I'm using the GPU selection options
in the right way?

Moreover, I'm not sure about the right usage of the -ntomp option; I thought
to ask about that in another thread.
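
For reference, a hedged sketch of how the rank/thread options combine on a
node like this (36 physical cores, 3 GPUs; the numbers are illustrative, not a
tuning recommendation):

  # total threads = -ntmpi x -ntomp; keep it at or below the cores you want to use
  gmx mdrun -deffnm topol -nb gpu -pme gpu -pin on \
            -ntmpi 3 -ntomp 12 -npme 1 -gputasks 012
  # 3 ranks x 12 threads = 36 threads on the 36 physical cores; two PP ranks plus
  # one separate PME rank (-npme 1), each mapped to its own GPU via -gputasks.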

Thanks a lot!
Carmine




Re: [gmx-users] [Performance] poor performance with NV V100

2019-10-16 Thread Szilárd Páll
Hi,

Please keep the conversation on the mailing list.

GROMACS uses both CPUs and GPUs for computation. Your runs limit the core
count per rank, and do so in a way that leaves the rest of the cores idle.
This is not a suitable approach for realistic benchmarking, because clock
boosting will skew your scaling results.

Secondly, you should consider using PME offload as well; see the docs and
previous discussions on the list for how to do so.

Last, if you are evaluating hardware for some use-cases, do make sure you
set up your benchmarks such that they reflect the intended use cases (e.g.
scaling vs throughput), and please check out the best practices for how to
run GROMACS on GPU servers.

You might also be interested in a recent study we did:
https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.26011

Cheers,

--
Szilárd


On Tue, Oct 8, 2019 at 3:00 PM Jimmy Chen  wrote:

> Hi Szilard,
>
> Thanks for your help.
> Is md.log enough for you to clarify where the bottleneck is located?
> If you need another log, please let me know.
>
> I just checked the release note of 2019.4, I didn't see any major release
> impact the performance of intra-node.
>
> http://manual.gromacs.org/documentation/2020-beta1/release-notes/2019/2019.4.html
>
> anyway, I will have a try on 2019.4 later.
>
> looking forward to check new feature which will be on 2/3 beta release of
> 2020.
>
> Best regards,
> Jimmy
>
>
> Szilárd Páll  於 2019年10月8日 週二 下午8:34寫道:
>
>> Hi,
>>
>> Can you please share your log files? we may be able to help with spotting
>> performance issues or bottlenecks.
>> However, note that for NVIDIA are the best source to aid you with
>> reproducing their benchmark numbers, we
>>
>> Scaling across multiple GPUs requires some tuning of command line options,
>> please see the related discussion on the list ((briefly: use multiple
>> ranks
>> per GPU, and one separate PME rank with GPU offload).
>>
>> Also note that intra-node strong scaling optimization target of recent
>> releases (there are no p2p optimizations either), however new features
>> going into the 2020 release will improve things significantly. Keep an eye
>> out on the beta2/3 releases if you are interested in checking out the new
>> features.
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Mon, Oct 7, 2019 at 7:48 AM Jimmy Chen  wrote:
>>
>> > Hi,
>> >
>> > I'm using NV v100 to evaluate if it's suitable to do purchase.
>> > But I can't get similar test result as referenced performance data
>> > which was got from internet.
>> > https://developer.nvidia.com/hpc-application-performance
>> >
>> >
>> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
>> >
>> >
>> > No matter using docker tag 18.02 from
>> > https://ngc.nvidia.com/catalog/containers/hpc:gromacs/tags
>> >
>> > or gromacs source code from
>> > ftp://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
>> >
>> > test data set is ADH dodec and water 1.5M
>> > gmx grompp -f pme_verlet.mdp
>> > gmx mdrun -ntmpi 1 -nb gpu -pin on -v -noconfout -nsteps 5000 -s
>> topol.tpr
>> > -ntomp 4
>> > and  gmx mdrun -ntmpi 2 -nb gpu -pin on -v -noconfout -nsteps 5000 -s
>> > topol.tpr -ntomp 4
>> >
>> > My CPU is Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
>> > and GPU is NV V100 16GB PCIE.
>> >
>> > For ADH dodec,
>> > The perf data of 2xV100 16GB PCIE in
>> > https://developer.nvidia.com/hpc-application-performance is 176
>> (ns/day).
>> > But I only can get 28 (ns/day). actually I can get 67(ns/day) with
>> 1xV100.
>> > I don't know why I got poorer result with 2xV100.
>> >
>> > For water 1.5M
>> > The perf data of 1xV100 16GB PCIE in
>> >
>> >
>> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
>> > is
>> > 9.83(ns/day) and 2xV100 is 10.41(ns/day).
>> > But what I got is 6.5(ns/day) with 1xV100 and 2(ns/day) with 2xV100.
>> >
>> > Could anyone give me some suggestions about how to clarify what's
>> problem
>> > to result to this perf data in my environment? Is my command to perform
>> the
>> > testing wrong? any suggested command to perform the testing?
>> > or which source code version is recommended to use now?
>> >
>> > btw, after checking the code, it seems MPI doesn't go through PCIE P2p
>> or
>> > RDMA, is it correct? any plan to implement this in MPI?
>> >
>> > Best regards,
>> > Jimmy

Re: [gmx-users] [Performance] poor performance with NV V100

2019-10-08 Thread Szilárd Páll
Hi,

Can you please share your log files? We may be able to help with spotting
performance issues or bottlenecks. However, note that NVIDIA themselves are
the best source to aid you with reproducing their benchmark numbers.

Scaling across multiple GPUs requires some tuning of command line options;
please see the related discussion on the list (briefly: use multiple ranks
per GPU, and one separate PME rank with GPU offload).
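
As a hedged illustration of that advice (the rank counts, thread counts and
GPU mapping below are placeholders for a 2-GPU node, not a tested recipe):

  # Several ranks per GPU plus one separate PME rank, all offloaded:
  gmx mdrun -s topol.tpr -pin on \
            -ntmpi 4 -ntomp 4 -npme 1 \
            -nb gpu -pme gpu \
            -gputasks 0011   # 3 PP tasks on GPUs 0,0,1 and the PME task on GPU 1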

Also note that intra-node strong scaling has not been an optimization target
of recent releases (there are no p2p optimizations either); however, new
features going into the 2020 release will improve things significantly. Keep
an eye on the beta2/3 releases if you are interested in checking out the new
features.

Cheers,
--
Szilárd


On Mon, Oct 7, 2019 at 7:48 AM Jimmy Chen  wrote:

> Hi,
>
> I'm using NV v100 to evaluate if it's suitable to do purchase.
> But I can't get similar test result as referenced performance data
> which was got from internet.
> https://developer.nvidia.com/hpc-application-performance
>
> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
>
>
> No matter using docker tag 18.02 from
> https://ngc.nvidia.com/catalog/containers/hpc:gromacs/tags
>
> or gromacs source code from
> ftp://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
>
> test data set is ADH dodec and water 1.5M
> gmx grompp -f pme_verlet.mdp
> gmx mdrun -ntmpi 1 -nb gpu -pin on -v -noconfout -nsteps 5000 -s topol.tpr
> -ntomp 4
> and  gmx mdrun -ntmpi 2 -nb gpu -pin on -v -noconfout -nsteps 5000 -s
> topol.tpr -ntomp 4
>
> My CPU is Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
> and GPU is NV V100 16GB PCIE.
>
> For ADH dodec,
> The perf data of 2xV100 16GB PCIE in
> https://developer.nvidia.com/hpc-application-performance is 176 (ns/day).
> But I only can get 28 (ns/day). actually I can get 67(ns/day) with 1xV100.
> I don't know why I got poorer result with 2xV100.
>
> For water 1.5M
> The perf data of 1xV100 16GB PCIE in
>
> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
> is
> 9.83(ns/day) and 2xV100 is 10.41(ns/day).
> But what I got is 6.5(ns/day) with 1xV100 and 2(ns/day) with 2xV100.
>
> Could anyone give me some suggestions about how to clarify what's problem
> to result to this perf data in my environment? Is my command to perform the
> testing wrong? any suggested command to perform the testing?
> or which source code version is recommended to use now?
>
> btw, after checking the code, it seems MPI doesn't go through PCIE P2p or
> RDMA, is it correct? any plan to implement this in MPI?
>
> Best regards,
> Jimmy

[gmx-users] [Performance] poor performance with NV V100

2019-10-06 Thread Jimmy Chen
Hi,

I'm using an NV V100 to evaluate whether it's suitable for purchase.
But I can't get test results similar to the reference performance data
that I found on the internet.
https://developer.nvidia.com/hpc-application-performance
https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf


No matter using docker tag 18.02 from
https://ngc.nvidia.com/catalog/containers/hpc:gromacs/tags

or gromacs source code from
ftp://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz

test data set is ADH dodec and water 1.5M
gmx grompp -f pme_verlet.mdp
gmx mdrun -ntmpi 1 -nb gpu -pin on -v -noconfout -nsteps 5000 -s topol.tpr
-ntomp 4
and  gmx mdrun -ntmpi 2 -nb gpu -pin on -v -noconfout -nsteps 5000 -s
topol.tpr -ntomp 4

My CPU is Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
and GPU is NV V100 16GB PCIE.

For ADH dodec,
The perf data of 2xV100 16GB PCIE in
https://developer.nvidia.com/hpc-application-performance is 176 (ns/day).
But I can only get 28 ns/day; actually, I can get 67 ns/day with 1x V100.
I don't know why I get a poorer result with 2x V100.

For water 1.5M
The perf data of 1xV100 16GB PCIE in
https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf is
9.83(ns/day) and 2xV100 is 10.41(ns/day).
But what I got is 6.5(ns/day) with 1xV100 and 2(ns/day) with 2xV100.

Could anyone give me some suggestions on how to work out what causes this
performance in my environment? Is my command for running the tests wrong? Is
there a suggested command for the testing, or which source code version is
recommended to use now?

BTW, after checking the code, it seems MPI doesn't go through PCIe P2P or
RDMA; is that correct? Any plan to implement this in MPI?

Best regards,
Jimmy


[gmx-users] Performance with Epyc Rome

2019-08-28 Thread Jochen Hub

Dear Gromacs users,

does someone already have experience with the new AMD Epyc Rome? Can we
expect that 4 Epyc cores per Nvidia RTX 2080 on a CPU/GPU node are
sufficient for common simulations (as one would expect with a common
Intel Xeon)?


Many thanks,
Jochen


--
---
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---

Re: [gmx-users] Performance, gpu

2019-08-28 Thread Mark Abraham
Hi,

Your command chooses 44 PME ranks, thus 88-44=44 PP ranks. It gives each of
those 6 threads and 4 threads respectively. That's 44*6 + 44*4 = 440 threads,
which is far more than the 88 total cores in your 4 nodes, i.e.
over-subscription. The number of PME-only ranks just changes how much it's
over-subscribed.

I'd be starting my investigation with

aprun -n 8 gmx_mpi mdrun -npme 4

and let the defaults work out that there's 1 GPU and 11 OpenMP threads per
rank to achieve full utilization.
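
Spelled out as a hedged sketch (the per-node rank placement and the explicit
thread counts are assumptions about these 22-core nodes, written out only for
clarity):

  # 8 ranks over 4 nodes = 2 ranks per node; 4 of the 8 are PME-only ranks
  aprun -n 8 -N 2 -d 11 gmx_mpi mdrun -deffnm out -s out.tpr \
        -npme 4 -ntomp 11 -ntomp_pme 11 -nb gpu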

Mark

On Wed, 28 Aug 2019 at 17:31, Alex  wrote:

> Dear all,
> Whatever "-npme" likes 22, 44, 24, 48 ..  I use in below command, I always
> get the "WARNING: On rank 0: oversubscribing the available XXX logical CPU
> core per node with 88 threads, This will cause considerable performance
> loss."
>
> aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
>
> would you please help me choose a correct combinations of -npme and  ...
> to get a better performance, according to the attached case.log file in my
> previous email?
> Regards,
> Alex
>
> On Sat, Aug 24, 2019 at 11:21 AM Mark Abraham 
> wrote:
>
> > Hi,
> >
> > There's a thread oversubscription warning in your log file that you
> should
> > definitely have read and acted upon :-) I'd be running more like one PP
> > rank per gpu and 4 PME ranks, picking ntomp and ntomp_pme according to
> what
> > gives best performance (which could require configuring your MPI
> invocation
> > accordingly).
> >
> > Mark
> >
> > On Fri., 23 Aug. 2019, 21:00 Alex,  wrote:
> >
> > > Dear Gromacs user,
> > > Using a machine with below configurations and also below command I
> tried
> > to
> > > simulate a system with 479K atoms (mainly water) on CPU-GPU, the
> > > performance is around 1ns per 1 hour.
> > > According the information and also shared log file below, I would be so
> > > appreciated if you could comment on the submission command to improve
> the
> > > performance by involving better the GPU and CPU.
> > >
> > > %
> > > #PBS -l select=4:ncpus=22:mpiprocs=22:ngpus=1
> > > export OMP_NUM_THREADS=4
> > >
> > > aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> > > -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
> > >
> > > Running on 4 nodes with total 88 cores, 176 logical cores, 4 compatible
> > > GPUs
> > >   Cores per node:   22
> > >   Logical cores per node:   44
> > >   Compatible GPUs per node:  1
> > >   All nodes have identical type(s) of GPUs
> > >
> > > %
> > > GROMACS version:2018.1
> > > Precision:  single
> > > Memory model:   64 bit
> > > MPI library:MPI
> > > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > > GPU support:CUDA
> > > SIMD instructions:  AVX2_256
> > > FFT library:
> commercial-fftw-3.3.6-pl1-fma-sse2-avx-avx2-avx2_128
> > > RDTSCP usage:   enabled
> > > TNG support:enabled
> > > Hwloc support:  hwloc-1.11.0
> > > Tracing support:disabled
> > > Built on:   2018-09-12 20:34:33
> > > Built by:   
> > > Build OS/arch:  Linux 3.12.61-52.111-default x86_64
> > > Build CPU vendor:   Intel
> > > Build CPU brand:Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > Build CPU family:   6   Model: 79   Stepping: 1
> > > Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle
> > htt
> > > intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse
> > rdrnd
> > > rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > > C compiler: /opt/cray/pe/craype/2.5.13/bin/cc GNU 5.3.0
> > > C compiler flags:-march=core-avx2 -O3 -DNDEBUG
> -funroll-all-loops
> > > -fexcess-precision=fast
> > > C++ compiler:   /opt/cray/pe/craype/2.5.13/bin/CC GNU 5.3.0
> > > C++ compiler flags:  -march=core-avx2-std=c++11   -O3 -DNDEBUG
> > > -funroll-all-loops -fexcess-precision=fast
> > > CUDA compiler:
> > > /opt/nvidia/cudatoolkit8.0/8.0.61_2.3.13_g32c34f9-2.1/bin/nvcc nvcc:
> > NVIDIA
> > > (R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA
> Corporation;Built
> > > on Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0,
> > > V8.0.61
> > > CUDA compiler
> > >
> > >
> >
> flags:-gencode;arch=compute_60,code=sm_60;-use_fast_math;-Wno-deprecated-gpu-targets;;;
> > >
> > >
> >
> ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > > CUDA driver:9.20
> > > CUDA runtime:   8.0
> > > %-
> > > Log file:
> > > https://drive.google.com/open?id=1-myQ5rP85UWKb1262QDPa6kYhuzHPzLu
> > >
> > > Thank you,
> > > Alex

Re: [gmx-users] Performance, gpu

2019-08-28 Thread Alex
Dear all,
Whatever "-npme" likes 22, 44, 24, 48 ..  I use in below command, I always
get the "WARNING: On rank 0: oversubscribing the available XXX logical CPU
core per node with 88 threads, This will cause considerable performance
loss."

aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
-gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes

Would you please help me choose a correct combination of -npme and ... to
get better performance, according to the attached case.log file in my
previous email?
Regards,
Alex

On Sat, Aug 24, 2019 at 11:21 AM Mark Abraham 
wrote:

> Hi,
>
> There's a thread oversubscription warning in your log file that you should
> definitely have read and acted upon :-) I'd be running more like one PP
> rank per gpu and 4 PME ranks, picking ntomp and ntomp_pme according to what
> gives best performance (which could require configuring your MPI invocation
> accordingly).
>
> Mark
>
> On Fri., 23 Aug. 2019, 21:00 Alex,  wrote:
>
> > Dear Gromacs user,
> > Using a machine with below configurations and also below command I tried
> to
> > simulate a system with 479K atoms (mainly water) on CPU-GPU, the
> > performance is around 1ns per 1 hour.
> > According the information and also shared log file below, I would be so
> > appreciated if you could comment on the submission command to improve the
> > performance by involving better the GPU and CPU.
> >
> > %
> > #PBS -l select=4:ncpus=22:mpiprocs=22:ngpus=1
> > export OMP_NUM_THREADS=4
> >
> > aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> > -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
> >
> > Running on 4 nodes with total 88 cores, 176 logical cores, 4 compatible
> > GPUs
> >   Cores per node:   22
> >   Logical cores per node:   44
> >   Compatible GPUs per node:  1
> >   All nodes have identical type(s) of GPUs
> >
> > %
> > GROMACS version:2018.1
> > Precision:  single
> > Memory model:   64 bit
> > MPI library:MPI
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support:CUDA
> > SIMD instructions:  AVX2_256
> > FFT library:commercial-fftw-3.3.6-pl1-fma-sse2-avx-avx2-avx2_128
> > RDTSCP usage:   enabled
> > TNG support:enabled
> > Hwloc support:  hwloc-1.11.0
> > Tracing support:disabled
> > Built on:   2018-09-12 20:34:33
> > Built by:   
> > Build OS/arch:  Linux 3.12.61-52.111-default x86_64
> > Build CPU vendor:   Intel
> > Build CPU brand:Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > Build CPU family:   6   Model: 79   Stepping: 1
> > Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt
> > intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> > rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > C compiler: /opt/cray/pe/craype/2.5.13/bin/cc GNU 5.3.0
> > C compiler flags:-march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> > -fexcess-precision=fast
> > C++ compiler:   /opt/cray/pe/craype/2.5.13/bin/CC GNU 5.3.0
> > C++ compiler flags:  -march=core-avx2-std=c++11   -O3 -DNDEBUG
> > -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler:
> > /opt/nvidia/cudatoolkit8.0/8.0.61_2.3.13_g32c34f9-2.1/bin/nvcc nvcc: NVIDIA
> > (R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built
> > on Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
> > CUDA compiler
> > flags:-gencode;arch=compute_60,code=sm_60;-use_fast_math;-Wno-deprecated-gpu-targets;;;
> > ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver:9.20
> > CUDA runtime:   8.0
> > %-
> > Log file:
> > https://drive.google.com/open?id=1-myQ5rP85UWKb1262QDPa6kYhuzHPzLu
> >
> > Thank you,
> > Alex
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> >
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read 

Re: [gmx-users] Performance, gpu

2019-08-24 Thread Mark Abraham
Hi,

There's a thread oversubscription warning in your log file that you should
definitely have read and acted upon :-) I'd be running more like one PP
rank per gpu and 4 PME ranks, picking ntomp and ntomp_pme according to what
gives best performance (which could require configuring your MPI invocation
accordingly).
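To put rough numbers on it (a sketch only; the exact aprun placement flags and
the best ntomp/ntomp_pme values need benchmarking on your site): the current
command places 22 ranks x 4 OpenMP threads = 88 threads on each node's 44
logical cores, which is what the warning reports. With 4 nodes x 1 GPU x 22
cores, one PP rank per GPU plus 4 PME ranks would look something like

aprun -n 8 -N 2 -d 11 gmx_mpi mdrun -deffnm out -s out.tpr -nb gpu -npme 4 -ntomp 11 -ntomp_pme 11

i.e. 8 ranks x 11 threads = 88 threads, one per physical core.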

Mark

On Fri., 23 Aug. 2019, 21:00 Alex,  wrote:

> Dear Gromacs user,
> Using a machine with below configurations and also below command I tried to
> simulate a system with 479K atoms (mainly water) on CPU-GPU, the
> performance is around 1ns per 1 hour.
> According the information and also shared log file below, I would be so
> appreciated if you could comment on the submission command to improve the
> performance by involving better the GPU and CPU.
>
> %
> #PBS -l select=4:ncpus=22:mpiprocs=22:ngpus=1
> export OMP_NUM_THREADS=4
>
> aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
> -gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes
>
> Running on 4 nodes with total 88 cores, 176 logical cores, 4 compatible
> GPUs
>   Cores per node:   22
>   Logical cores per node:   44
>   Compatible GPUs per node:  1
>   All nodes have identical type(s) of GPUs
>
> %
> GROMACS version:2018.1
> Precision:  single
> Memory model:   64 bit
> MPI library:MPI
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:CUDA
> SIMD instructions:  AVX2_256
> FFT library:commercial-fftw-3.3.6-pl1-fma-sse2-avx-avx2-avx2_128
> RDTSCP usage:   enabled
> TNG support:enabled
> Hwloc support:  hwloc-1.11.0
> Tracing support:disabled
> Built on:   2018-09-12 20:34:33
> Built by:   
> Build OS/arch:  Linux 3.12.61-52.111-default x86_64
> Build CPU vendor:   Intel
> Build CPU brand:Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Build CPU family:   6   Model: 79   Stepping: 1
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt
> intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler: /opt/cray/pe/craype/2.5.13/bin/cc GNU 5.3.0
> C compiler flags:-march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler:   /opt/cray/pe/craype/2.5.13/bin/CC GNU 5.3.0
> C++ compiler flags:  -march=core-avx2-std=c++11   -O3 -DNDEBUG
> -funroll-all-loops -fexcess-precision=fast
> CUDA compiler:
> /opt/nvidia/cudatoolkit8.0/8.0.61_2.3.13_g32c34f9-2.1/bin/nvcc nvcc: NVIDIA
> (R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built
> on Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0,
> V8.0.61
> CUDA compiler
> flags:-gencode;arch=compute_60,code=sm_60;-use_fast_math;-Wno-deprecated-gpu-targets;;;
> ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> CUDA driver:9.20
> CUDA runtime:   8.0
> %-
> Log file:
> https://drive.google.com/open?id=1-myQ5rP85UWKb1262QDPa6kYhuzHPzLu
>
> Thank you,
> Alex
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


[gmx-users] Performance, gpu

2019-08-23 Thread Alex
Dear Gromacs user,
Using the machine configuration and the command below, I tried to simulate a
system of 479K atoms (mainly water) on CPU+GPU; the performance is around
1 ns per hour.
Given this information and the shared log file below, I would appreciate it if
you could comment on the submission command to improve the performance by
making better use of the GPU and CPU.

%
#PBS -l select=4:ncpus=22:mpiprocs=22:ngpus=1
export OMP_NUM_THREADS=4

aprun -n 88 gmx_mpi mdrun -deffnm out -s out.tpr -g out.log -v -dlb yes
-gcom 1 -nb gpu -npme 44 -ntomp 4 -ntomp_pme 6 -tunepme yes

Running on 4 nodes with total 88 cores, 176 logical cores, 4 compatible GPUs
  Cores per node:   22
  Logical cores per node:   44
  Compatible GPUs per node:  1
  All nodes have identical type(s) of GPUs

%
GROMACS version:2018.1
Precision:  single
Memory model:   64 bit
MPI library:MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:CUDA
SIMD instructions:  AVX2_256
FFT library:commercial-fftw-3.3.6-pl1-fma-sse2-avx-avx2-avx2_128
RDTSCP usage:   enabled
TNG support:enabled
Hwloc support:  hwloc-1.11.0
Tracing support:disabled
Built on:   2018-09-12 20:34:33
Built by:   
Build OS/arch:  Linux 3.12.61-52.111-default x86_64
Build CPU vendor:   Intel
Build CPU brand:Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Build CPU family:   6   Model: 79   Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt
intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /opt/cray/pe/craype/2.5.13/bin/cc GNU 5.3.0
C compiler flags:-march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:   /opt/cray/pe/craype/2.5.13/bin/CC GNU 5.3.0
C++ compiler flags:  -march=core-avx2-std=c++11   -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:
/opt/nvidia/cudatoolkit8.0/8.0.61_2.3.13_g32c34f9-2.1/bin/nvcc nvcc: NVIDIA
(R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built
on Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
CUDA compiler
flags:-gencode;arch=compute_60,code=sm_60;-use_fast_math;-Wno-deprecated-gpu-targets;;;
;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:9.20
CUDA runtime:   8.0
%-
Log file:
https://drive.google.com/open?id=1-myQ5rP85UWKb1262QDPa6kYhuzHPzLu

Thank you,
Alex
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-30 Thread Szilárd Páll
On Tue, Jul 30, 2019 at 3:29 PM Carlos Navarro
 wrote:
>
> Hi all,
> First of all, thanks for all your valuable inputs!!.
> I tried Szilárd suggestion (multi simulations) with the following commands
> (using a single node):
>
> EXE="mpirun -np 4 gmx_mpi mdrun "
>
> cd $WORKDIR0
> #$DO_PARALLEL
> $EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
> And I noticed that the performance went from 37,32,23,22 ns/day to ~42
> ns/day in all four simulations. I check that the 80 processors were been
> used a 100% of the time, while the gpu was used about a 50% (from a 70%
> when running a single simulation in the node where I obtain a performance
> of ~50 ns/day).

Great!

Note that optimizing hardware utilization doesn't always maximize performance.

Also, manual launches with pinoffset/pinstride will give exactly the
same performance as the multi runs *if* you get the affinities right.
In your original commands you tried to use 20 of the 80 hardware threads
per run, but you offset the runs only by 10 (hardware threads), which means
that the runs were overlapping and interfering with each other as well as
ending up under-utilizing the hardware.
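To make that concrete (a sketch for this node type only): with 4 runs of 20
threads each on 80 hardware threads, non-overlapping placements would use
-pin on with -pinoffset 0, 20, 40 and 60 (one 20-thread block per run) rather
than offsets of 10.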

> So overall I'm quite happy with the performance I'm getting now; and
> honestly, I don't know if at some point I can get the same performance
> (running 4 jobs) that I'm getting running just one.

No, but you _may_ get a bit more aggregate performance if you run 8
concurrent jobs. Also, you can try 1 thread per core ("mpirun -np 4
gmx mdrun_mpi -multi 4 -ntomp 10 -pin on") to use only half of the
threads.

Cheers,
--
Szilárd

> Best regards,
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 29, 2019 at 6:11:31 PM, Mark Abraham (mark.j.abra...@gmail.com)
> wrote:
>
> Hi,
>
> Yes and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi
>
> Mark
>
>
> On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:
>
> >
> >
> > On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > > Hi Mark,
> > > I tried that before, but unfortunately in that case (removing
> —gres=gpu:1
> > > and including in each line the -gpu_id flag) for some reason the jobs
> are
> > > run one at a time (one after the other), so I can’t use properly the
> > whole
> > > node.
> > >
> >
> > You need to run all but the last mdrun process in the background (&).
> >
> > -Justin
> >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > > wrote:
> > >
> > > Hi,
> > >
> > > When you use
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > >
> > > then the environment seems to make sure only one GPU is visible. (The
> log
> > > files report only finding one GPU.) But it's probably the same GPU in
> > each
> > > case, with three remaining idle. I would suggest not using --gres unless
> > > you can specify *which* of the four available GPUs each run can use.
> > >
> > > Otherwise, don't use --gres and use the facilities built into GROMACS,
> > e.g.
> > >
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > > -ntomp 20 -gpu_id 0
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 10
> > > -ntomp 20 -gpu_id 1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 20
> > > -ntomp 20 -gpu_id 2
> > > etc.
> > >
> > > Mark
> > >
> > > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  > >
> > > wrote:
> > >
> > >> Hi Szilárd,
> > >> To answer your questions:
> > >> **are you trying to run multiple simulations concurrently on the same
> > >> node or are you trying to strong-scale?
> > >> I'm trying to run multiple simulations on the same node at the same
> > time.
> > >>
> > >> ** what are you simulating?
> > >> Regular and CompEl simulations
> > >>
> > >> ** can you provide log files of the runs?
> > >> In the following link are some logs files:
> > >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> > >> In short, alone.log -> single run in the node (using 1 gpu).
> > >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> > >> single node. In all cases, 20 cpus are used.
> > >> Best regards,
> > >> Carlos
> > >>
> > >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> > pall.szil...@gmail.com>)
> > >> escribió:
> > >>
> > >>> Hi,
> > >>>
> > >>> It is not clear to me how are you trying to set up your runs, so
> > >>> please provide some details:
> > >>> - are you trying to run multiple simulations concurrently on the same
> > >>> node or are you trying to 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-30 Thread Carlos Navarro
Hi all,
First of all, thanks for all your valuable input!
I tried Szilárd's suggestion (multi-simulations) with the following commands
(using a single node):

EXE="mpirun -np 4 gmx_mpi mdrun "

cd $WORKDIR0
#$DO_PARALLEL
$EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
And I noticed that the performance went from 37, 32, 23 and 22 ns/day to ~42
ns/day in all four simulations. I checked that the 80 processors were being
used 100% of the time, while the GPU was used at about 50% (down from ~70%
when running a single simulation on the node, where I obtain a performance
of ~50 ns/day).
So overall I'm quite happy with the performance I'm getting now; and
honestly, I don't know if at some point I can get the same performance
(running 4 jobs) that I'm getting running just one.
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 6:11:31 PM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

Yes and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi

Mark


On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:

>
>
> On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > Hi Mark,
> > I tried that before, but unfortunately in that case (removing
—gres=gpu:1
> > and including in each line the -gpu_id flag) for some reason the jobs
are
> > run one at a time (one after the other), so I can’t use properly the
> whole
> > node.
> >
>
> You need to run all but the last mdrun process in the background (&).
>
> -Justin
>
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > wrote:
> >
> > Hi,
> >
> > When you use
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> >
> > then the environment seems to make sure only one GPU is visible. (The
log
> > files report only finding one GPU.) But it's probably the same GPU in
> each
> > case, with three remaining idle. I would suggest not using --gres unless
> > you can specify *which* of the four available GPUs each run can use.
> >
> > Otherwise, don't use --gres and use the facilities built into GROMACS,
> e.g.
> >
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 -gpu_id 0
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
10
> > -ntomp 20 -gpu_id 1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
20
> > -ntomp 20 -gpu_id 2
> > etc.
> >
> > Mark
> >
> > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  >
> > wrote:
> >
> >> Hi Szilárd,
> >> To answer your questions:
> >> **are you trying to run multiple simulations concurrently on the same
> >> node or are you trying to strong-scale?
> >> I'm trying to run multiple simulations on the same node at the same
> time.
> >>
> >> ** what are you simulating?
> >> Regular and CompEl simulations
> >>
> >> ** can you provide log files of the runs?
> >> In the following link are some logs files:
> >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> >> In short, alone.log -> single run in the node (using 1 gpu).
> >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> >> single node. In all cases, 20 cpus are used.
> >> Best regards,
> >> Carlos
> >>
> >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> pall.szil...@gmail.com>)
> >> escribió:
> >>
> >>> Hi,
> >>>
> >>> It is not clear to me how are you trying to set up your runs, so
> >>> please provide some details:
> >>> - are you trying to run multiple simulations concurrently on the same
> >>> node or are you trying to strong-scale?
> >>> - what are you simulating?
> >>> - can you provide log files of the runs?
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Szilárd
> >>>
> >>> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >>>  wrote:
>  No one can give me an idea of what can be happening? Or how I can
> > solve
> >>> it?
>  Best regards,
>  Carlos
> 
>  ——
>  Carlos Navarro Retamal
>  Bioinformatic Engineering. PhD.
>  Postdoctoral Researcher in Center of Bioinformatics and Molecular
>  Simulations
>  Universidad de Talca
>  Av. Lircay S/N, Talca, Chile
>  E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> 
>  On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> >>> carlos.navarr...@gmail.com)
>  wrote:
> 
>  Dear gmx-users,
>  I’m currently working in a server where each node posses 40 physical
> >>> cores
>  (40 threads) and 4 Nvidia-V100.
>  When I launch a single job (1 simulation using a single gpu card) I
> >> get a
>  performance 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Mark Abraham
Hi,

Yes and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi
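In other words, where the quoted commands say "-nmpi 1", a sketch of the
corrected form (assuming a thread-MPI gmx build) would be

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -pin on -pinoffset 0 -ntomp 20 -gpu_id 0

with -ntmpi in place of -nmpi.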

Mark


On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:

>
>
> On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > Hi Mark,
> > I tried that before, but unfortunately in that case (removing —gres=gpu:1
> > and including in each line the -gpu_id flag) for some reason the jobs are
> > run one at a time (one after the other), so I can’t use properly the
> whole
> > node.
> >
>
> You need to run all but the last mdrun process in the background (&).
>
> -Justin
>
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > wrote:
> >
> > Hi,
> >
> > When you use
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> >
> > then the environment seems to make sure only one GPU is visible. (The log
> > files report only finding one GPU.) But it's probably the same GPU in
> each
> > case, with three remaining idle. I would suggest not using --gres unless
> > you can specify *which* of the four available GPUs each run can use.
> >
> > Otherwise, don't use --gres and use the facilities built into GROMACS,
> e.g.
> >
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 -gpu_id 0
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> > -ntomp 20 -gpu_id 1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
> > -ntomp 20 -gpu_id 2
> > etc.
> >
> > Mark
> >
> > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  >
> > wrote:
> >
> >> Hi Szilárd,
> >> To answer your questions:
> >> **are you trying to run multiple simulations concurrently on the same
> >> node or are you trying to strong-scale?
> >> I'm trying to run multiple simulations on the same node at the same
> time.
> >>
> >> ** what are you simulating?
> >> Regular and CompEl simulations
> >>
> >> ** can you provide log files of the runs?
> >> In the following link are some logs files:
> >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> >> In short, alone.log -> single run in the node (using 1 gpu).
> >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> >> single node. In all cases, 20 cpus are used.
> >> Best regards,
> >> Carlos
> >>
> >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> pall.szil...@gmail.com>)
> >> escribió:
> >>
> >>> Hi,
> >>>
> >>> It is not clear to me how are you trying to set up your runs, so
> >>> please provide some details:
> >>> - are you trying to run multiple simulations concurrently on the same
> >>> node or are you trying to strong-scale?
> >>> - what are you simulating?
> >>> - can you provide log files of the runs?
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Szilárd
> >>>
> >>> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >>>  wrote:
>  No one can give me an idea of what can be happening? Or how I can
> > solve
> >>> it?
>  Best regards,
>  Carlos
> 
>  ——
>  Carlos Navarro Retamal
>  Bioinformatic Engineering. PhD.
>  Postdoctoral Researcher in Center of Bioinformatics and Molecular
>  Simulations
>  Universidad de Talca
>  Av. Lircay S/N, Talca, Chile
>  E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> 
>  On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> >>> carlos.navarr...@gmail.com)
>  wrote:
> 
>  Dear gmx-users,
>  I’m currently working in a server where each node posses 40 physical
> >>> cores
>  (40 threads) and 4 Nvidia-V100.
>  When I launch a single job (1 simulation using a single gpu card) I
> >> get a
>  performance of about ~35ns/day in a system of about 300k atoms.
> > Looking
>  into the usage of the video card during the simulation I notice that
> >> the
>  card is being used about and ~80%.
>  The problems arise when I increase the number of jobs running at the
> >> same
>  time. If for instance 2 jobs are running at the same time, the
> >>> performance
>  drops to ~25ns/day each and the usage of the video cards also drops
> >>> during
>  the simulation to about a ~30-40% (and sometimes dropping to less than
> >>> 5%).
>  Clearly there is a communication problem between the gpu cards and the
> >>> cpu
>  during the simulations, but I don’t know how to solve this.
>  Here is the script I use to run the simulations:
> 
>  #!/bin/bash -x
>  #SBATCH --job-name=testAtTPC1
>  #SBATCH --ntasks-per-node=4
>  #SBATCH --cpus-per-task=20
>  #SBATCH --account=hdd22
>  #SBATCH --nodes=1
>  #SBATCH --mem=0
>  #SBATCH --output=sout.%j
>  #SBATCH --error=s4err.%j
>  #SBATCH --time=00:10:00
>  #SBATCH --partition=develgpus
>  #SBATCH --gres=gpu:4
> 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Szilárd Páll
Carlos,

You can accomplish the same using the multi-simulation feature of
mdrun and avoid having to manually manage the placement of runs, e.g.
instead of the above you just write
gmx mdrun_mpi -np N -multidir $WORKDIR1 $WORKDIR2 $WORKDIR3 ...
For more details see
http://manual.gromacs.org/documentation/current/user-guide/mdrun-features.html#running-multi-simulations
Note that if the different runs have different speed, just as with
your manual launch, your machine can end up partially utilized when
some of the runs finish.
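A minimal sketch of such a launch for the four directories from your script
(assuming an MPI-enabled build started through mpirun; the binary name and
per-run options would need adapting to your site):

mpirun -np 4 gmx_mpi mdrun -multidir $WORKDIR1 $WORKDIR2 $WORKDIR3 $WORKDIR4 -ntomp 20 -pin on

Each MPI rank then runs one of the four simulations in its own directory, and
mdrun takes care of the thread pinning across them.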

Cheers,
--
Szilárd

On Mon, Jul 29, 2019 at 2:46 PM Carlos Navarro
 wrote:
>
> Hi Mark,
> I tried that before, but unfortunately in that case (removing —gres=gpu:1
> and including in each line the -gpu_id flag) for some reason the jobs are
> run one at a time (one after the other), so I can’t use properly the whole
> node.
>
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> wrote:
>
> Hi,
>
> When you use
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
>
> then the environment seems to make sure only one GPU is visible. (The log
> files report only finding one GPU.) But it's probably the same GPU in each
> case, with three remaining idle. I would suggest not using --gres unless
> you can specify *which* of the four available GPUs each run can use.
>
> Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.
>
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> -ntomp 20 -gpu_id 0
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> -ntomp 20 -gpu_id 1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
> -ntomp 20 -gpu_id 2
> etc.
>
> Mark
>
> On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
> wrote:
>
> > Hi Szilárd,
> > To answer your questions:
> > **are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > I'm trying to run multiple simulations on the same node at the same time.
> >
> > ** what are you simulating?
> > Regular and CompEl simulations
> >
> > ** can you provide log files of the runs?
> > In the following link are some logs files:
> > https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> > In short, alone.log -> single run in the node (using 1 gpu).
> > multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> > single node. In all cases, 20 cpus are used.
> > Best regards,
> > Carlos
> >
> > El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> > escribió:
> >
> > > Hi,
> > >
> > > It is not clear to me how are you trying to set up your runs, so
> > > please provide some details:
> > > - are you trying to run multiple simulations concurrently on the same
> > > node or are you trying to strong-scale?
> > > - what are you simulating?
> > > - can you provide log files of the runs?
> > >
> > > Cheers,
> > >
> > > --
> > > Szilárd
> > >
> > > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> > >  wrote:
> > > >
> > > > No one can give me an idea of what can be happening? Or how I can
> solve
> > > it?
> > > > Best regards,
> > > > Carlos
> > > >
> > > > ——
> > > > Carlos Navarro Retamal
> > > > Bioinformatic Engineering. PhD.
> > > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > > Simulations
> > > > Universidad de Talca
> > > > Av. Lircay S/N, Talca, Chile
> > > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > > >
> > > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > > carlos.navarr...@gmail.com)
> > > > wrote:
> > > >
> > > > Dear gmx-users,
> > > > I’m currently working in a server where each node posses 40 physical
> > > cores
> > > > (40 threads) and 4 Nvidia-V100.
> > > > When I launch a single job (1 simulation using a single gpu card) I
> > get a
> > > > performance of about ~35ns/day in a system of about 300k atoms.
> Looking
> > > > into the usage of the video card during the simulation I notice that
> > the
> > > > card is being used about and ~80%.
> > > > The problems arise when I increase the number of jobs running at the
> > same
> > > > time. If for instance 2 jobs are running at the same time, the
> > > performance
> > > > drops to ~25ns/day each and the usage of the video cards also drops
> > > during
> > > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > > 5%).
> > > > Clearly there is a communication problem between the gpu cards and the
> > > cpu
> > > > during the simulations, but I don’t know how to solve this.
> > > > Here is the script I use to run the simulations:
> > > >
> > > > #!/bin/bash -x
> > > > #SBATCH --job-name=testAtTPC1
> > > > #SBATCH --ntasks-per-node=4
> > > > #SBATCH --cpus-per-task=20
> > > > #SBATCH 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Justin Lemkul



On 7/29/19 8:46 AM, Carlos Navarro wrote:

Hi Mark,
I tried that before, but unfortunately in that case (removing —gres=gpu:1
and including in each line the -gpu_id flag) for some reason the jobs are
run one at a time (one after the other), so I can’t use properly the whole
node.



You need to run all but the last mdrun process in the background (&).
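A minimal sketch of what that looks like for the four runs (assuming the EXE
variable from the original script, -gpu_id instead of --gres as suggested
below, a thread-MPI build for -ntmpi, and pin offsets chosen so the four
20-thread runs do not overlap):

cd $WORKDIR1; $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 0 -gpu_id 0 &> log &
cd $WORKDIR2; $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 20 -gpu_id 1 &> log &
cd $WORKDIR3; $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 40 -gpu_id 2 &> log &
cd $WORKDIR4; $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 60 -gpu_id 3 &> log &
wait   # keep the batch script alive until all four background runs have finished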

-Justin


——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:


Hi Szilárd,
To answer your questions:
**are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations

** can you provide log files of the runs?
In the following link are some logs files:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
In short, alone.log -> single run in the node (using 1 gpu).
multi1/2/3/4.log ->4 independent simulations ran at the same time in a
single node. In all cases, 20 cpus are used.
Best regards,
Carlos

El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
escribió:


Hi,

It is not clear to me how are you trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
 wrote:

No one can give me an idea of what can be happening? Or how I can

solve

it?

Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (

carlos.navarr...@gmail.com)

wrote:

Dear gmx-users,
I’m currently working in a server where each node posses 40 physical

cores

(40 threads) and 4 Nvidia-V100.
When I launch a single job (1 simulation using a single gpu card) I

get a

performance of about ~35ns/day in a system of about 300k atoms.

Looking

into the usage of the video card during the simulation I notice that

the

card is being used about and ~80%.
The problems arise when I increase the number of jobs running at the

same

time. If for instance 2 jobs are running at the same time, the

performance

drops to ~25ns/day each and the usage of the video cards also drops

during

the simulation to about a ~30-40% (and sometimes dropping to less than

5%).

Clearly there is a communication problem between the gpu cards and the

cpu

during the simulations, but I don’t know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

0

-ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

10

-ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

20

-ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Carlos Navarro
Hi Mark,
I tried that before, but unfortunately in that case (removing --gres=gpu:1
and including the -gpu_id flag in each line) for some reason the jobs are
run one at a time (one after the other), so I can’t make proper use of the
whole node.


——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:

> Hi Szilárd,
> To answer your questions:
> **are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> I'm trying to run multiple simulations on the same node at the same time.
>
> ** what are you simulating?
> Regular and CompEl simulations
>
> ** can you provide log files of the runs?
> In the following link are some logs files:
> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> In short, alone.log -> single run in the node (using 1 gpu).
> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> single node. In all cases, 20 cpus are used.
> Best regards,
> Carlos
>
> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> escribió:
>
> > Hi,
> >
> > It is not clear to me how are you trying to set up your runs, so
> > please provide some details:
> > - are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > - what are you simulating?
> > - can you provide log files of the runs?
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >  wrote:
> > >
> > > No one can give me an idea of what can be happening? Or how I can
solve
> > it?
> > > Best regards,
> > > Carlos
> > >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > carlos.navarr...@gmail.com)
> > > wrote:
> > >
> > > Dear gmx-users,
> > > I’m currently working in a server where each node posses 40 physical
> > cores
> > > (40 threads) and 4 Nvidia-V100.
> > > When I launch a single job (1 simulation using a single gpu card) I
> get a
> > > performance of about ~35ns/day in a system of about 300k atoms.
Looking
> > > into the usage of the video card during the simulation I notice that
> the
> > > card is being used about and ~80%.
> > > The problems arise when I increase the number of jobs running at the
> same
> > > time. If for instance 2 jobs are running at the same time, the
> > performance
> > > drops to ~25ns/day each and the usage of the video cards also drops
> > during
> > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > 5%).
> > > Clearly there is a communication problem between the gpu cards and the
> > cpu
> > > during the simulations, but I don’t know how to solve this.
> > > Here is the script I use to run the simulations:
> > >
> > > #!/bin/bash -x
> > > #SBATCH --job-name=testAtTPC1
> > > #SBATCH --ntasks-per-node=4
> > > #SBATCH --cpus-per-task=20
> > > #SBATCH --account=hdd22
> > > #SBATCH --nodes=1
> > > #SBATCH --mem=0
> > > #SBATCH --output=sout.%j
> > > #SBATCH --error=s4err.%j
> > > #SBATCH --time=00:10:00
> > > #SBATCH --partition=develgpus
> > > #SBATCH --gres=gpu:4
> > >
> > > module use /gpfs/software/juwels/otherstages
> > > module load Stages/2018b
> > > module load Intel/2019.0.117-GCC-7.3.0
> > > module load IntelMPI/2019.0.117
> > > module load GROMACS/2018.3
> > >
> > > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > > EXE=" gmx mdrun "
> > >
> > > cd $WORKDIR1
> > > $DO_PARALLEL $EXE -s eq6.tpr 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Mark Abraham
Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.
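A sketch of the remaining line, following the same pattern (with -ntmpi rather
than -nmpi, as noted in the follow-up message, and backgrounded with & so the
four jobs actually run concurrently):

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -pin on -pinoffset 30 -ntomp 20 -gpu_id 3 &> log &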

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:

> Hi Szilárd,
> To answer your questions:
> **are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> I'm trying to run multiple simulations on the same node at the same time.
>
> ** what are you simulating?
> Regular and CompEl simulations
>
> ** can you provide log files of the runs?
> In the following link are some logs files:
> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> In short, alone.log -> single run in the node (using 1 gpu).
> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> single node. In all cases, 20 cpus are used.
> Best regards,
> Carlos
>
> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> escribió:
>
> > Hi,
> >
> > It is not clear to me how are you trying to set up your runs, so
> > please provide some details:
> > - are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > - what are you simulating?
> > - can you provide log files of the runs?
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >  wrote:
> > >
> > > No one can give me an idea of what can be happening? Or how I can solve
> > it?
> > > Best regards,
> > > Carlos
> > >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > carlos.navarr...@gmail.com)
> > > wrote:
> > >
> > > Dear gmx-users,
> > > I’m currently working in a server where each node posses 40 physical
> > cores
> > > (40 threads) and 4 Nvidia-V100.
> > > When I launch a single job (1 simulation using a single gpu card) I
> get a
> > > performance of about ~35ns/day in a system of about 300k atoms. Looking
> > > into the usage of the video card during the simulation I notice that
> the
> > > card is being used about and ~80%.
> > > The problems arise when I increase the number of jobs running at the
> same
> > > time. If for instance 2 jobs are running at the same time, the
> > performance
> > > drops to ~25ns/day each and the usage of the video cards also drops
> > during
> > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > 5%).
> > > Clearly there is a communication problem between the gpu cards and the
> > cpu
> > > during the simulations, but I don’t know how to solve this.
> > > Here is the script I use to run the simulations:
> > >
> > > #!/bin/bash -x
> > > #SBATCH --job-name=testAtTPC1
> > > #SBATCH --ntasks-per-node=4
> > > #SBATCH --cpus-per-task=20
> > > #SBATCH --account=hdd22
> > > #SBATCH --nodes=1
> > > #SBATCH --mem=0
> > > #SBATCH --output=sout.%j
> > > #SBATCH --error=s4err.%j
> > > #SBATCH --time=00:10:00
> > > #SBATCH --partition=develgpus
> > > #SBATCH --gres=gpu:4
> > >
> > > module use /gpfs/software/juwels/otherstages
> > > module load Stages/2018b
> > > module load Intel/2019.0.117-GCC-7.3.0
> > > module load IntelMPI/2019.0.117
> > > module load GROMACS/2018.3
> > >
> > > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > > EXE=" gmx mdrun "
> > >
> > > cd $WORKDIR1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 0
> > > -ntomp 20 &>log &
> > > cd $WORKDIR2
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 10
> > > -ntomp 20 &>log &
> > > cd $WORKDIR3
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset
> > 20
> > > -ntomp 20 &>log &
> > > cd $WORKDIR4
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 30
> > > -ntomp 20 &>log &
> > >
> > >
> > > Regarding to pinoffset, I first tried using 20 cores for each job but
> > then
> > > also tried with 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Carlos Navarro
Hi Szilárd,
To answer your questions:
**are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations

** can you provide log files of the runs?
Some log files are available at the following link:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
In short, alone.log -> a single run on the node (using 1 GPU);
multi1/2/3/4.log -> 4 independent simulations run at the same time on a
single node. In all cases, 20 CPUs are used.
Best regards,
Carlos

El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
escribió:

> Hi,
>
> It is not clear to me how are you trying to set up your runs, so
> please provide some details:
> - are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> - what are you simulating?
> - can you provide log files of the runs?
>
> Cheers,
>
> --
> Szilárd
>
> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
>  wrote:
> >
> > No one can give me an idea of what can be happening? Or how I can solve
> it?
> > Best regards,
> > Carlos
> >
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> carlos.navarr...@gmail.com)
> > wrote:
> >
> > Dear gmx-users,
> > I’m currently working in a server where each node posses 40 physical
> cores
> > (40 threads) and 4 Nvidia-V100.
> > When I launch a single job (1 simulation using a single gpu card) I get a
> > performance of about ~35ns/day in a system of about 300k atoms. Looking
> > into the usage of the video card during the simulation I notice that the
> > card is being used about and ~80%.
> > The problems arise when I increase the number of jobs running at the same
> > time. If for instance 2 jobs are running at the same time, the
> performance
> > drops to ~25ns/day each and the usage of the video cards also drops
> during
> > the simulation to about a ~30-40% (and sometimes dropping to less than
> 5%).
> > Clearly there is a communication problem between the gpu cards and the
> cpu
> > during the simulations, but I don’t know how to solve this.
> > Here is the script I use to run the simulations:
> >
> > #!/bin/bash -x
> > #SBATCH --job-name=testAtTPC1
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --cpus-per-task=20
> > #SBATCH --account=hdd22
> > #SBATCH --nodes=1
> > #SBATCH --mem=0
> > #SBATCH --output=sout.%j
> > #SBATCH --error=s4err.%j
> > #SBATCH --time=00:10:00
> > #SBATCH --partition=develgpus
> > #SBATCH --gres=gpu:4
> >
> > module use /gpfs/software/juwels/otherstages
> > module load Stages/2018b
> > module load Intel/2019.0.117-GCC-7.3.0
> > module load IntelMPI/2019.0.117
> > module load GROMACS/2018.3
> >
> > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > EXE=" gmx mdrun "
> >
> > cd $WORKDIR1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 &>log &
> > cd $WORKDIR2
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> > -ntomp 20 &>log &
> > cd $WORKDIR3
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset
> 20
> > -ntomp 20 &>log &
> > cd $WORKDIR4
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> > -ntomp 20 &>log &
> >
> >
> > Regarding to pinoffset, I first tried using 20 cores for each job but
> then
> > also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
> > pinoffset 8 for job 3 and pinoffset 12 for job) but at the end the
> problem
> > persist.
> >
> > Currently in this machine I’m not able to use more than 1 gpu per job, so
> > this is my only choice to use properly the whole node.
> > If you need more information please just let me know.
> > Best regards.
> > Carlos
> >
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-25 Thread Szilárd Páll
Hi,

It is not clear to me how are you trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
 wrote:
>
> No one can give me an idea of what can be happening? Or how I can solve it?
> Best regards,
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
> wrote:
>
> Dear gmx-users,
> I’m currently working in a server where each node posses 40 physical cores
> (40 threads) and 4 Nvidia-V100.
> When I launch a single job (1 simulation using a single gpu card) I get a
> performance of about ~35ns/day in a system of about 300k atoms. Looking
> into the usage of the video card during the simulation I notice that the
> card is being used about and ~80%.
> The problems arise when I increase the number of jobs running at the same
> time. If for instance 2 jobs are running at the same time, the performance
> drops to ~25ns/day each and the usage of the video cards also drops during
> the simulation to about a ~30-40% (and sometimes dropping to less than 5%).
> Clearly there is a communication problem between the gpu cards and the cpu
> during the simulations, but I don’t know how to solve this.
> Here is the script I use to run the simulations:
>
> #!/bin/bash -x
> #SBATCH --job-name=testAtTPC1
> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=20
> #SBATCH --account=hdd22
> #SBATCH --nodes=1
> #SBATCH --mem=0
> #SBATCH --output=sout.%j
> #SBATCH --error=s4err.%j
> #SBATCH --time=00:10:00
> #SBATCH --partition=develgpus
> #SBATCH --gres=gpu:4
>
> module use /gpfs/software/juwels/otherstages
> module load Stages/2018b
> module load Intel/2019.0.117-GCC-7.3.0
> module load IntelMPI/2019.0.117
> module load GROMACS/2018.3
>
> WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> EXE=" gmx mdrun "
>
> cd $WORKDIR1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> -ntomp 20 &>log &
> cd $WORKDIR2
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> -ntomp 20 &>log &
> cd $WORKDIR3
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
> -ntomp 20 &>log &
> cd $WORKDIR4
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> -ntomp 20 &>log &
>
>
> Regarding to pinoffset, I first tried using 20 cores for each job but then
> also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
> pinoffset 8 for job 3 and pinoffset 12 for job) but at the end the problem
> persist.
>
> Currently in this machine I’m not able to use more than 1 gpu per job, so
> this is my only choice to use properly the whole node.
> If you need more information please just let me know.
> Best regards.
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-22 Thread Carlos Navarro
No one can give me an idea of what can be happening? Or how I can solve it?
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
wrote:

Dear gmx-users,
I’m currently working on a server where each node possesses 40 physical cores
(40 threads) and 4 Nvidia V100 GPUs.
When I launch a single job (1 simulation using a single GPU card) I get a
performance of about ~35 ns/day for a system of about 300k atoms. Looking at
the usage of the video card during the simulation, I notice that the card is
being used at about ~80%.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the performance
drops to ~25 ns/day each and the usage of the video cards also drops during
the simulation to about ~30-40% (sometimes dropping to less than 5%).
Clearly there is a communication problem between the GPU cards and the CPU
during the simulations, but I don’t know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
-ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
-ntomp 20 &>log &


Regarding the pinoffset, I first tried using 20 cores for each job, but then
also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end the problem
persists.

Currently on this machine I’m not able to use more than 1 GPU per job, so
this is my only way to use the whole node properly.
If you need more information, please just let me know.
Best regards.
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

[gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-19 Thread Carlos Navarro
Dear gmx-users,
I’m currently working on a server where each node has 40 physical cores
(40 threads) and 4 NVIDIA V100s.
When I launch a single job (1 simulation using a single GPU card) I get a
performance of about ~35 ns/day for a system of about 300k atoms. Looking
at the usage of the video card during the simulation, I notice that the
card is being used at about 80%.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the performance
drops to ~25 ns/day each and the usage of the video cards also drops during
the simulation to about 30-40% (sometimes dropping to less than 5%).
Clearly there is a communication problem between the GPU cards and the CPU
during the simulations, but I don’t know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
-ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
-ntomp 20 &>log &


Regarding the pinoffset, I first tried using 20 cores for each job, but then
also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end the problem
persists.

Currently on this machine I’m not able to use more than 1 GPU per job, so
this is my only way to use the whole node properly.
If you need more information, please just let me know.
Best regards.
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl
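
As a side note on layouts like the one above: a minimal sketch (not from this thread) of how four concurrent single-GPU runs could be kept on separate cores and separate cards on a 40-core, 4-GPU node. The per-run directories, the eq6-10 naming and the 10-threads-per-run split are only illustrative assumptions:

# Hypothetical launcher: one GPU and ten non-overlapping cores per run.
for i in 0 1 2 3; do
  cd "$WORKDIR/run$i"                        # placeholder per-run directory
  gmx mdrun -s eq6.tpr -deffnm eq6-10 \
      -ntmpi 1 -ntomp 10 \
      -pin on -pinoffset $((i * 10)) -pinstride 1 \
      -gpu_id $i &> log &                    # pin to cores i*10..i*10+9, use GPU i
  cd - > /dev/null
done
wait                                         # block until all four runs finish

With -ntomp 10 and -pinoffset 0/10/20/30 the four runs together cover exactly the 40 physical cores, instead of the 4 x 20 = 80 pinned threads the script above requests; -gpu_id (or CUDA_VISIBLE_DEVICES) keeps each run on its own card.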

Re: [gmx-users] Performance of GROMACS on GPU's on ORNL Titan?

2019-02-13 Thread pbuscemi
For what it is worth: on our 32-core AMD 2990WX with 2x 2080 Ti we can run 100k
atoms at ~100 ns/day NVT and ~150 ns/day NPT, so 8-10 days to get that
microsecond. I'm curious to learn what kind of results you might obtain
from Oak Ridge and whether the cost/clock-time analysis makes it worthwhile.

Paul

-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se
 On Behalf Of Michael
Shirts
Sent: Wednesday, February 13, 2019 1:28 PM
To: Discussion list for GROMACS users ; Michael R
Shirts 
Subject: [gmx-users] Performance of GROMACS on GPU's on ORNL Titan?

Does anyone have experience running GROMACS on GPU's on Oak Ridge National
Labs Titan or Summit machines, especially parallelization over multiple
GPUs? I'm looking at applying for allocations there, and am interested in
experiences that people have had. We're probably mostly looking at systems
in the 100-200K atoms range, but we need to get to long timescales (multiple
microseconds, at least) for some of the phenomena we are looking at.

Thanks!


[gmx-users] Performance of GROMACS on GPU's on ORNL Titan?

2019-02-13 Thread Michael Shirts
Does anyone have experience running GROMACS on GPU's on Oak Ridge National
Labs Titan or Summit machines, especially parallelization over multiple
GPUs? I'm looking at applying for allocations there, and am interested in
experiences that people have had. We're probably mostly looking at systems
in the 100-200K atoms range, but we need to get to long timescales
(multiple microseconds, at least) for some of the phenomena we are looking
at.

Thanks!


Re: [gmx-users] Performance

2018-04-03 Thread Szilárd Páll
Hi,

Your system is exploding: some atoms end up with coordinates of around
10^9, which then throws off the PBC code that tries to put atoms back in
the box. This will normally not happen, as constraining will already
fail with such huge coordinates, I think, so technically it is a bug;
we could handle this corner case better.

However, you need to verify your system setup, as it is unstable
(not well equilibrated, or the time step is too long).

--
Szilárd


On Thu, Mar 29, 2018 at 3:50 PM, Myunggi Yi  wrote:
> Dear Szilard,
>
> Can you run this simulation?
>
> The simulation doesn't crash and doesn't generate an error message.
> It takes forever without updating the log file or the other output files.
>
> Is this a bug?
>
>
>
> On Thu, Mar 29, 2018 at 7:58 AM, Szilárd Páll 
> wrote:
>
>> Thanks. Looks like the messages and error handling is somewhat
>> confusing; you must have the OMP_NUM_THREADS environment variable set
>> which (just as setting -ntomp), without setting -ntmpi too is not
>> supported.
>>
>> Either let mdrun decide about the thread count or set -ntmpi manually.
>>
>> --
>> Szilárd
>>
>>
>> On Wed, Mar 28, 2018 at 7:10 PM, Myunggi Yi  wrote:
>> > Does it work?
>> >
>> > https://drive.google.com/open?id=1n5m1tNGbnV7oZnuAEgZ7gSP6qA6HluNl
>> >
>> > How about this?
>> >
>> >
>> > Myunggi Yi
>> >
>> > On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham > >
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> Attachments can't be accepted on the list - please upload to a file
>> sharing
>> >> service and share links to those.
>> >>
>> >> Mark
>> >>
>> >> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi 
>> wrote:
>> >>
>> >> > I am attaching the file.
>> >> >
>> >> > Thank you.
>> >> >
>> >> > Myunggi Yi
>> >> >
>> >> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll <
>> pall.szil...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Again, please share the exact log files / description of inputs.
>> What
>> >> > > does "bad performance" mean?
>> >> > > --
>> >> > > Szilárd
>> >> > >
>> >> > >
>> >> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
>> >> > wrote:
>> >> > > > Dear users,
>> >> > > >
>> >> > > > I have two questions.
>> >> > > >
>> >> > > >
>> >> > > > 1. I used to run typical simulations with the following command.
>> >> > > >
>> >> > > > gmx mdrun -deffnm md
>> >> > > >
>> >> > > > I had no problem.
>> >> > > >
>> >> > > >
>> >> > > > Now I am running a simulation with "Dry_Martini" FF with the
>> >> following
>> >> > > > input.
>> >> > > >
>> >> > > >
>> >> > > > integrator   = sd
>> >> > > > tinit= 0.0
>> >> > > > dt   = 0.040
>> >> > > > nsteps   = 100
>> >> > > >
>> >> > > > nstlog   = 5000
>> >> > > > nstenergy= 5000
>> >> > > > nstxout-compressed   = 5000
>> >> > > > compressed-x-precision   = 100
>> >> > > >
>> >> > > > cutoff-scheme= Verlet
>> >> > > > nstlist  = 10
>> >> > > > ns_type  = grid
>> >> > > > pbc  = xyz
>> >> > > > verlet-buffer-tolerance  = 0.005
>> >> > > >
>> >> > > > epsilon_r= 15
>> >> > > > coulombtype  = reaction-field
>> >> > > > rcoulomb = 1.1
>> >> > > > vdw_type = cutoff
>> >> > > > vdw-modifier = Potential-shift-verlet
>> >> > > > rvdw = 1.1
>> >> > > >
>> >> > > > tc-grps  = system
>> >> > > > tau_t= 4.0
>> >> > > > ref_t= 310
>> >> > > >
>> >> > > > ; Pressure coupling:
>> >> > > > Pcoupl   = no
>> >> > > >
>> >> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
>> >> > > > gen_vel  = yes
>> >> > > > gen_temp = 310
>> >> > > > gen_seed = 1521731368
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > If I use the same command to submit the job.
>> >> > > > I got the following error. I don't know why.
>> >> > > >
>> >> > > > ---
>> >> > > > Program: gmx mdrun, version 2018.1
>> >> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp
>> (line
>> >> 224)
>> >> > > >
>> >> > > > Fatal error:
>> >> > > > When using GPUs, setting the number of OpenMP threads without
>> >> > specifying
>> >> > > the
>> >> > > > number of ranks can lead to conflicting demands. Please specify
>> the
>> >> > > number
>> >> > > > of
>> >> > > > thread-MPI ranks as well (option -ntmpi).
>> >> > > >
>> >> > > > For more information and tips for troubleshooting, please check
>> the
>> >> > > GROMACS
>> >> > > > website at http://www.gromacs.org/Documentation/Errors
>> >> > > > ---
>> >> > > >
>> >> > > >
>> >> > > > So I did run simulation with the following command.
>> >> > > >
>> >> > > 

Re: [gmx-users] Performance

2018-03-29 Thread Myunggi Yi
Dear Szilard,

Can you run this simulation?

The simulation doesn't crash and doesn't generate an error message.
It takes forever without updating the log file or the other output files.

Is this a bug?



On Thu, Mar 29, 2018 at 7:58 AM, Szilárd Páll 
wrote:

> Thanks. Looks like the messages and error handling is somewhat
> confusing; you must have the OMP_NUM_THREADS environment variable set
> which (just as setting -ntomp), without setting -ntmpi too is not
> supported.
>
> Either let mdrun decide about the thread count or set -ntmpi manually.
>
> --
> Szilárd
>
>
> On Wed, Mar 28, 2018 at 7:10 PM, Myunggi Yi  wrote:
> > Does it work?
> >
> > https://drive.google.com/open?id=1n5m1tNGbnV7oZnuAEgZ7gSP6qA6HluNl
> >
> > How about this?
> >
> >
> > Myunggi Yi
> >
> > On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham  >
> > wrote:
> >
> >> Hi,
> >>
> >> Attachments can't be accepted on the list - please upload to a file
> sharing
> >> service and share links to those.
> >>
> >> Mark
> >>
> >> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi 
> wrote:
> >>
> >> > I am attaching the file.
> >> >
> >> > Thank you.
> >> >
> >> > Myunggi Yi
> >> >
> >> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll <
> pall.szil...@gmail.com>
> >> > wrote:
> >> >
> >> > > Again, please share the exact log files / description of inputs.
> What
> >> > > does "bad performance" mean?
> >> > > --
> >> > > Szilárd
> >> > >
> >> > >
> >> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
> >> > wrote:
> >> > > > Dear users,
> >> > > >
> >> > > > I have two questions.
> >> > > >
> >> > > >
> >> > > > 1. I used to run typical simulations with the following command.
> >> > > >
> >> > > > gmx mdrun -deffnm md
> >> > > >
> >> > > > I had no problem.
> >> > > >
> >> > > >
> >> > > > Now I am running a simulation with "Dry_Martini" FF with the
> >> following
> >> > > > input.
> >> > > >
> >> > > >
> >> > > > integrator   = sd
> >> > > > tinit= 0.0
> >> > > > dt   = 0.040
> >> > > > nsteps   = 100
> >> > > >
> >> > > > nstlog   = 5000
> >> > > > nstenergy= 5000
> >> > > > nstxout-compressed   = 5000
> >> > > > compressed-x-precision   = 100
> >> > > >
> >> > > > cutoff-scheme= Verlet
> >> > > > nstlist  = 10
> >> > > > ns_type  = grid
> >> > > > pbc  = xyz
> >> > > > verlet-buffer-tolerance  = 0.005
> >> > > >
> >> > > > epsilon_r= 15
> >> > > > coulombtype  = reaction-field
> >> > > > rcoulomb = 1.1
> >> > > > vdw_type = cutoff
> >> > > > vdw-modifier = Potential-shift-verlet
> >> > > > rvdw = 1.1
> >> > > >
> >> > > > tc-grps  = system
> >> > > > tau_t= 4.0
> >> > > > ref_t= 310
> >> > > >
> >> > > > ; Pressure coupling:
> >> > > > Pcoupl   = no
> >> > > >
> >> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
> >> > > > gen_vel  = yes
> >> > > > gen_temp = 310
> >> > > > gen_seed = 1521731368
> >> > > >
> >> > > >
> >> > > >
> >> > > > If I use the same command to submit the job.
> >> > > > I got the following error. I don't know why.
> >> > > >
> >> > > > ---
> >> > > > Program: gmx mdrun, version 2018.1
> >> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp
> (line
> >> 224)
> >> > > >
> >> > > > Fatal error:
> >> > > > When using GPUs, setting the number of OpenMP threads without
> >> > specifying
> >> > > the
> >> > > > number of ranks can lead to conflicting demands. Please specify
> the
> >> > > number
> >> > > > of
> >> > > > thread-MPI ranks as well (option -ntmpi).
> >> > > >
> >> > > > For more information and tips for troubleshooting, please check
> the
> >> > > GROMACS
> >> > > > website at http://www.gromacs.org/Documentation/Errors
> >> > > > ---
> >> > > >
> >> > > >
> >> > > > So I did run simulation with the following command.
> >> > > >
> >> > > >
> >> > > > gmx mdrun -deffnm md -ntmpi 1
> >> > > >
> >> > > >
> >> > > > Now the performance is extremely bad.
> >> > > > Since yesterday, the log file still reporting the first step's
> >> energy.
> >> > > >
> >> > > > 2. This is the second question. Why?
> >> > > >
> >> > > > Can anyone help?
> >> > > >
> >> > > >
> >> > > > Myunggi Yi
> >> > > > --
> >> > > > Gromacs Users mailing list
> >> > > >
> >> > > > * Please search the archive at http://www.gromacs.org/Support
> >> > > /Mailing_Lists/GMX-Users_List before posting!
> >> > > >
> >> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> > > >
> >> > > > * For (un)subscribe requests visit
> >> > > > 

Re: [gmx-users] Performance

2018-03-29 Thread Myunggi Yi
Thanks.

I used exactly the same program (same installation and same environment
variables).

How come this error depends on the *.mdp file?

I don't get it with typical simulations such as md with PME etc.,
but I get it with this Dry_Martini mdp file (using sd, reaction field, etc.).



Myunggi Yi

On Thu, Mar 29, 2018 at 7:58 AM, Szilárd Páll 
wrote:

> Thanks. Looks like the messages and error handling is somewhat
> confusing; you must have the OMP_NUM_THREADS environment variable set
> which (just as setting -ntomp), without setting -ntmpi too is not
> supported.
>
> Either let mdrun decide about the thread count or set -ntmpi manually.
>
> --
> Szilárd
>
>
> On Wed, Mar 28, 2018 at 7:10 PM, Myunggi Yi  wrote:
> > Does it work?
> >
> > https://drive.google.com/open?id=1n5m1tNGbnV7oZnuAEgZ7gSP6qA6HluNl
> >
> > How about this?
> >
> >
> > Myunggi Yi
> >
> > On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham  >
> > wrote:
> >
> >> Hi,
> >>
> >> Attachments can't be accepted on the list - please upload to a file
> sharing
> >> service and share links to those.
> >>
> >> Mark
> >>
> >> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi 
> wrote:
> >>
> >> > I am attaching the file.
> >> >
> >> > Thank you.
> >> >
> >> > Myunggi Yi
> >> >
> >> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll <
> pall.szil...@gmail.com>
> >> > wrote:
> >> >
> >> > > Again, please share the exact log files / description of inputs.
> What
> >> > > does "bad performance" mean?
> >> > > --
> >> > > Szilárd
> >> > >
> >> > >
> >> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
> >> > wrote:
> >> > > > Dear users,
> >> > > >
> >> > > > I have two questions.
> >> > > >
> >> > > >
> >> > > > 1. I used to run typical simulations with the following command.
> >> > > >
> >> > > > gmx mdrun -deffnm md
> >> > > >
> >> > > > I had no problem.
> >> > > >
> >> > > >
> >> > > > Now I am running a simulation with "Dry_Martini" FF with the
> >> following
> >> > > > input.
> >> > > >
> >> > > >
> >> > > > integrator   = sd
> >> > > > tinit= 0.0
> >> > > > dt   = 0.040
> >> > > > nsteps   = 100
> >> > > >
> >> > > > nstlog   = 5000
> >> > > > nstenergy= 5000
> >> > > > nstxout-compressed   = 5000
> >> > > > compressed-x-precision   = 100
> >> > > >
> >> > > > cutoff-scheme= Verlet
> >> > > > nstlist  = 10
> >> > > > ns_type  = grid
> >> > > > pbc  = xyz
> >> > > > verlet-buffer-tolerance  = 0.005
> >> > > >
> >> > > > epsilon_r= 15
> >> > > > coulombtype  = reaction-field
> >> > > > rcoulomb = 1.1
> >> > > > vdw_type = cutoff
> >> > > > vdw-modifier = Potential-shift-verlet
> >> > > > rvdw = 1.1
> >> > > >
> >> > > > tc-grps  = system
> >> > > > tau_t= 4.0
> >> > > > ref_t= 310
> >> > > >
> >> > > > ; Pressure coupling:
> >> > > > Pcoupl   = no
> >> > > >
> >> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
> >> > > > gen_vel  = yes
> >> > > > gen_temp = 310
> >> > > > gen_seed = 1521731368
> >> > > >
> >> > > >
> >> > > >
> >> > > > If I use the same command to submit the job.
> >> > > > I got the following error. I don't know why.
> >> > > >
> >> > > > ---
> >> > > > Program: gmx mdrun, version 2018.1
> >> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp
> (line
> >> 224)
> >> > > >
> >> > > > Fatal error:
> >> > > > When using GPUs, setting the number of OpenMP threads without
> >> > specifying
> >> > > the
> >> > > > number of ranks can lead to conflicting demands. Please specify
> the
> >> > > number
> >> > > > of
> >> > > > thread-MPI ranks as well (option -ntmpi).
> >> > > >
> >> > > > For more information and tips for troubleshooting, please check
> the
> >> > > GROMACS
> >> > > > website at http://www.gromacs.org/Documentation/Errors
> >> > > > ---
> >> > > >
> >> > > >
> >> > > > So I did run simulation with the following command.
> >> > > >
> >> > > >
> >> > > > gmx mdrun -deffnm md -ntmpi 1
> >> > > >
> >> > > >
> >> > > > Now the performance is extremely bad.
> >> > > > Since yesterday, the log file still reporting the first step's
> >> energy.
> >> > > >
> >> > > > 2. This is the second question. Why?
> >> > > >
> >> > > > Can anyone help?
> >> > > >
> >> > > >
> >> > > > Myunggi Yi
> >> > > > --
> >> > > > Gromacs Users mailing list
> >> > > >
> >> > > > * Please search the archive at http://www.gromacs.org/Support
> >> > > /Mailing_Lists/GMX-Users_List before posting!
> >> > > >
> >> > > > * Can't post? Read 

Re: [gmx-users] Performance

2018-03-29 Thread Szilárd Páll
Thanks. Looks like the messages and error handling are somewhat
confusing; you must have the OMP_NUM_THREADS environment variable set,
which (just like setting -ntomp) is not supported without also setting
-ntmpi.

Either let mdrun decide about the thread count or set -ntmpi manually.

--
Szilárd
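
As a minimal illustration of those two options (the -deffnm name and the 2 x 4 thread split below are just example values, not a recommendation for this particular system):

# Option 1: clear the environment override and let mdrun pick ranks/threads.
unset OMP_NUM_THREADS
gmx mdrun -deffnm md

# Option 2: keep an explicit OpenMP thread count, but then also fix the
# number of thread-MPI ranks so the two settings cannot conflict.
gmx mdrun -deffnm md -ntmpi 2 -ntomp 4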


On Wed, Mar 28, 2018 at 7:10 PM, Myunggi Yi  wrote:
> Does it work?
>
> https://drive.google.com/open?id=1n5m1tNGbnV7oZnuAEgZ7gSP6qA6HluNl
>
> How about this?
>
>
> Myunggi Yi
>
> On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham 
> wrote:
>
>> Hi,
>>
>> Attachments can't be accepted on the list - please upload to a file sharing
>> service and share links to those.
>>
>> Mark
>>
>> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi  wrote:
>>
>> > I am attaching the file.
>> >
>> > Thank you.
>> >
>> > Myunggi Yi
>> >
>> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll 
>> > wrote:
>> >
>> > > Again, please share the exact log files / description of inputs. What
>> > > does "bad performance" mean?
>> > > --
>> > > Szilárd
>> > >
>> > >
>> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
>> > wrote:
>> > > > Dear users,
>> > > >
>> > > > I have two questions.
>> > > >
>> > > >
>> > > > 1. I used to run typical simulations with the following command.
>> > > >
>> > > > gmx mdrun -deffnm md
>> > > >
>> > > > I had no problem.
>> > > >
>> > > >
>> > > > Now I am running a simulation with "Dry_Martini" FF with the
>> following
>> > > > input.
>> > > >
>> > > >
>> > > > integrator   = sd
>> > > > tinit= 0.0
>> > > > dt   = 0.040
>> > > > nsteps   = 100
>> > > >
>> > > > nstlog   = 5000
>> > > > nstenergy= 5000
>> > > > nstxout-compressed   = 5000
>> > > > compressed-x-precision   = 100
>> > > >
>> > > > cutoff-scheme= Verlet
>> > > > nstlist  = 10
>> > > > ns_type  = grid
>> > > > pbc  = xyz
>> > > > verlet-buffer-tolerance  = 0.005
>> > > >
>> > > > epsilon_r= 15
>> > > > coulombtype  = reaction-field
>> > > > rcoulomb = 1.1
>> > > > vdw_type = cutoff
>> > > > vdw-modifier = Potential-shift-verlet
>> > > > rvdw = 1.1
>> > > >
>> > > > tc-grps  = system
>> > > > tau_t= 4.0
>> > > > ref_t= 310
>> > > >
>> > > > ; Pressure coupling:
>> > > > Pcoupl   = no
>> > > >
>> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
>> > > > gen_vel  = yes
>> > > > gen_temp = 310
>> > > > gen_seed = 1521731368
>> > > >
>> > > >
>> > > >
>> > > > If I use the same command to submit the job.
>> > > > I got the following error. I don't know why.
>> > > >
>> > > > ---
>> > > > Program: gmx mdrun, version 2018.1
>> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp (line
>> 224)
>> > > >
>> > > > Fatal error:
>> > > > When using GPUs, setting the number of OpenMP threads without
>> > specifying
>> > > the
>> > > > number of ranks can lead to conflicting demands. Please specify the
>> > > number
>> > > > of
>> > > > thread-MPI ranks as well (option -ntmpi).
>> > > >
>> > > > For more information and tips for troubleshooting, please check the
>> > > GROMACS
>> > > > website at http://www.gromacs.org/Documentation/Errors
>> > > > ---
>> > > >
>> > > >
>> > > > So I did run simulation with the following command.
>> > > >
>> > > >
>> > > > gmx mdrun -deffnm md -ntmpi 1
>> > > >
>> > > >
>> > > > Now the performance is extremely bad.
>> > > > Since yesterday, the log file still reporting the first step's
>> energy.
>> > > >
>> > > > 2. This is the second question. Why?
>> > > >
>> > > > Can anyone help?
>> > > >
>> > > >
>> > > > Myunggi Yi
>> > > > --
>> > > > Gromacs Users mailing list
>> > > >
>> > > > * Please search the archive at http://www.gromacs.org/Support
>> > > /Mailing_Lists/GMX-Users_List before posting!
>> > > >
>> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> > > >
>> > > > * For (un)subscribe requests visit
>> > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>> or
>> > > send a mail to gmx-users-requ...@gromacs.org.
>> > > --
>> > > Gromacs Users mailing list
>> > >
>> > > * Please search the archive at http://www.gromacs.org/Support
>> > > /Mailing_Lists/GMX-Users_List before posting!
>> > >
>> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> > >
>> > > * For (un)subscribe requests visit
>> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> > > send a mail to gmx-users-requ...@gromacs.org.
>> > --
>> > Gromacs Users mailing list
>> >
>> 

Re: [gmx-users] Performance

2018-03-28 Thread Myunggi Yi
Does it work?

https://drive.google.com/open?id=1n5m1tNGbnV7oZnuAEgZ7gSP6qA6HluNl

How about this?


Myunggi Yi

On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham 
wrote:

> Hi,
>
> Attachments can't be accepted on the list - please upload to a file sharing
> service and share links to those.
>
> Mark
>
> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi  wrote:
>
> > I am attaching the file.
> >
> > Thank you.
> >
> > Myunggi Yi
> >
> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll 
> > wrote:
> >
> > > Again, please share the exact log files / description of inputs. What
> > > does "bad performance" mean?
> > > --
> > > Szilárd
> > >
> > >
> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
> > wrote:
> > > > Dear users,
> > > >
> > > > I have two questions.
> > > >
> > > >
> > > > 1. I used to run typical simulations with the following command.
> > > >
> > > > gmx mdrun -deffnm md
> > > >
> > > > I had no problem.
> > > >
> > > >
> > > > Now I am running a simulation with "Dry_Martini" FF with the
> following
> > > > input.
> > > >
> > > >
> > > > integrator   = sd
> > > > tinit= 0.0
> > > > dt   = 0.040
> > > > nsteps   = 100
> > > >
> > > > nstlog   = 5000
> > > > nstenergy= 5000
> > > > nstxout-compressed   = 5000
> > > > compressed-x-precision   = 100
> > > >
> > > > cutoff-scheme= Verlet
> > > > nstlist  = 10
> > > > ns_type  = grid
> > > > pbc  = xyz
> > > > verlet-buffer-tolerance  = 0.005
> > > >
> > > > epsilon_r= 15
> > > > coulombtype  = reaction-field
> > > > rcoulomb = 1.1
> > > > vdw_type = cutoff
> > > > vdw-modifier = Potential-shift-verlet
> > > > rvdw = 1.1
> > > >
> > > > tc-grps  = system
> > > > tau_t= 4.0
> > > > ref_t= 310
> > > >
> > > > ; Pressure coupling:
> > > > Pcoupl   = no
> > > >
> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
> > > > gen_vel  = yes
> > > > gen_temp = 310
> > > > gen_seed = 1521731368
> > > >
> > > >
> > > >
> > > > If I use the same command to submit the job.
> > > > I got the following error. I don't know why.
> > > >
> > > > ---
> > > > Program: gmx mdrun, version 2018.1
> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp (line
> 224)
> > > >
> > > > Fatal error:
> > > > When using GPUs, setting the number of OpenMP threads without
> > specifying
> > > the
> > > > number of ranks can lead to conflicting demands. Please specify the
> > > number
> > > > of
> > > > thread-MPI ranks as well (option -ntmpi).
> > > >
> > > > For more information and tips for troubleshooting, please check the
> > > GROMACS
> > > > website at http://www.gromacs.org/Documentation/Errors
> > > > ---
> > > >
> > > >
> > > > So I did run simulation with the following command.
> > > >
> > > >
> > > > gmx mdrun -deffnm md -ntmpi 1
> > > >
> > > >
> > > > Now the performance is extremely bad.
> > > > Since yesterday, the log file still reporting the first step's
> energy.
> > > >
> > > > 2. This is the second question. Why?
> > > >
> > > > Can anyone help?
> > > >
> > > >
> > > > Myunggi Yi
> > > > --
> > > > Gromacs Users mailing list
> > > >
> > > > * Please search the archive at http://www.gromacs.org/Support
> > > /Mailing_Lists/GMX-Users_List before posting!
> > > >
> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > >
> > > > * For (un)subscribe requests visit
> > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
> or
> > > send a mail to gmx-users-requ...@gromacs.org.
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at http://www.gromacs.org/Support
> > > /Mailing_Lists/GMX-Users_List before posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > send a mail to gmx-users-requ...@gromacs.org.
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read 

Re: [gmx-users] Performance

2018-03-28 Thread Myunggi Yi
I see.

I am trying again.

Myunggi Yi  (이명기, Ph. D.), Professor

Department of Biomedical Engineering (의공학과 bme.pknu.ac.kr), College of
Engineering
Interdisciplinary Program of Biomedical Mechanical & Electrical Engineering
Center for Marine-Integrated Biomedical Technology (BK21+)
College of Engineering
Pukyong National University (부경대학교 www.pknu.ac.kr)
45 Yongso-ro, Nam-gu (남구 용소로 45)
Busan, 48513, South Korea
Phone: +82 51 629 5773
Fax: +82 51 629 5779

On Wed, Mar 28, 2018 at 12:20 PM, Mark Abraham 
wrote:

> Hi,
>
> Attachments can't be accepted on the list - please upload to a file sharing
> service and share links to those.
>
> Mark
>
> On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi  wrote:
>
> > I am attaching the file.
> >
> > Thank you.
> >
> > Myunggi Yi
> >
> > On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll 
> > wrote:
> >
> > > Again, please share the exact log files / description of inputs. What
> > > does "bad performance" mean?
> > > --
> > > Szilárd
> > >
> > >
> > > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
> > wrote:
> > > > Dear users,
> > > >
> > > > I have two questions.
> > > >
> > > >
> > > > 1. I used to run typical simulations with the following command.
> > > >
> > > > gmx mdrun -deffnm md
> > > >
> > > > I had no problem.
> > > >
> > > >
> > > > Now I am running a simulation with "Dry_Martini" FF with the
> following
> > > > input.
> > > >
> > > >
> > > > integrator   = sd
> > > > tinit= 0.0
> > > > dt   = 0.040
> > > > nsteps   = 100
> > > >
> > > > nstlog   = 5000
> > > > nstenergy= 5000
> > > > nstxout-compressed   = 5000
> > > > compressed-x-precision   = 100
> > > >
> > > > cutoff-scheme= Verlet
> > > > nstlist  = 10
> > > > ns_type  = grid
> > > > pbc  = xyz
> > > > verlet-buffer-tolerance  = 0.005
> > > >
> > > > epsilon_r= 15
> > > > coulombtype  = reaction-field
> > > > rcoulomb = 1.1
> > > > vdw_type = cutoff
> > > > vdw-modifier = Potential-shift-verlet
> > > > rvdw = 1.1
> > > >
> > > > tc-grps  = system
> > > > tau_t= 4.0
> > > > ref_t= 310
> > > >
> > > > ; Pressure coupling:
> > > > Pcoupl   = no
> > > >
> > > > ; GENERATE VELOCITIES FOR STARTUP RUN:
> > > > gen_vel  = yes
> > > > gen_temp = 310
> > > > gen_seed = 1521731368
> > > >
> > > >
> > > >
> > > > If I use the same command to submit the job.
> > > > I got the following error. I don't know why.
> > > >
> > > > ---
> > > > Program: gmx mdrun, version 2018.1
> > > > Source file: src/gromacs/taskassignment/resourcedivision.cpp (line
> 224)
> > > >
> > > > Fatal error:
> > > > When using GPUs, setting the number of OpenMP threads without
> > specifying
> > > the
> > > > number of ranks can lead to conflicting demands. Please specify the
> > > number
> > > > of
> > > > thread-MPI ranks as well (option -ntmpi).
> > > >
> > > > For more information and tips for troubleshooting, please check the
> > > GROMACS
> > > > website at http://www.gromacs.org/Documentation/Errors
> > > > ---
> > > >
> > > >
> > > > So I did run simulation with the following command.
> > > >
> > > >
> > > > gmx mdrun -deffnm md -ntmpi 1
> > > >
> > > >
> > > > Now the performance is extremely bad.
> > > > Since yesterday, the log file still reporting the first step's
> energy.
> > > >
> > > > 2. This is the second question. Why?
> > > >
> > > > Can anyone help?
> > > >
> > > >
> > > > Myunggi Yi
> > > > --
> > > > Gromacs Users mailing list
> > > >
> > > > * Please search the archive at http://www.gromacs.org/Support
> > > /Mailing_Lists/GMX-Users_List before posting!
> > > >
> > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > >
> > > > * For (un)subscribe requests visit
> > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
> or
> > > send a mail to gmx-users-requ...@gromacs.org.
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at http://www.gromacs.org/Support
> > > /Mailing_Lists/GMX-Users_List before posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > send a mail to gmx-users-requ...@gromacs.org.
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List 

Re: [gmx-users] Performance

2018-03-28 Thread Mark Abraham
Hi,

Attachments can't be accepted on the list - please upload to a file sharing
service and share links to those.

Mark

On Wed, Mar 28, 2018 at 6:16 PM Myunggi Yi  wrote:

> I am attaching the file.
>
> Thank you.
>
> Myunggi Yi
>
> On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll 
> wrote:
>
> > Again, please share the exact log files / description of inputs. What
> > does "bad performance" mean?
> > --
> > Szilárd
> >
> >
> > On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi 
> wrote:
> > > Dear users,
> > >
> > > I have two questions.
> > >
> > >
> > > 1. I used to run typical simulations with the following command.
> > >
> > > gmx mdrun -deffnm md
> > >
> > > I had no problem.
> > >
> > >
> > > Now I am running a simulation with "Dry_Martini" FF with the following
> > > input.
> > >
> > >
> > > integrator   = sd
> > > tinit= 0.0
> > > dt   = 0.040
> > > nsteps   = 100
> > >
> > > nstlog   = 5000
> > > nstenergy= 5000
> > > nstxout-compressed   = 5000
> > > compressed-x-precision   = 100
> > >
> > > cutoff-scheme= Verlet
> > > nstlist  = 10
> > > ns_type  = grid
> > > pbc  = xyz
> > > verlet-buffer-tolerance  = 0.005
> > >
> > > epsilon_r= 15
> > > coulombtype  = reaction-field
> > > rcoulomb = 1.1
> > > vdw_type = cutoff
> > > vdw-modifier = Potential-shift-verlet
> > > rvdw = 1.1
> > >
> > > tc-grps  = system
> > > tau_t= 4.0
> > > ref_t= 310
> > >
> > > ; Pressure coupling:
> > > Pcoupl   = no
> > >
> > > ; GENERATE VELOCITIES FOR STARTUP RUN:
> > > gen_vel  = yes
> > > gen_temp = 310
> > > gen_seed = 1521731368
> > >
> > >
> > >
> > > If I use the same command to submit the job.
> > > I got the following error. I don't know why.
> > >
> > > ---
> > > Program: gmx mdrun, version 2018.1
> > > Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)
> > >
> > > Fatal error:
> > > When using GPUs, setting the number of OpenMP threads without
> specifying
> > the
> > > number of ranks can lead to conflicting demands. Please specify the
> > number
> > > of
> > > thread-MPI ranks as well (option -ntmpi).
> > >
> > > For more information and tips for troubleshooting, please check the
> > GROMACS
> > > website at http://www.gromacs.org/Documentation/Errors
> > > ---
> > >
> > >
> > > So I did run simulation with the following command.
> > >
> > >
> > > gmx mdrun -deffnm md -ntmpi 1
> > >
> > >
> > > Now the performance is extremely bad.
> > > Since yesterday, the log file still reporting the first step's energy.
> > >
> > > 2. This is the second question. Why?
> > >
> > > Can anyone help?
> > >
> > >
> > > Myunggi Yi
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at http://www.gromacs.org/Support
> > /Mailing_Lists/GMX-Users_List before posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at http://www.gromacs.org/Support
> > /Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] Performance

2018-03-28 Thread Myunggi Yi
I am attaching the file.

Thank you.

Myunggi Yi

On Wed, Mar 28, 2018 at 11:40 AM, Szilárd Páll 
wrote:

> Again, please share the exact log files / description of inputs. What
> does "bad performance" mean?
> --
> Szilárd
>
>
> On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi  wrote:
> > Dear users,
> >
> > I have two questions.
> >
> >
> > 1. I used to run typical simulations with the following command.
> >
> > gmx mdrun -deffnm md
> >
> > I had no problem.
> >
> >
> > Now I am running a simulation with "Dry_Martini" FF with the following
> > input.
> >
> >
> > integrator   = sd
> > tinit= 0.0
> > dt   = 0.040
> > nsteps   = 100
> >
> > nstlog   = 5000
> > nstenergy= 5000
> > nstxout-compressed   = 5000
> > compressed-x-precision   = 100
> >
> > cutoff-scheme= Verlet
> > nstlist  = 10
> > ns_type  = grid
> > pbc  = xyz
> > verlet-buffer-tolerance  = 0.005
> >
> > epsilon_r= 15
> > coulombtype  = reaction-field
> > rcoulomb = 1.1
> > vdw_type = cutoff
> > vdw-modifier = Potential-shift-verlet
> > rvdw = 1.1
> >
> > tc-grps  = system
> > tau_t= 4.0
> > ref_t= 310
> >
> > ; Pressure coupling:
> > Pcoupl   = no
> >
> > ; GENERATE VELOCITIES FOR STARTUP RUN:
> > gen_vel  = yes
> > gen_temp = 310
> > gen_seed = 1521731368
> >
> >
> >
> > If I use the same command to submit the job.
> > I got the following error. I don't know why.
> >
> > ---
> > Program: gmx mdrun, version 2018.1
> > Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)
> >
> > Fatal error:
> > When using GPUs, setting the number of OpenMP threads without specifying
> the
> > number of ranks can lead to conflicting demands. Please specify the
> number
> > of
> > thread-MPI ranks as well (option -ntmpi).
> >
> > For more information and tips for troubleshooting, please check the
> GROMACS
> > website at http://www.gromacs.org/Documentation/Errors
> > ---
> >
> >
> > So I did run simulation with the following command.
> >
> >
> > gmx mdrun -deffnm md -ntmpi 1
> >
> >
> > Now the performance is extremely bad.
> > Since yesterday, the log file still reporting the first step's energy.
> >
> > 2. This is the second question. Why?
> >
> > Can anyone help?
> >
> >
> > Myunggi Yi
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at http://www.gromacs.org/Support
> /Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/Support
> /Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] Performance

2018-03-28 Thread Szilárd Páll
Again, please share the exact log files / description of inputs. What
does "bad performance" mean?
--
Szilárd


On Wed, Mar 28, 2018 at 5:31 PM, Myunggi Yi  wrote:
> Dear users,
>
> I have two questions.
>
>
> 1. I used to run typical simulations with the following command.
>
> gmx mdrun -deffnm md
>
> I had no problem.
>
>
> Now I am running a simulation with "Dry_Martini" FF with the following
> input.
>
>
> integrator   = sd
> tinit= 0.0
> dt   = 0.040
> nsteps   = 100
>
> nstlog   = 5000
> nstenergy= 5000
> nstxout-compressed   = 5000
> compressed-x-precision   = 100
>
> cutoff-scheme= Verlet
> nstlist  = 10
> ns_type  = grid
> pbc  = xyz
> verlet-buffer-tolerance  = 0.005
>
> epsilon_r= 15
> coulombtype  = reaction-field
> rcoulomb = 1.1
> vdw_type = cutoff
> vdw-modifier = Potential-shift-verlet
> rvdw = 1.1
>
> tc-grps  = system
> tau_t= 4.0
> ref_t= 310
>
> ; Pressure coupling:
> Pcoupl   = no
>
> ; GENERATE VELOCITIES FOR STARTUP RUN:
> gen_vel  = yes
> gen_temp = 310
> gen_seed = 1521731368
>
>
>
> If I use the same command to submit the job.
> I got the following error. I don't know why.
>
> ---
> Program: gmx mdrun, version 2018.1
> Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)
>
> Fatal error:
> When using GPUs, setting the number of OpenMP threads without specifying the
> number of ranks can lead to conflicting demands. Please specify the number
> of
> thread-MPI ranks as well (option -ntmpi).
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> ---
>
>
> So I did run simulation with the following command.
>
>
> gmx mdrun -deffnm md -ntmpi 1
>
>
> Now the performance is extremely bad.
> Since yesterday, the log file still reporting the first step's energy.
>
> 2. This is the second question. Why?
>
> Can anyone help?
>
>
> Myunggi Yi
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.

[gmx-users] Performance

2018-03-28 Thread Myunggi Yi
Dear users,

I have two questions.


1. I used to run typical simulations with the following command.

gmx mdrun -deffnm md

I had no problem.


Now I am running a simulation with "Dry_Martini" FF with the following
input.


integrator   = sd
tinit= 0.0
dt   = 0.040
nsteps   = 100

nstlog   = 5000
nstenergy= 5000
nstxout-compressed   = 5000
compressed-x-precision   = 100

cutoff-scheme= Verlet
nstlist  = 10
ns_type  = grid
pbc  = xyz
verlet-buffer-tolerance  = 0.005

epsilon_r= 15
coulombtype  = reaction-field
rcoulomb = 1.1
vdw_type = cutoff
vdw-modifier = Potential-shift-verlet
rvdw = 1.1

tc-grps  = system
tau_t= 4.0
ref_t= 310

; Pressure coupling:
Pcoupl   = no

; GENERATE VELOCITIES FOR STARTUP RUN:
gen_vel  = yes
gen_temp = 310
gen_seed = 1521731368



If I use the same command to submit the job.
I got the following error. I don't know why.

---
Program: gmx mdrun, version 2018.1
Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)

Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number
of
thread-MPI ranks as well (option -ntmpi).

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
---


So I did run simulation with the following command.


gmx mdrun -deffnm md -ntmpi 1


Now the performance is extremely bad.
Since yesterday, the log file is still reporting the first step's energy.

2. This is the second question. Why?

Can anyone help?


Myunggi Yi


Re: [gmx-users] Performance gains with AVX_512 ?

2017-12-12 Thread Kutzner, Carsten
Hi Szilárd,

> On 12. Dec 2017, at 17:58, Szilárd Páll  wrote:
> 
> Hi Carsten,
> 
> The performance behavior you observe is expected, I have observed it
> myself. Nothing seems unusual in the performance numbers you report.
> 
> The AVX512 clock throttle is additional (10-20% IIRC) to the AVX2 throttle,
> and the only code that really gains significantly from AVX512 is the
> nonbonded kernels. When those are offloaded, the gain from higher clocks
> with AVX2 will translate to better CPU performance (and especially if the
> run is CPU-bound, that will make a significant difference).
> 
> BTW, on the low- and mid-range CPUs ("Bronze"/"Silver" and "cut-down" i9s)
> AVX512 is even less likely to ever be worth it.
So using AVX2 on GPU nodes seems generally to be the fastest option. 
Thanks a lot for the info! 

Best,
  Carsten

> 
> Cheers,
> 
> --
> Szilárd
> 
> On Tue, Dec 12, 2017 at 3:07 PM, Kutzner, Carsten  wrote:
> 
>> Hi,
>> 
>> what are the expected performance benefits of AVX_512 SIMD instructions
>> on Intel Skylake processors, compared to AVX2_256? In many cases, I see
>> a significantly (15 %) higher GROMACS 2016 / 2018b2 performance when using
>> AVX2_256 instead of AVX_512. I would have guessed that AVX_512 is at least
>> not slower than inferior instruction sets.
>> 
>> Some quick benchmarks results:
>> Node with 2x12 core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
>> 80k atoms membrane benchmark system, 2 fs time step, pme on cpu
>> 
>> GROMACS v.  SIMD      ns/d
>> 2016        AVX_512   102.3
>> 2016        AVX2_256  119.3
>> 2018b2      AVX_512   107.9
>> 2018b2      AVX2_256  123.2
>> 
>> I realize that AVX_512 turbo frequencies are significantly lower
>> compared to AVX2_256 if all cores are in use, and for a serial run,
>> AVX_512 is indeed by about 6% faster than AVX2_256.
>> 
> 
> By "serial" you mean single-threaded runs? Single-core turbo on this 165W
> CPU will be pretty high (>=4.2 GHz) and it is not likely to reflect the
> relative difference at the base clock.
> 
> Gromacs 2018b2, -nb cpu
>> thread-MPI   ns/day    ns/day     improvement
>> threads      AVX_512   AVX2_256   over AVX2
>>  1            2.880     2.702     1.065
>>  2            5.451     5.209     1.046
>>  4            9.617     9.332     1.031
>>  8           17.469    17.276     1.011
>> 12           21.852    24.245      .901
>> 16           28.579    31.691      .902
>> 24           39.731    41.576      .956
>> 48           41.831    39.336     1.063
>> 
> 
> Does this mean that for all but rows 5, 7, and 8 you left the socket(s)
> partially empty?
> 
> 
> Cheers,
> --
> Szilárd
> 
> 
>> Can anyone comment on whether that is the expected behavior and why?
>> 
>> Thanks!
>>  Carsten
>> 
>> 
>> 
>> --
>> Gromacs Users mailing list
>> 
>> * Please search the archive at http://www.gromacs.org/
>> Support/Mailing_Lists/GMX-Users_List before posting!
>> 
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> 
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-requ...@gromacs.org.
>> 
> -- 
> Gromacs Users mailing list
> 
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> 
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> 
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Performance gains with AVX_512 ?

2017-12-12 Thread Szilárd Páll
Hi Carsten,

The performance behavior you observe is expected, I have observed it
myself. Nothing seems unusual in the performance numbers you report.

The AVX512 clock throttle is additional (10-20% IIRC) to the AVX2 throttle,
and the only code that really gains significantly from AVX512 is the
nonbonded kernels. When those are offloaded, the gain from higher clocks
with AVX2 will translate to better CPU performance (and especially if the
run is CPU-bound, that will make a significant difference).

BTW, on the low- and mid-range CPUs ("Bronze"/"Silver" and "cut-down" i9s)
AVX512 is even less likely to ever be worth it.

Cheers,

--
Szilárd
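
For anyone who wants to reproduce such a comparison, a sketch of building both SIMD flavours side by side; GMX_SIMD is the CMake option that fixes the SIMD kernels at compile time, while the build directories, install prefixes and -j value below are placeholders:

# Run from two empty build directories inside the GROMACS source tree.
mkdir build-avx2 && cd build-avx2
cmake .. -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-avx2_256
make -j 16 install
cd ..
mkdir build-avx512 && cd build-avx512
cmake .. -DGMX_SIMD=AVX_512 -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-avx_512
make -j 16 install

Running the same .tpr with each install then isolates the SIMD effect from everything else in the setup.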

On Tue, Dec 12, 2017 at 3:07 PM, Kutzner, Carsten  wrote:

> Hi,
>
> what are the expected performance benefits of AVX_512 SIMD instructions
> on Intel Skylake processors, compared to AVX2_256? In many cases, I see
> a significantly (15 %) higher GROMACS 2016 / 2018b2 performance when using
> AVX2_256 instead of AVX_512. I would have guessed that AVX_512 is at least
> not slower than inferior instruction sets.
>
> Some quick benchmarks results:
> Node with 2x12 core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
> 80k atoms membrane benchmark system, 2 fs time step, pme on cpu
>
> GROMACS v.  SIMD      ns/d
> 2016        AVX_512   102.3
> 2016        AVX2_256  119.3
> 2018b2      AVX_512   107.9
> 2018b2      AVX2_256  123.2
>
> I realize that AVX_512 turbo frequencies are significantly lower
> compared to AVX2_256 if all cores are in use, and for a serial run,
> AVX_512 is indeed by about 6% faster than AVX2_256.
>

By "serial" you mean single-threaded runs? Single-core turbo on this 165W
CPU will be pretty high (>=4.2 GHz) and it is not likely to reflect the
relative difference at the base clock.

Gromacs 2018b2, -nb cpu
> thread-MPI   ns/day    ns/day     improvement
> threads      AVX_512   AVX2_256   over AVX2
>  1            2.880     2.702     1.065
>  2            5.451     5.209     1.046
>  4            9.617     9.332     1.031
>  8           17.469    17.276     1.011
> 12           21.852    24.245      .901
> 16           28.579    31.691      .902
> 24           39.731    41.576      .956
> 48           41.831    39.336     1.063
>

Does this mean that for all but rows 5, 7, and 8 you left the socket(s)
partially empty?


Cheers,
--
Szilárd


> Can anyone comment on whether that is the expected behavior and why?
>
> Thanks!
>   Carsten
>
>
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>

[gmx-users] Performance gains with AVX_512 ?

2017-12-12 Thread Kutzner, Carsten
Hi,

what are the expected performance benefits of AVX_512 SIMD instructions
on Intel Skylake processors, compared to AVX2_256? In many cases, I see
a significantly (15 %) higher GROMACS 2016 / 2018b2 performance when using
AVX2_256 instead of AVX_512. I would have guessed that AVX_512 is at least
not slower than inferior instruction sets.

Some quick benchmark results:
Node with 2x 12-core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
80k atoms membrane benchmark system, 2 fs time step, PME on CPU

GROMACS v.  SIMD      ns/d
2016        AVX_512   102.3
2016        AVX2_256  119.3
2018b2      AVX_512   107.9
2018b2      AVX2_256  123.2

I realize that AVX_512 turbo frequencies are significantly lower
compared to AVX2_256 if all cores are in use, and for a serial run,
AVX_512 is indeed by about 6% faster than AVX2_256.

Gromacs 2018b2, -nb cpu
thread-MPI   ns/day    ns/day     improvement
threads      AVX_512   AVX2_256   over AVX2
 1            2.880     2.702     1.065
 2            5.451     5.209     1.046
 4            9.617     9.332     1.031
 8           17.469    17.276     1.011
12           21.852    24.245      .901
16           28.579    31.691      .902
24           39.731    41.576      .956
48           41.831    39.336     1.063

Can anyone comment on whether that is the expected behavior and why?

Thanks!
  Carsten





Re: [gmx-users] Performance test

2017-11-27 Thread Javier Luque Di Salvo
Dear community,
I share the results of the scaling/performance test. I used this command and
checked the core usage with the help of the htop tool (http://hisham.hm/htop/):
gmx mdrun -ntmpi 1 -ntomp N -pin on -deffnm  &

Here N is the number of (logical) cores, and the hardware is an Intel(R) Core(TM)
i7-6700 @ 3.40 GHz with 16 GB RAM and no GPU. I tested two polymer chains of
different size (psu10 = 552 atoms; psu36 = 1956 atoms) in 1 ns NPT simulations
of the previously equilibrated system; the MD settings were Berendsen
thermostat and barostat, V-rescale, a 1 fs time step, a 1.0 nm cut-off, and PME
for the Coulomb computation. The figures are at this link:
https://goo.gl/bVZKcU

And the table with the values, in case of problems opening the figures:
PSU10 (552 atoms)
N   wall-time  ns/day
1   1057.166   81.7025
2   631.117  136.908
3   461.265  187.448
4   352.821  244.886
5   440.070  196.393
6   386.782  223.346
7   348.273  248.083
8   389.243  255.187
--
PSU36 (1956 atoms)
N   wall-time  ns/day
1   2259.990  38.231
2   1254.619  68.870
3   875.394   99.267
4   672.042   128.570
5   822.385   105.056
6   712.061   121.338
7   628.172   137.551
8   576.145   149.963

Kind regards,
Javi
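
A sketch of how a scan like the one above could be scripted end to end (topol.tpr, the step count and the bench_omp* names are placeholders; the ns/day figure is read from the "Performance:" line that mdrun writes at the end of its log):

for n in 1 2 3 4 5 6 7 8; do
    gmx mdrun -s topol.tpr -deffnm bench_omp$n \
              -ntmpi 1 -ntomp $n -pin on \
              -nsteps 50000 -resethway      # reset timers halfway so startup cost is excluded
    grep "^Performance:" bench_omp$n.log    # ns/day and hours/ns for this thread count
done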

2017-11-21 13:50 GMT+01:00 Javier E :

> Dear users,
>
> I'm doing a performance analysis following this link
> http://manual.gromacs.org/documentation/5.1/user-guide/
> mdrun-performance.html and wanted to ask:
>
> Is there a "standard" procedure to test performance in gromacs (on single
> nodes, one multi-processor CPU)? Following there are some results, the
> system is a small polymeric chain of 542 atoms with no water and NPT 100 ps
> runs (if more information about md settings are needed please ask):
>
> Running on 1 node with total 4 cores, 8 logical cores
> Hardware detected:
>   CPU info:
> Vendor: GenuineIntel
> Brand:  Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
> SIMD instructions most likely to fit this hardware: AVX2_256
> SIMD instructions selected at GROMACS compile time: AVX2_256
>
> GROMACS version: VERSION 5.1.4
> Precision:single
> Memory model:64 bit
> MPI library:  thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:   disabled
> OpenCL support:  disabled
>
>
> gmx mdrun -ntmpi 1 -ntomp # -v -deffnm out.tpr
>
>
> -------------------------------------------------------------------
> -ntomp | MPI/OpenMP     | Wall time (s) | ns/day  | % CPU | Note?*
> -------------------------------------------------------------------
>    1   | 1/1            |     1075.764  |  80.315 | 100.0 |  No
>    2   | 1/2            |      619.679  | 139.427 | 200.0 |  Yes
>    3   | 1/3            |      458.721  | 188.350 | 299.9 |  Yes
>    4   | 1/4            |      356.906  | 242.081 | 399.8 |  Yes
>    5   | 1/5            |      433.572  | 199.275 | 499.0 |  Yes
>    6   | 1/6            |      378.951  | 227.998 | 598.0 |  Yes
>    7   | 1/7            |      355.785  | 242.844 | 693.1 |  Yes
>    8   | 1/8 (default)  |      328.520  | 262.081 | 779.0 |  No
> -------------------------------------------------------------------
>
> *NOTE: The number of threads is not equal to the number of (logical) cores
>   and the -pin option is set to auto: will not pin thread to cores.
>
>
> If (MPI-Threads)*(OpenMP-Threads) = number of threads, does mdrun uses
> number of cores= number of threads, and this can be seen in the %CPU usage?
>
> For example, as I installed GROMACS in default, the GMX_OpenMP_MAX_THREAD
> is set at 32 threads, but this will never happen with this hardware (4
> cores, 8 logical), is this correct? By now I'm re-running the exact same
> tests to have at least one replica, and extending the system size and the
> and run time. Any suggestions on how to deep further in this kind of tests
> are welcome,
>
> Best regards
> --
>
> 
>
> *Javier Luque Di Salvo*
>
> Dipartamento di Ingegneria Chimica
>
> Universitá Degli Studi di Palermo
> *Viale delle Scienze, Ed. 6*
> *90128 PALERMO (PA)*
> *+39.09123867503 <+39%20091%202386%207503>*
>



-- 



*Javier Luque Di Salvo*

Dipartamento di Ingegneria Chimica

Universitá Degli Studi di Palermo
*Viale delle Scienze, Ed. 6*
*90128 PALERMO (PA)*
*+39.09123867503*

[gmx-users] Performance test

2017-11-21 Thread Javier E
Dear users,

I'm doing a performance analysis following this link
http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
and wanted to ask:

Is there a "standard" procedure to test performance in gromacs (on single
nodes, one multi-processor CPU)? Following there are some results, the
system is a small polymeric chain of 542 atoms with no water and NPT 100 ps
runs (if more information about md settings are needed please ask):

Running on 1 node with total 4 cores, 8 logical cores
Hardware detected:
  CPU info:
Vendor: GenuineIntel
Brand:  Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256

GROMACS version:    VERSION 5.1.4
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        disabled
OpenCL support:     disabled


gmx mdrun -ntmpi 1 -ntomp # -v -deffnm out.tpr


-------------------------------------------------------------------
-ntomp | MPI/OpenMP     | Wall time (s) | ns/day  | % CPU | Note?*
-------------------------------------------------------------------
   1   | 1/1            |     1075.764  |  80.315 | 100.0 |  No
   2   | 1/2            |      619.679  | 139.427 | 200.0 |  Yes
   3   | 1/3            |      458.721  | 188.350 | 299.9 |  Yes
   4   | 1/4            |      356.906  | 242.081 | 399.8 |  Yes
   5   | 1/5            |      433.572  | 199.275 | 499.0 |  Yes
   6   | 1/6            |      378.951  | 227.998 | 598.0 |  Yes
   7   | 1/7            |      355.785  | 242.844 | 693.1 |  Yes
   8   | 1/8 (default)  |      328.520  | 262.081 | 779.0 |  No
-------------------------------------------------------------------

*NOTE: The number of threads is not equal to the number of (logical) cores
  and the -pin option is set to auto: will not pin thread to cores.
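
(As an aside: instead of relying on the auto setting, pinning can be
requested explicitly, e.g. something like
  gmx mdrun -ntmpi 1 -ntomp 4 -pin on -v -deffnm out
where the thread counts are just an example.)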

If (MPI threads) * (OpenMP threads) = number of threads, does mdrun use a
number of cores equal to the number of threads, and can this be seen in the
%CPU usage?
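
(E.g. -ntmpi 2 -ntomp 4 would start 2 x 4 = 8 threads, which at full
utilisation should show up as roughly 800 % CPU in top/htop.)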

For example, as I installed GROMACS with the defaults, GMX_OPENMP_MAX_THREADS
is set to 32 threads, but this will never be reached with this hardware (4
cores, 8 logical), is this correct? For now I'm re-running the exact same
tests to have at least one replica, and extending the system size and the
run time. Any suggestions on how to dig deeper into this kind of test are
welcome,

Best regards
-- 



*Javier Luque Di Salvo*

Dipartamento di Ingegneria Chimica

Universitá Degli Studi di Palermo
*Viale delle Scienze, Ed. 6*
*90128 PALERMO (PA)*
*+39.09123867503*

Re: [gmx-users] Performance difference when using Gromacs 5.0 with different vector instructions

2017-10-19 Thread Mark Abraham
Yes, thus "always"

Mark

On Fri, 20 Oct 2017 00:19 MING HA  wrote:

> Dear Mark,
>
>
> Thanks for the quick response. So, in general, running using AVX
> instructions will yield
> better performance than SSE4.1?
>
>
> Sincerely,
> Ming
>
> On Thu, Oct 19, 2017 at 5:18 PM, Mark Abraham 
> wrote:
>
> > Hi,
> >
> > In short, yes. See
> > http://manual.gromacs.org/documentation/2016.4/install-
> > guide/index.html#simd-support.
> > You should generally always use a GROMACS binary compiled for the highest
> > SIMD level supported by your hardware. Your mdrun .log file will advise
> you
> > when it observes that you are not.
> >
> > Mark
> >
> > On Thu, Oct 19, 2017 at 7:02 PM MING HA  >
> > wrote:
> >
> > > Hi all,
> > >
> > >
> > > I am running several resources using Gromacs 5.0 to run my simulations.
> > > On some resources, Gromacs is compiled using SSE4.1 SIMD instructions,
> > > while on others AVX_256 or AVX2_256 is used. While I don't find much
> of a
> > > performance difference between AVX_256 and AVX2_256 instructions, there
> > > is a large performance difference between resources that use SSE4.1 and
> > > AVX instructions. Specifically, resources using SSE4.1 are about 2-3x
> > > slower
> > > than those that use AVX.
> > >
> > > I'm kind of new to the SIMD instructions used by Gromacs, so I was
> > > wondering
> > > whether the instruction set is causing the large performance
> difference.
> > >
> > >
> > > Sincerely,
> > > Ming
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at
> > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > > posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > send a mail to gmx-users-requ...@gromacs.org.
> > >
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at http://www.gromacs.org/
> > Support/Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> >
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>


Re: [gmx-users] Performance difference when using Gromacs 5.0 with different vector instructions

2017-10-19 Thread MING HA
Dear Mark,


Thanks for the quick response. So, in general, running using AVX
instructions will yield
better performance than SSE4.1?


Sincerely,
Ming

On Thu, Oct 19, 2017 at 5:18 PM, Mark Abraham 
wrote:

> Hi,
>
> In short, yes. See
> http://manual.gromacs.org/documentation/2016.4/install-
> guide/index.html#simd-support.
> You should generally always use a GROMACS binary compiled for the highest
> SIMD level supported by your hardware. Your mdrun .log file will advise you
> when it observes that you are not.
>
> Mark
>
> On Thu, Oct 19, 2017 at 7:02 PM MING HA 
> wrote:
>
> > Hi all,
> >
> >
> > I am running several resources using Gromacs 5.0 to run my simulations.
> > On some resources, Gromacs is compiled using SSE4.1 SIMD instructions,
> > while on others AVX_256 or AVX2_256 is used. While I don't find much of a
> > performance difference between AVX_256 and AVX2_256 instructions, there
> > is a large performance difference between resources that use SSE4.1 and
> > AVX instructions. Specifically, resources using SSE4.1 are about 2-3x
> > slower
> > than those that use AVX.
> >
> > I'm kind of new to the SIMD instructions used by Gromacs, so I was
> > wondering
> > whether the instruction set is causing the large performance difference.
> >
> >
> > Sincerely,
> > Ming
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.
> >
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>


Re: [gmx-users] Performance difference when using Gromacs 5.0 with different vector instructions

2017-10-19 Thread Mark Abraham
Hi,

In short, yes. See
http://manual.gromacs.org/documentation/2016.4/install-guide/index.html#simd-support.
You should generally always use a GROMACS binary compiled for the highest
SIMD level supported by your hardware. Your mdrun .log file will advise you
when it observes that you are not.
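
A quick way to check is to grep the log header, e.g. (a sketch; md.log
stands for your own log file):

  grep "SIMD instructions" md.log

which prints lines like "SIMD instructions most likely to fit this hardware:
AVX2_256" and "SIMD instructions selected at GROMACS compile time: SSE4.1";
if the two differ, you are leaving performance on the table.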

Mark

On Thu, Oct 19, 2017 at 7:02 PM MING HA 
wrote:

> Hi all,
>
>
> I am running several resources using Gromacs 5.0 to run my simulations.
> On some resources, Gromacs is compiled using SSE4.1 SIMD instructions,
> while on others AVX_256 or AVX2_256 is used. While I don't find much of a
> performance difference between AVX_256 and AVX2_256 instructions, there
> is a large performance difference between resources that use SSE4.1 and
> AVX instructions. Specifically, resources using SSE4.1 are about 2-3x
> slower
> than those that use AVX.
>
> I'm kind of new to the SIMD instructions used by Gromacs, so I was
> wondering
> whether the instruction set is causing the large performance difference.
>
>
> Sincerely,
> Ming
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>


[gmx-users] Performance difference when using Gromacs 5.0 with different vector instructions

2017-10-19 Thread MING HA
Hi all,


I am running my simulations with Gromacs 5.0 on several computing resources.
On some resources, Gromacs is compiled using SSE4.1 SIMD instructions,
while on others AVX_256 or AVX2_256 is used. While I don't find much of a
performance difference between AVX_256 and AVX2_256 instructions, there
is a large performance difference between resources that use SSE4.1 and
AVX instructions. Specifically, resources using SSE4.1 are about 2-3x slower
than those that use AVX.

I'm kind of new to the SIMD instructions used by Gromacs, so I was
wondering
whether the instruction set is causing the large performance difference.


Sincerely,
Ming


Re: [gmx-users] performance

2017-09-21 Thread gromacs query
Hi Szilárd,

Thanks a lot for your time; see my replies below. Overall they are very
useful, and I hope this long-running discussion thread will be useful to
future users. (Also, could you please see my other email pointing out
errors(?)/repetitions in the web documentation about performance?)

'multi/multidir' is not very helpful in my case, as my simulations sometimes
crash, and restarting them would be a pain because there are many (many!)
simulations. Also, one can never be sure whether other users on shared-node
clusters will use the -multi/-multidir option or not. I have read your other
email suggestions [tagged: the importance of process/thread affinity,
especially in node-sharing setups], where node sharing among different users
can be an issue that ultimately depends on the job scheduler.

My replies are inserted here:


On Thu, Sep 21, 2017 at 4:54 PM, Szilárd Páll 
wrote:

> Hi,
>
> A few remarks in no particular order:
>
> 1. Avoid domain-decomposition unless necessary (especially in
> CPU-bound runs, and especially with PME), it has a non-negligible
> overhead (greatest when going from no DD to using DD). Running
> multi-threading only typically has better performance. There are
> exceptions (e.g. your case of reaction-field runs could be such a
> case, but I'm doubtful as the DD cost is signiificant). Hence, I
> suggest trying 1, 2, 4... ranks per simulation, i.e.
> mpirun -np 1 gmx mdrun -ntomp N (single-run)
> mpirun -np 2 gmx mdrun -ntomp N/2 (single-run)
> mpirun -np 4 gmx mdrun -ntomp N/4 (single-run)
> [...]
> The multi-run equivalents of the above would simply use M ranks where
> M=Nmulti * Nranks_per_run.


You mean -dlb no? I did not modify it, so it should be in auto mode; I can
try it though. And yes, indeed I have tried many other cases where I vary
-np gradually. I just shared one of the glitchy performance cases [I have a
wealth of such cases :)], which I now suspect is a slurm scheduler issue. I
need to ask the admin whether jobs are given core affinities.


> 2. If you're aiming for best throughput place two or more
> _independent_ runs on the same GPU, e.g. assuming 4 GPUs + 40 cores
> (and that no DD turns out to be best) to run 2 sim/GPU you can do:
> mpirun -np 8 -multi 8 gmx mdrun [-ntomp 5] [-gpu_id 00112233]
> The last two args can be omitted, but you should make sure that's what
> you get, i.e. that sim #0/#1 use GPU #0, sim #2/#3 use GPU#1, etc.
>

I am avoiding the -multi option as explained above, but this is useful.


> 3. 2a,b are clearly off, my hypothesis is still that they get pinned
> to the wrong cores. I suspect 6a,b are just lucky and happen to not be
> placed too badly. Plus 6 use 4 GPUs vs 7 only 2 GPUs, so that's not a
> fair comparison (and probably explains the 350 vs 300 ns/day).
>

Ah, sorry! Yes, my fault. I just checked: the 7th case uses 2 GPUs; I forgot
to change the GPU numbers.


>
> 4. -pin on is faster than letting the scheduler place jobs (e.g. 3ab
> vs 4b) which is in line with what I would expect.
>


> 5. The strange asymmetry in 8a vs 8b is due to 8b having failed to pin
> and running where it should not be (empty socket -> core turbo-ing?).
> The 4a / 4b mismatch is strange; are those using the very same system
> (tpr?) -- one of them reports higher load imbalance!
>
>
>
Yes, all these jobs (cases 1 to 8) use the same tpr.



> Overall, I suggest starting over and determining performance first by
> deciding: What DD setup is best and how to lay out jobs in a node to
> get best throughput. Start with run configs testing settings with
> -multi to avoid pinning headaches and fill at least half a node (or a
> full node) with #concurrent simulations >= #GPUs.
>

I will see if I can get a node free; I need to wait.

Thanks for all responses.

-J


> Cheers,
> --
> Szilárd
>
>
> On Mon, Sep 18, 2017 at 9:25 PM, gromacs query 
> wrote:
> > Hi Szilárd,
> >
> > {I had to trim the message as my message is put on hold because only 50kb
> > allowed and this message has reached 58 KB! Not due to files attached as
> > they are shared via dropbox}; Sorry seamless reading might be compromised
> > for future readers.
> >
> > Thanks for your replies. I have shared log files here:
> >
> > https://www.dropbox.com/s/m9mqqans0jci873/test_logs.zip?dl=0
> >
> > Two self-describing name folders have all the test logs. The test_*.log
> > file serial numbers correspond to my simulations briefly described here
> > [with folder names].
> >
> > For quick look one can: grep Performance *.log
> >
> > Folder 2gpu_4np:
> > Sr. no.  Remarks  performance (ns/day)
> > 1.  only one job  345 ns/day
> > 2a,b.  two same jobs together (without pin on)  16.1 and 15.9
> > 3a,b.  two same jobs together (without pin on, with -multidir)  270 and
> 276
> > 4a,b.  two same jobs together (pin on, pinoffset at 0 and 5)  160 and 301
> >
> >
> >
> > Folder:4gpu_16np
> >
> >
> >
> >
> > Remarks  performance (ns/day)
> > 5.  only one job  694 ns/day
> > 6a,b.  two same 

Re: [gmx-users] performance

2017-09-21 Thread Szilárd Páll
--
Szilárd


On Thu, Sep 21, 2017 at 5:54 PM, Szilárd Páll  wrote:
> Hi,
>
> A few remarks in no particular order:
>
> 1. Avoid domain-decomposition unless necessary (especially in
> CPU-bound runs, and especially with PME), it has a non-negligible
> overhead (greatest when going from no DD to using DD). Running
> multi-threading only typically has better performance. There are
> exceptions (e.g. your case of reaction-field runs could be such a
> case, but I'm doubtful as the DD cost is signiificant). Hence, I
> suggest trying 1, 2, 4... ranks per simulation, i.e.
> mpirun -np 1 gmx mdrun -ntomp N (single-run)
> mpirun -np 2 gmx mdrun -ntomp N/2 (single-run)
> mpirun -np 4 gmx mdrun -ntomp N/4 (single-run)
> [...]
> The multi-run equivalents of the above would simply use M ranks where
> M=Nmulti * Nranks_per_run.
>
> 2. If you're aiming for best throughput place two or more
> _independent_ runs on the same GPU, e.g. assuming 4 GPUs + 40 cores
> (and that no DD turns out to be best) to run 2 sim/GPU you can do:
> mpirun -np 8 -multi 8 gmx mdrun [-ntomp 5] [-gpu_id 00112233]
> The last two args can be omitted, but you should make sure that's what
> you get, i.e. that sim #0/#1 use GPU #0, sim #2/#3 use GPU#1, etc.

See Fig 5 of http://arxiv.org/abs/1507.00898 if you're not convinced.

> 3. 2a,b are clearly off, my hypothesis is still that they get pinned
> to the wrong cores. I suspect 6a,b are just lucky and happen to not be
> placed too badly. Plus 6 use 4 GPUs vs 7 only 2 GPUs, so that's not a
> fair comparison (and probably explains the 350 vs 300 ns/day).
>
> 4. -pin on is faster than letting the scheduler place jobs (e.g. 3ab
> vs 4b) which is in line with what I would expect.
>
> 5. The strange asymmetry in 8a vs 8b is due to 8b having failed to pin
> and running where it should not be (empty socket -> core turbo-ing?).
> The 4a / 4b mismatch is strange; are those using the very same system
> (tpr?) -- one of them reports higher load imbalance!
>
>
> Overall, I suggest starting over and determining performance first by
> deciding: What DD setup is best and how to lay out jobs in a node to
> get best throughput. Start with run configs testing settings with
> -multi to avoid pinning headaches and fill at least half a node (or a
> full node) with #concurrent simulations >= #GPUs.
>
> Cheers,
> --
> Szilárd
>
>
> On Mon, Sep 18, 2017 at 9:25 PM, gromacs query  wrote:
>> Hi Szilárd,
>>
>> {I had to trim the message as my message is put on hold because only 50kb
>> allowed and this message has reached 58 KB! Not due to files attached as
>> they are shared via dropbox}; Sorry seamless reading might be compromised
>> for future readers.
>>
>> Thanks for your replies. I have shared log files here:
>>
>> https://www.dropbox.com/s/m9mqqans0jci873/test_logs.zip?dl=0
>>
>> Two self-describing name folders have all the test logs. The test_*.log
>> file serial numbers correspond to my simulations briefly described here
>> [with folder names].
>>
>> For quick look one can: grep Performance *.log
>>
>> Folder 2gpu_4np:
>> Sr. no.  Remarks  performance (ns/day)
>> 1.  only one job  345 ns/day
>> 2a,b.  two same jobs together (without pin on)  16.1 and 15.9
>> 3a,b.  two same jobs together (without pin on, with -multidir)  270 and 276
>> 4a,b.  two same jobs together (pin on, pinoffset at 0 and 5)  160 and 301
>>
>>
>>
>> Folder:4gpu_16np
>>
>>
>>
>>
>> Remarks  performance (ns/day)
>> 5.  only one job  694 ns/day
>> 6a,b.  two same jobs together (without pin on)  340 and 350
>> 7a,b.  two same jobs together (without pin on, with -multidir)  302 and 304
>> 8a,b.  two same jobs together (pin on, pinoffset at 0 and 17)  204 and 546
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at 
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
>> mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance

2017-09-21 Thread Szilárd Páll
Hi,

A few remarks in no particular order:

1. Avoid domain-decomposition unless necessary (especially in
CPU-bound runs, and especially with PME), it has a non-negligible
overhead (greatest when going from no DD to using DD). Running
multi-threading only typically has better performance. There are
exceptions (e.g. your case of reaction-field runs could be such a
case, but I'm doubtful as the DD cost is significant). Hence, I
suggest trying 1, 2, 4... ranks per simulation, i.e.
mpirun -np 1 gmx mdrun -ntomp N (single-run)
mpirun -np 2 gmx mdrun -ntomp N/2 (single-run)
mpirun -np 4 gmx mdrun -ntomp N/4 (single-run)
[...]
The multi-run equivalents of the above would simply use M ranks where
M=Nmulti * Nranks_per_run.

2. If you're aiming for best throughput place two or more
_independent_ runs on the same GPU, e.g. assuming 4 GPUs + 40 cores
(and that no DD turns out to be best) to run 2 sim/GPU you can do:
mpirun -np 8 -multi 8 gmx mdrun [-ntomp 5] [-gpu_id 00112233]
The last two args can be omitted, but you should make sure that's what
you get, i.e. that sim #0/#1 use GPU #0, sim #2/#3 use GPU#1, etc.

3. 2a,b are clearly off, my hypothesis is still that they get pinned
to the wrong cores. I suspect 6a,b are just lucky and happen to not be
placed too badly. Plus 6 use 4 GPUs vs 7 only 2 GPUs, so that's not a
fair comparison (and probably explains the 350 vs 300 ns/day).

4. -pin on is faster than letting the scheduler place jobs (e.g. 3ab
vs 4b) which is in line with what I would expect.

5. The strange asymmetry in 8a vs 8b is due to 8b having failed to pin
and running where it should not be (empty socket -> core turbo-ing?).
The 4a / 4b mismatch is strange; are those using the very same system
(tpr?) -- one of them reports higher load imbalance!


Overall, I suggest starting over and determining performance first by
deciding: What DD setup is best and how to lay out jobs in a node to
get best throughput. Start with run configs testing settings with
-multi to avoid pinning headaches and fill at least half a node (or a
full node) with #concurrent simulations >= #GPUs.

Cheers,
--
Szilárd


On Mon, Sep 18, 2017 at 9:25 PM, gromacs query  wrote:
> Hi Szilárd,
>
> {I had to trim the message as my message is put on hold because only 50kb
> allowed and this message has reached 58 KB! Not due to files attached as
> they are shared via dropbox}; Sorry seamless reading might be compromised
> for future readers.
>
> Thanks for your replies. I have shared log files here:
>
> https://www.dropbox.com/s/m9mqqans0jci873/test_logs.zip?dl=0
>
> Two self-describing name folders have all the test logs. The test_*.log
> file serial numbers correspond to my simulations briefly described here
> [with folder names].
>
> For quick look one can: grep Performance *.log
>
> Folder 2gpu_4np:
> Sr. no.  Remarks  performance (ns/day)
> 1.  only one job  345 ns/day
> 2a,b.  two same jobs together (without pin on)  16.1 and 15.9
> 3a,b.  two same jobs together (without pin on, with -multidir)  270 and 276
> 4a,b.  two same jobs together (pin on, pinoffset at 0 and 5)  160 and 301
>
>
>
> Folder:4gpu_16np
>
>
>
>
> Remarks  performance (ns/day)
> 5.  only one job  694 ns/day
> 6a,b.  two same jobs together (without pin on)  340 and 350
> 7a,b.  two same jobs together (without pin on, with -multidir)  302 and 304
> 8a,b.  two same jobs together (pin on, pinoffset at 0 and 17)  204 and 546
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance issue of GROMACS

2017-09-19 Thread Tomek Stępniewski
Hi Szilárd,
thank you for your response,
I attach the log file.

2017-09-19 15:19 GMT+02:00 Szilárd Páll :

> PS: A bit of extrapolation from my standard historical benchmark data
> shows that regular cut-off kernels should run at ~3.0 ms/step, so
> force shift will be ~3.5-4 ms/step (with nstlist=20 and 2 fs step);
> assuming 70% CPU-GPU overlap that's 5-5.5 ms/step which corresponds to
> ~35 ns/day (with 2 fs).
>
> That's just a rough estimate, though, and it assumes that you have
> enough CPU cores for a balanced run.
>
> --
> Szilárd
>
>
> On Tue, Sep 19, 2017 at 3:16 PM, Szilárd Páll 
> wrote:
> > On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
> >  wrote:
> >> Hi everybody,
> >> I am running gromacs 5.1.4 on a system that uses NVIDIA Tesla K40m,
> >> surprisingly I get a speed of only 15 ns a day when carrying out nvt
> >> simulations, my colleagues say that on a new GPU like this with my
> system
> >> size it should be around 60 ns a day,
> >> are there any apparent errors in my input files that might hhinder the
> >> simulation?
> >
> > 15 ns/day seems a bit low, but I can't say for sure if it's far too
> > low. Can you share logs?
> >
> >> input file:
> >> integrator  = md
> >> dt  = 0.002
> >> nsteps  = 1
> >> nstlog  = 1
> >> nstxout = 5
> >> nstvout = 5
> >> nstfout = 5
> >> nstcalcenergy   = 100
> >> nstenergy   = 1000
> >> ;
> >> cutoff-scheme   = Verlet
> >> nstlist = 20
> >> rlist   = 1.2
> >> coulombtype = pme
> >> rcoulomb= 1.2
> >> vdwtype = Cut-off
> >> vdw-modifier= Force-switch
> >> rvdw_switch = 1.0
> >> rvdw= 1.2
> >> ;
> >> tcoupl  = Nose-Hoover
> >> tc_grps = PROT   MEMB   SOL_ION
> >> tau_t   = 1.01.01.0
> >> ref_t   = 310 310 310
> >> ;
> >> constraints = h-bonds
> >> constraint_algorithm= LINCS
> >> continuation= yes
> >> ;
> >> nstcomm = 100
> >> comm_mode   = linear
> >> comm_grps   = PROT   MEMB   SOL_ION
> >> ;
> >> refcoord_scaling= com
> >>
> >> the system has around 70,000 atoms,
> >>
> >> can this issue depend on the CUDA drivers?:
> >
> > A bit, but not to a factor of 4.
> >
> >> CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
> compiler
> >> driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
> >> Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0,
> V8.0.61
> >> CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=
> >> compute_30,code=sm_30;-gencode;arch=compute_35,code=
> >> sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=
> >> compute_50,code=sm_50;-gencode;arch=compute_52,code=
> >> sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=
> >> compute_61,code=sm_61;-gencode;arch=compute_60,code=
> >> compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
> >> ;-march=core-avx2;-Wextra;-Wno-missing-field-
> initializers;-Wpointer-arith;-
> >> Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-
> >> fexcess-precision=fast;-Wno-array-bounds;
> >> CUDA driver:8.0
> >> CUDA runtime:   8.0
> >> GPU info:
> >> Number of GPUs detected: 1
> >> #0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible
> >>
> >> NOTE: GROMACS was configured without NVML support hence it can not
> exploit
> >>   application clocks of the detected Tesla K40m GPU to improve
> >> performance.
> >>   Recompile with the NVML library (compatible with the driver used)
> or
> >> set application clocks manually.
> >>
> >>
> >> Using GPU 8x8 non-bonded kernels
> >>
> >> I will be extremely grateful for any help,
> >> best
> >>
> >> --
> >> Tomasz M Stepniewski
> >> Research Group on Biomedical Informatics (GRIB)
> >> Hospital del Mar Medical Research Institute (IMIM)
> >> --
> >> Gromacs Users mailing list
> >>
> >> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
> >>
> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>
> >> * For (un)subscribe requests visit
> >> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
>



-- 
Tomasz M Stepniewski
Research Group on 

Re: [gmx-users] performance issue of GROMACS

2017-09-19 Thread Szilárd Páll
PS: A bit of extrapolation from my standard historical benchmark data
shows that regular cut-off kernels should run at ~3.0 ms/step, so
force shift will be ~3.5-4 ms/step (with nstlist=20 and 2 fs step);
assuming 70% CPU-GPU overlap that's 5-5.5 ms/step which corresponds to
~35 ns/day (with 2 fs).
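
(For reference, the conversion used there is just ns/day = 86.4 * dt[fs] /
t_step[ms], i.e. 86.4 * 2 / 5 ≈ 35 and 86.4 * 2 / 5.5 ≈ 31.)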

That's just a rough estimate, though, and it assumes that you have
enough CPU cores for a balanced run.

--
Szilárd


On Tue, Sep 19, 2017 at 3:16 PM, Szilárd Páll  wrote:
> On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
>  wrote:
>> Hi everybody,
>> I am running gromacs 5.1.4 on a system that uses NVIDIA Tesla K40m,
>> surprisingly I get a speed of only 15 ns a day when carrying out nvt
>> simulations, my colleagues say that on a new GPU like this with my system
>> size it should be around 60 ns a day,
>> are there any apparent errors in my input files that might hhinder the
>> simulation?
>
> 15 ns/day seems a bit low, but I can't say for sure if it's far too
> low. Can you share logs?
>
>> input file:
>> integrator  = md
>> dt  = 0.002
>> nsteps  = 1
>> nstlog  = 1
>> nstxout = 5
>> nstvout = 5
>> nstfout = 5
>> nstcalcenergy   = 100
>> nstenergy   = 1000
>> ;
>> cutoff-scheme   = Verlet
>> nstlist = 20
>> rlist   = 1.2
>> coulombtype = pme
>> rcoulomb= 1.2
>> vdwtype = Cut-off
>> vdw-modifier= Force-switch
>> rvdw_switch = 1.0
>> rvdw= 1.2
>> ;
>> tcoupl  = Nose-Hoover
>> tc_grps = PROT   MEMB   SOL_ION
>> tau_t   = 1.01.01.0
>> ref_t   = 310 310 310
>> ;
>> constraints = h-bonds
>> constraint_algorithm= LINCS
>> continuation= yes
>> ;
>> nstcomm = 100
>> comm_mode   = linear
>> comm_grps   = PROT   MEMB   SOL_ION
>> ;
>> refcoord_scaling= com
>>
>> the system has around 70,000 atoms,
>>
>> can this issue depend on the CUDA drivers?:
>
> A bit, but not to a factor of 4.
>
>> CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
>> driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
>> Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
>> CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=
>> compute_30,code=sm_30;-gencode;arch=compute_35,code=
>> sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=
>> compute_50,code=sm_50;-gencode;arch=compute_52,code=
>> sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=
>> compute_61,code=sm_61;-gencode;arch=compute_60,code=
>> compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
>> ;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-
>> Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-
>> fexcess-precision=fast;-Wno-array-bounds;
>> CUDA driver:8.0
>> CUDA runtime:   8.0
>> GPU info:
>> Number of GPUs detected: 1
>> #0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible
>>
>> NOTE: GROMACS was configured without NVML support hence it can not exploit
>>   application clocks of the detected Tesla K40m GPU to improve
>> performance.
>>   Recompile with the NVML library (compatible with the driver used) or
>> set application clocks manually.
>>
>>
>> Using GPU 8x8 non-bonded kernels
>>
>> I will be extremely grateful for any help,
>> best
>>
>> --
>> Tomasz M Stepniewski
>> Research Group on Biomedical Informatics (GRIB)
>> Hospital del Mar Medical Research Institute (IMIM)
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at 
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
>> mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance issue of GROMACS

2017-09-19 Thread Szilárd Páll
On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
 wrote:
> Hi everybody,
> I am running gromacs 5.1.4 on a system that uses NVIDIA Tesla K40m,
> surprisingly I get a speed of only 15 ns a day when carrying out nvt
> simulations, my colleagues say that on a new GPU like this with my system
> size it should be around 60 ns a day,
> are there any apparent errors in my input files that might hhinder the
> simulation?

15 ns/day seems a bit low, but I can't say for sure if it's far too
low. Can you share logs?

> input file:
> integrator  = md
> dt  = 0.002
> nsteps  = 1
> nstlog  = 1
> nstxout = 5
> nstvout = 5
> nstfout = 5
> nstcalcenergy   = 100
> nstenergy   = 1000
> ;
> cutoff-scheme   = Verlet
> nstlist = 20
> rlist   = 1.2
> coulombtype = pme
> rcoulomb= 1.2
> vdwtype = Cut-off
> vdw-modifier= Force-switch
> rvdw_switch = 1.0
> rvdw= 1.2
> ;
> tcoupl  = Nose-Hoover
> tc_grps = PROT   MEMB   SOL_ION
> tau_t   = 1.01.01.0
> ref_t   = 310 310 310
> ;
> constraints = h-bonds
> constraint_algorithm= LINCS
> continuation= yes
> ;
> nstcomm = 100
> comm_mode   = linear
> comm_grps   = PROT   MEMB   SOL_ION
> ;
> refcoord_scaling= com
>
> the system has around 70,000 atoms,
>
> can this issue depend on the CUDA drivers?:

A bit, but not to a factor of 4.

> CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
> Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
> CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=
> compute_30,code=sm_30;-gencode;arch=compute_35,code=
> sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=
> compute_50,code=sm_50;-gencode;arch=compute_52,code=
> sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=
> compute_61,code=sm_61;-gencode;arch=compute_60,code=
> compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
> ;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-
> Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-
> fexcess-precision=fast;-Wno-array-bounds;
> CUDA driver:8.0
> CUDA runtime:   8.0
> GPU info:
> Number of GPUs detected: 1
> #0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> NOTE: GROMACS was configured without NVML support hence it can not exploit
>   application clocks of the detected Tesla K40m GPU to improve
> performance.
>   Recompile with the NVML library (compatible with the driver used) or
> set application clocks manually.
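
(Side note on that NVML message: application clocks can also be raised by
hand with nvidia-smi, e.g. something like

  nvidia-smi -q -d SUPPORTED_CLOCKS    # list the clocks the K40m supports
  nvidia-smi -ac <memclock>,<smclock>  # needs root/admin rights

but that is usually worth a few percent at most, so it will not explain a
4x gap.)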
>
>
> Using GPU 8x8 non-bonded kernels
>
> I will be extremely grateful for any help,
> best
>
> --
> Tomasz M Stepniewski
> Research Group on Biomedical Informatics (GRIB)
> Hospital del Mar Medical Research Institute (IMIM)
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.

[gmx-users] performance issue of GROMACS

2017-09-19 Thread Tomek Stępniewski
Hi everybody,
I am running GROMACS 5.1.4 on a system that uses an NVIDIA Tesla K40m.
Surprisingly, I get a speed of only 15 ns per day when carrying out NVT
simulations; my colleagues say that on a recent GPU like this, with my
system size, it should be around 60 ns per day.
Are there any apparent errors in my input files that might hinder the
simulation?
input file:
integrator  = md
dt  = 0.002
nsteps  = 1
nstlog  = 1
nstxout = 5
nstvout = 5
nstfout = 5
nstcalcenergy   = 100
nstenergy   = 1000
;
cutoff-scheme   = Verlet
nstlist = 20
rlist   = 1.2
coulombtype = pme
rcoulomb= 1.2
vdwtype = Cut-off
vdw-modifier= Force-switch
rvdw_switch = 1.0
rvdw= 1.2
;
tcoupl  = Nose-Hoover
tc_grps = PROT   MEMB   SOL_ION
tau_t   = 1.01.01.0
ref_t   = 310 310 310
;
constraints = h-bonds
constraint_algorithm= LINCS
continuation= yes
;
nstcomm = 100
comm_mode   = linear
comm_grps   = PROT   MEMB   SOL_ION
;
refcoord_scaling= com

the system has around 70,000 atoms,

can this issue depend on the CUDA drivers?:
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=
compute_30,code=sm_30;-gencode;arch=compute_35,code=
sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=
compute_50,code=sm_50;-gencode;arch=compute_52,code=
sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=
compute_61,code=sm_61;-gencode;arch=compute_60,code=
compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-
Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-
fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:8.0
CUDA runtime:   8.0
GPU info:
Number of GPUs detected: 1
#0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible

NOTE: GROMACS was configured without NVML support hence it can not exploit
  application clocks of the detected Tesla K40m GPU to improve
performance.
  Recompile with the NVML library (compatible with the driver used) or
set application clocks manually.


Using GPU 8x8 non-bonded kernels

I will be extremely grateful for any help,
best

-- 
Tomasz M Stepniewski
Research Group on Biomedical Informatics (GRIB)
Hospital del Mar Medical Research Institute (IMIM)


[gmx-users] Performance of GROMACS 5.1.4

2017-09-19 Thread Tomek Stępniewski
Hi everybody,
I am running GROMACS 5.1.4 on a system that uses an NVIDIA Tesla K40m.
Surprisingly, I get a speed of only 15 ns per day when carrying out NVT
simulations; my colleagues say that on a recent GPU like this, with my
system size, it should be around 60 ns per day.
Are there any apparent errors in my input files that might hinder the
simulation?
input file:
integrator  = md
dt  = 0.002
nsteps  = 1
nstlog  = 1
nstxout = 5
nstvout = 5
nstfout = 5
nstcalcenergy   = 100
nstenergy   = 1000
;
cutoff-scheme   = Verlet
nstlist = 20
rlist   = 1.2
coulombtype = pme
rcoulomb= 1.2
vdwtype = Cut-off
vdw-modifier= Force-switch
rvdw_switch = 1.0
rvdw= 1.2
;
tcoupl  = Nose-Hoover
tc_grps = PROT   MEMB   SOL_ION
tau_t   = 1.01.01.0
ref_t   = 310 310 310
;
constraints = h-bonds
constraint_algorithm= LINCS
continuation= yes
;
nstcomm = 100
comm_mode   = linear
comm_grps   = PROT   MEMB   SOL_ION
;
refcoord_scaling= com

the system has around 70,000 atoms,

can this issue depend on the CUDA drivers?:
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
CUDA compiler
flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:8.0
CUDA runtime:   8.0
GPU info:
Number of GPUs detected: 1
#0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible

NOTE: GROMACS was configured without NVML support hence it can not exploit
  application clocks of the detected Tesla K40m GPU to improve
performance.
  Recompile with the NVML library (compatible with the driver used) or
set application clocks manually.


Using GPU 8x8 non-bonded kernels

I will be extremely grateful for any help,
best
T

-- 
Tomasz M Stepniewski
Research Group on Biomedical Informatics (GRIB)
Hospital del Mar Medical Research Institute (IMIM)


Re: [gmx-users] performance

2017-09-18 Thread gromacs query
Hi Szilárd,

{I had to trim the message, as it was put on hold because only 50 KB is
allowed and this message had reached 58 KB! This is not due to attached
files, as they are shared via Dropbox.} Sorry, seamless reading might be
compromised for future readers.

Thanks for your replies. I have shared log files here:

https://www.dropbox.com/s/m9mqqans0jci873/test_logs.zip?dl=0

Two folders with self-describing names contain all the test logs. The
test_*.log file serial numbers correspond to my simulations, briefly
described below [with folder names].

For a quick look one can: grep Performance *.log

Folder 2gpu_4np:
Sr. no.  Remarks                                                   performance (ns/day)
1.       only one job                                              345
2a,b.    two same jobs together (without pin on)                   16.1 and 15.9
3a,b.    two same jobs together (without pin on, with -multidir)   270 and 276
4a,b.    two same jobs together (pin on, pinoffset at 0 and 5)     160 and 301

Folder 4gpu_16np:
Sr. no.  Remarks                                                   performance (ns/day)
5.       only one job                                              694
6a,b.    two same jobs together (without pin on)                   340 and 350
7a,b.    two same jobs together (without pin on, with -multidir)   302 and 304
8a,b.    two same jobs together (pin on, pinoffset at 0 and 17)    204 and 546

Re: [gmx-users] performance

2017-09-18 Thread Szilárd Páll
On Fri, Sep 15, 2017 at 1:06 AM, gromacs query  wrote:
> Hi Szilárd,
>
> Sorry this discussion is going long.
> Finally I got one node empty and did some serious tests specially
> considering your first point (discrepancies in benchmarking comparing jobs
> running on empty node vs occupied node). I tested in both ways.
>
> I ran following cases (single job vs two jobs for 2GPU+4 procs and also for
> 4GPU+16 procs). Happy to send log files.

Please do share them, it's hard to assess what's going on without those.

> Pinoffset results are surprising (4th and 8th test case below) though I get
> in log file a WARNING: Requested offset too large for available cores for
> the case 8; [should not be an issue as the first job binds the cores]

That means the offsets are not set correctly.
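
As a rule of thumb the offset of a job should equal the number of hardware
threads already claimed by the jobs launched before it; e.g. for two
16-thread jobs side by side something like this (just a sketch, thread
counts and offsets are examples only):

  gmx mdrun -ntmpi 8 -ntomp 2 -pin on -pinoffset 0  ... &
  gmx mdrun -ntmpi 8 -ntomp 2 -pin on -pinoffset 16 ... &

(possibly together with an explicit -pinstride, plus the usual per-job
-gpu_id mapping).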

> As suggested defining affinity should help with pinoffset set 'manually'
> (in practice with script) but these results are quite variable. Am bit lost
> now, what should be the best practice in case nodes are shared among
> different users and multidir can be tricky in such case (if other gromacs
> users are not using multidir option!).

I suggest fixing the above issue first. I don't fully understand what
the below descriptions mean, please be more specific about the details
or share logs.

>
> Sr. no. each job 2GPU; 4 procs performance (ns/day)
> 1 only one job 345
> 2 two same jobs together (without pin on) 16.1 and 15.9
> 3 two same jobs together (without pin on, with -multidir) 178 and 191
> 4 two same jobs together (pin on, pinoffset at 0 and 5) 160 and 301
> each job 4GPU; 16 procs performance (ns/day)
> 5 only one job 694
> 6 two same jobs together (without pin on) 340 and 350
> 7 two same jobs together (without pin on, with -multidir) 346 and 344
> 8 two same jobs together (pin on, pinoffset at 0 and 17) 204 and 546
>
>
> On Thu, Sep 14, 2017 at 12:02 PM, gromacs query 
> wrote:
>
>> Hi Szilárd,
>>
>> Here are my replies:
>>
>> >> Did you run the "fast" single job on an otherwise empty node? That
>> might explain it as, when most of the CPU cores are left empty, modern CPUs
>> increase clocks (tubo boost) on the used cores higher than they could with
>> all cores busy.
>>
>> Yes the "fast" single job was on empty node. Sorry I don't get it when you
>> say 'modern CPUs increase clocks', you mean the ns/day I get is pseudo in
>> that case?
>>
>> >> and if you post an actual log I can certainly give more informed
>> comments
>>
>> Sure, if its ok can I post it off-mailing list to you?
>>
>> >> However, note that if you are sharing a node with others, if their jobs
>> are not correctly affinitized, those processes will affect the performance
>> of your job.
>>
>> Yes exactly. In this case I would need to manually set pinoffset but this
>> can be but frustrating if other Gromacs users are not binding :)
>> Would it be possible to fix this in the default algorithm, though am
>> unaware of other issues it might cause? Also mutidir is not convenient
>> sometimes when job crashes in the middle and automatic restart from cpt
>> file would be difficult.
>>
>> -J
>>
>>
>> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll 
>> wrote:
>>
>>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query 
>>> wrote:
>>> > Hi Szilárd,
>>> >
>>> > Thanks again. I tried now with -multidir like this:
>>> >
>>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2
>>> t3 t4
>>> >
>>> > So this runs 4 jobs on same node so for each job np is = 16/4, and each
>>> job
>>> > using 2 GPU. I get now quite improved performance and equal performance
>>> for
>>> > each job (~ 220 ns) though still slightly less than single independent
>>> job
>>> > (where I get 300 ns). I can live with that but -
>>>
>>> That is not normal and it is more likely to be a benchmarking
>>> discrepancy: you are likely not comparing apples to apples. Did you
>>> run the "fast" single job on an otherwise empty node? That might
>>> explain it as, when most of the CPU cores are left empty, modern CPUs
>>> increase clocks (tubo boost) on the used cores higher than they could
>>> with all cores busy.
>>>
>>> > Surprised: There are maximum 40 cores and 8 GPUs per node and thus my 4
>>> > jobs should consume 8 GPUS.
>>>
>>> Note that even if those are 40 real cores (rather than 20 core with
>>> HyperThreading), the current GROMACS release will be unlikely to run
>>> efficiently with at least 6-8 cores per GPU. This will likely change
>>> with the next release.
>>>
>>> > So I am bit surprised with the fact the same
>>> > node on which my four jobs were running was already occupied with jobs
>>> by
>>> > some other user, which I think should not happen (may be slurm.config
>>> admin
>>> > issue?). Either my some jobs should have gone in queue or run on other
>>> node
>>> > if free.
>>>
>>> Sounds like a job scheduler issue (you can always check in the log the
>>> detected hardware) -- 

Re: [gmx-users] performance

2017-09-18 Thread Szilárd Páll
On Thu, Sep 14, 2017 at 1:02 PM, gromacs query  wrote:
> Hi Szilárd,
>
> Here are my replies:
>
>>> Did you run the "fast" single job on an otherwise empty node? That might
> explain it as, when most of the CPU cores are left empty, modern CPUs
> increase clocks (tubo boost) on the used cores higher than they could with
> all cores busy.
>
> Yes the "fast" single job was on empty node. Sorry I don't get it when you
> say 'modern CPUs increase clocks', you mean the ns/day I get is pseudo in
> that case?

It's called DVFS or Turbo Boost on Intel. Here are some pointers:
https://en.wikipedia.org/wiki/Dynamic_frequency_scaling
https://en.wikipedia.org/wiki/Intel_Turbo_Boost
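
(If you want to observe this on a node, the per-core clocks can be watched
while a run is active with something like

  watch -n 1 'grep "cpu MHz" /proc/cpuinfo'

— exact field names vary by kernel/distro, but the difference between
single-core and all-core turbo is usually easy to see.)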

>>> and if you post an actual log I can certainly give more informed comments
>
> Sure, if its ok can I post it off-mailing list to you?

Please use an online file sharing service of your liking so everyone
has access to the information referred to here.

>>> However, note that if you are sharing a node with others, if their jobs
> are not correctly affinitized, those processes will affect the performance
> of your job.
>
> Yes exactly. In this case I would need to manually set pinoffset but this
> can be but frustrating if other Gromacs users are not binding :)
> Would it be possible to fix this in the default algorithm, though am
> unaware of other issues it might cause?

No, there is no issue on the GROMACS side to fix. This is an issue
that the job scheduler / you as a user need to deal with to avoid the
pitfalls and performance cliffs inherent to node sharing.

> Also mutidir is not convenient
> sometimes when job crashes in the middle and automatic restart from cpt
> file would be difficult.

Let me answer that separately to emphasize a few technical issues.

Cheers,
--
Szilárd

> -J
>
>
> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll 
> wrote:
>
>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query 
>> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks again. I tried now with -multidir like this:
>> >
>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2
>> t3 t4
>> >
>> > So this runs 4 jobs on same node so for each job np is = 16/4, and each
>> job
>> > using 2 GPU. I get now quite improved performance and equal performance
>> for
>> > each job (~ 220 ns) though still slightly less than single independent
>> job
>> > (where I get 300 ns). I can live with that but -
>>
>> That is not normal and it is more likely to be a benchmarking
>> discrepancy: you are likely not comparing apples to apples. Did you
>> run the "fast" single job on an otherwise empty node? That might
>> explain it as, when most of the CPU cores are left empty, modern CPUs
>> increase clocks (tubo boost) on the used cores higher than they could
>> with all cores busy.
>>
>> > Surprised: There are maximum 40 cores and 8 GPUs per node and thus my 4
>> > jobs should consume 8 GPUS.
>>
>> Note that even if those are 40 real cores (rather than 20 core with
>> HyperThreading), the current GROMACS release will be unlikely to run
>> efficiently with at least 6-8 cores per GPU. This will likely change
>> with the next release.
>>
>> > So I am bit surprised with the fact the same
>> > node on which my four jobs were running was already occupied with jobs by
>> > some other user, which I think should not happen (may be slurm.config
>> admin
>> > issue?). Either my some jobs should have gone in queue or run on other
>> node
>> > if free.
>>
>> Sounds like a job scheduler issue (you can always check in the log the
>> detected hardware) -- and if you post an actual log I can certainly
>> give more informed comments.
>>
>> > What to do: Importantly though as an individual user I can submit
>> -multidir
>> > job but lets say, which is normally the case, there will be many other
>> > unknown users who submit one or two jobs in that case performance will be
>> > an issue (which is equivalent to my case when I submit many jobs without
>> > -multi/multidir).
>>
>> Not sure I follow: if you always have a number of similar runs to do,
>> submit them together and benefit from not having to manual hardware
>> assignment. Otherwise, if your cluster relies on node sharing, you
>> will have to make sure that you specify correctly the affinity/binding
>> arguments to your job scheduler (or work around it with manual offset
>> calculation). However, note that if you are sharing a node with
>> others, if their jobs are not correctly affinitized, those processes
>> will affect the performance of your job.
>>
>> > I think still they will need -pinoffset. Could you
>> > please suggest what best can be done in such case?
>>
>> See above.
>>
>> Cheers,
>> --
>> Szilárd
>>
>> >
>> > -Jiom
>> >
>> >
>> >
>> >
>> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll 
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> First off, have you considered options 2) using multi-sim? That would
>> >> allow you to not have to 

Re: [gmx-users] performance

2017-09-14 Thread gromacs query
Hi Szilárd,

Sorry this discussion is getting long.
Finally I got one node empty and did some serious tests, especially
considering your first point (discrepancies in benchmarking when comparing
jobs running on an empty node vs an occupied node). I tested both ways.

I ran the following cases (single job vs two jobs, for 2 GPUs + 4 procs and
also for 4 GPUs + 16 procs). Happy to send log files.

The pinoffset results are surprising (4th and 8th test cases below), though
for case 8 I get a WARNING in the log file: "Requested offset too large for
available cores" [this should not be an issue, as the first job binds the cores].

As suggested, defining affinity should help, with the pinoffset set 'manually'
(in practice with a script), but these results are quite variable. I am a bit
lost now as to what the best practice should be when nodes are shared among
different users; multidir can be tricky in such a case (if other GROMACS
users are not using the multidir option!).


Sr. no.  each job: 2 GPUs, 4 procs                                performance (ns/day)
  1      only one job                                             345
  2      two same jobs together (without pin on)                  16.1 and 15.9
  3      two same jobs together (without pin on, with -multidir)  178 and 191
  4      two same jobs together (pin on, pinoffset at 0 and 5)    160 and 301

Sr. no.  each job: 4 GPUs, 16 procs                               performance (ns/day)
  5      only one job                                             694
  6      two same jobs together (without pin on)                  340 and 350
  7      two same jobs together (without pin on, with -multidir)  346 and 344
  8      two same jobs together (pin on, pinoffset at 0 and 17)   204 and 546


On Thu, Sep 14, 2017 at 12:02 PM, gromacs query 
wrote:

> Hi Szilárd,
>
> Here are my replies:
>
> >> Did you run the "fast" single job on an otherwise empty node? That
> might explain it as, when most of the CPU cores are left empty, modern CPUs
> increase clocks (tubo boost) on the used cores higher than they could with
> all cores busy.
>
> Yes the "fast" single job was on empty node. Sorry I don't get it when you
> say 'modern CPUs increase clocks', you mean the ns/day I get is pseudo in
> that case?
>
> >> and if you post an actual log I can certainly give more informed
> comments
>
> Sure, if its ok can I post it off-mailing list to you?
>
> >> However, note that if you are sharing a node with others, if their jobs
> are not correctly affinitized, those processes will affect the performance
> of your job.
>
> Yes exactly. In this case I would need to manually set pinoffset but this
> can be but frustrating if other Gromacs users are not binding :)
> Would it be possible to fix this in the default algorithm, though am
> unaware of other issues it might cause? Also mutidir is not convenient
> sometimes when job crashes in the middle and automatic restart from cpt
> file would be difficult.
>
> -J
>
>
> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll 
> wrote:
>
>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query 
>> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks again. I tried now with -multidir like this:
>> >
>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2
>> t3 t4
>> >
>> > So this runs 4 jobs on same node so for each job np is = 16/4, and each
>> job
>> > using 2 GPU. I get now quite improved performance and equal performance
>> for
>> > each job (~ 220 ns) though still slightly less than single independent
>> job
>> > (where I get 300 ns). I can live with that but -
>>
>> That is not normal and it is more likely to be a benchmarking
>> discrepancy: you are likely not comparing apples to apples. Did you
>> run the "fast" single job on an otherwise empty node? That might
>> explain it as, when most of the CPU cores are left empty, modern CPUs
>> increase clocks (tubo boost) on the used cores higher than they could
>> with all cores busy.
>>
>> > Surprised: There are maximum 40 cores and 8 GPUs per node and thus my 4
>> > jobs should consume 8 GPUS.
>>
>> Note that even if those are 40 real cores (rather than 20 core with
>> HyperThreading), the current GROMACS release will be unlikely to run
>> efficiently with at least 6-8 cores per GPU. This will likely change
>> with the next release.
>>
>> > So I am bit surprised with the fact the same
>> > node on which my four jobs were running was already occupied with jobs
>> by
>> > some other user, which I think should not happen (may be slurm.config
>> admin
>> > issue?). Either my some jobs should have gone in queue or run on other
>> node
>> > if free.
>>
>> Sounds like a job scheduler issue (you can always check in the log the
>> detected hardware) -- and if you post an actual log I can certainly
>> give more informed comments.
>>
>> > What to do: Importantly though as an individual user I can submit
>> -multidir
>> > job but lets say, which is normally the case, there will be many other
>> > unknown users who submit one or two jobs in that case performance will
>> be
>> > an issue (which is equivalent to my case when I submit many jobs without
>> > -multi/multidir).
>>
>> Not sure I follow: if you always have a number of similar runs 

Re: [gmx-users] performance

2017-09-14 Thread gromacs query
Hi Szilárd,

Here are my replies:

>> Did you run the "fast" single job on an otherwise empty node? That might
explain it as, when most of the CPU cores are left empty, modern CPUs
increase clocks (tubo boost) on the used cores higher than they could with
all cores busy.

Yes the "fast" single job was on empty node. Sorry I don't get it when you
say 'modern CPUs increase clocks', you mean the ns/day I get is pseudo in
that case?

>> and if you post an actual log I can certainly give more informed comments

Sure, if it's OK, can I post it to you off the mailing list?

>> However, note that if you are sharing a node with others, if their jobs
are not correctly affinitized, those processes will affect the performance
of your job.

Yes, exactly. In this case I would need to manually set a pinoffset, but this
can be a bit frustrating if other GROMACS users are not binding :)
Would it be possible to fix this in the default algorithm, though I am
unaware of other issues it might cause? Also, multidir is sometimes not
convenient when a job crashes in the middle and automatic restart from the
cpt file would be difficult.

-J


On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll 
wrote:

> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query 
> wrote:
> > Hi Szilárd,
> >
> > Thanks again. I tried now with -multidir like this:
> >
> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2
> t3 t4
> >
> > So this runs 4 jobs on same node so for each job np is = 16/4, and each
> job
> > using 2 GPU. I get now quite improved performance and equal performance
> for
> > each job (~ 220 ns) though still slightly less than single independent
> job
> > (where I get 300 ns). I can live with that but -
>
> That is not normal and it is more likely to be a benchmarking
> discrepancy: you are likely not comparing apples to apples. Did you
> run the "fast" single job on an otherwise empty node? That might
> explain it as, when most of the CPU cores are left empty, modern CPUs
> increase clocks (tubo boost) on the used cores higher than they could
> with all cores busy.
>
> > Surprised: There are maximum 40 cores and 8 GPUs per node and thus my 4
> > jobs should consume 8 GPUS.
>
> Note that even if those are 40 real cores (rather than 20 core with
> HyperThreading), the current GROMACS release will be unlikely to run
> efficiently with at least 6-8 cores per GPU. This will likely change
> with the next release.
>
> > So I am bit surprised with the fact the same
> > node on which my four jobs were running was already occupied with jobs by
> > some other user, which I think should not happen (may be slurm.config
> admin
> > issue?). Either my some jobs should have gone in queue or run on other
> node
> > if free.
>
> Sounds like a job scheduler issue (you can always check in the log the
> detected hardware) -- and if you post an actual log I can certainly
> give more informed comments.
>
> > What to do: Importantly though as an individual user I can submit
> -multidir
> > job but lets say, which is normally the case, there will be many other
> > unknown users who submit one or two jobs in that case performance will be
> > an issue (which is equivalent to my case when I submit many jobs without
> > -multi/multidir).
>
> Not sure I follow: if you always have a number of similar runs to do,
> submit them together and benefit from not having to manual hardware
> assignment. Otherwise, if your cluster relies on node sharing, you
> will have to make sure that you specify correctly the affinity/binding
> arguments to your job scheduler (or work around it with manual offset
> calculation). However, note that if you are sharing a node with
> others, if their jobs are not correctly affinitized, those processes
> will affect the performance of your job.
>
> > I think still they will need -pinoffset. Could you
> > please suggest what best can be done in such case?
>
> See above.
>
> Cheers,
> --
> Szilárd
>
> >
> > -Jiom
> >
> >
> >
> >
> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll 
> > wrote:
> >
> >> Hi,
> >>
> >> First off, have you considered options 2) using multi-sim? That would
> >> allow you to not have to bother manually set offsets. Can you not
> >> submit your jobs such that you fill at least a node?
> >>
> >> How many threads/cores does you node have? Can you share log files?
> >>
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >>
> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query 
> >> wrote:
> >> > Hi Szilárd,
> >> >
> >> > Sorry I was bit quick to say its working with pinoffset. I just
> submitted
> >> > four same jobs (2 gpus, 4 nprocs) on the same node with -pin on and
> >> > different -pinoffset to 0, 5, 10, 15 (numbers should be fine as there
> are
> >> > 40 cores on node). Still I don't get same performance (all variably
> less
> >> > than 50%) as expected from a single independent job. Now am wondering
> if
> >> > its still related to overlap of 

Re: [gmx-users] performance

2017-09-14 Thread Szilárd Páll
On Wed, Sep 13, 2017 at 11:14 PM, gromacs query  wrote:
> Hi Szilárd,
>
> Thanks again. I tried now with -multidir like this:
>
> mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
>
> So this runs 4 jobs on same node so for each job np is = 16/4, and each job
> using 2 GPU. I get now quite improved performance and equal performance for
> each job (~ 220 ns) though still slightly less than single independent job
> (where I get 300 ns). I can live with that but -

That is not normal and it is more likely to be a benchmarking
discrepancy: you are likely not comparing apples to apples. Did you
run the "fast" single job on an otherwise empty node? That might
explain it: when most of the CPU cores are left empty, modern CPUs
increase clocks (turbo boost) on the used cores higher than they could
with all cores busy.

> Surprised: There are maximum 40 cores and 8 GPUs per node and thus my 4
> jobs should consume 8 GPUS.

Note that even if those are 40 real cores (rather than 20 cores with
HyperThreading), the current GROMACS release is unlikely to run
efficiently unless it has at least 6-8 cores per GPU. This will likely
change with the next release.

> So I am bit surprised with the fact the same
> node on which my four jobs were running was already occupied with jobs by
> some other user, which I think should not happen (may be slurm.config admin
> issue?). Either my some jobs should have gone in queue or run on other node
> if free.

Sounds like a job scheduler issue (you can always check the detected
hardware in the log) -- and if you post an actual log I can certainly
give more informed comments.

> What to do: Importantly though as an individual user I can submit -multidir
> job but lets say, which is normally the case, there will be many other
> unknown users who submit one or two jobs in that case performance will be
> an issue (which is equivalent to my case when I submit many jobs without
> -multi/multidir).

Not sure I follow: if you always have a number of similar runs to do,
submit them together and benefit from not having to do manual hardware
assignment. Otherwise, if your cluster relies on node sharing, you
will have to make sure that you specify the affinity/binding
arguments to your job scheduler correctly (or work around it with manual
offset calculation). However, note that if you are sharing a node with
others and their jobs are not correctly affinitized, those processes
will affect the performance of your job.

> I think still they will need -pinoffset. Could you
> please suggest what best can be done in such case?

See above.

Cheers,
--
Szilárd

>
> -Jiom
>
>
>
>
> On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll 
> wrote:
>
>> Hi,
>>
>> First off, have you considered options 2) using multi-sim? That would
>> allow you to not have to bother manually set offsets. Can you not
>> submit your jobs such that you fill at least a node?
>>
>> How many threads/cores does you node have? Can you share log files?
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query 
>> wrote:
>> > Hi Szilárd,
>> >
>> > Sorry I was bit quick to say its working with pinoffset. I just submitted
>> > four same jobs (2 gpus, 4 nprocs) on the same node with -pin on and
>> > different -pinoffset to 0, 5, 10, 15 (numbers should be fine as there are
>> > 40 cores on node). Still I don't get same performance (all variably less
>> > than 50%) as expected from a single independent job. Now am wondering if
>> > its still related to overlap of cores as pin on should lock the cores for
>> > the same job.
>> >
>> > -J
>> >
>> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query 
>> > wrote:
>> >
>> >> Hi Szilárd,
>> >>
>> >> Thanks, option 3 was in my mind but I need to figure out now how :)
>> >> Manually fixing pinoffset as of now seems working with some quick test.
>> >> I think option 1 would require to ask the admin but I can try option 3
>> >> myself. As there are other users from different places who may not
>> bother
>> >> using option 3. I think I would need to ask the admin to force option 1
>> but
>> >> before that I will try option 3.
>> >>
>> >> JIom
>> >>
>> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll 
>> >> wrote:
>> >>
>> >>> J,
>> >>>
>> >>> You have a few options:
>> >>>
>> >>> * Use SLURM to assign not only the set of GPUs, but also the correct
>> >>> set of CPU cores to each mdrun process. If you do so, mdrun will
>> >>> respect the affinity mask it will inherit and your two mdrun jobs
>> >>> should be running on the right set of cores. This has the drawback
>> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>> >>> application thread to a core/hardware thread (which is what mdrun
>> >>> does), only a process to a group of cores/hw threads which can
>> >>> sometimes lead to performance loss. (You might be able to 

Re: [gmx-users] performance

2017-09-13 Thread gromacs query
Hi Szilárd,

Thanks again. I tried now with -multidir like this:

mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4

So this runs 4 jobs on the same node, so each job gets np = 16/4 = 4 ranks
and uses 2 GPUs. I now get quite improved and equal performance for each
job (~220 ns/day), though still slightly less than a single independent job
(where I get 300 ns/day). I can live with that but -

Surprised: there are at most 40 cores and 8 GPUs per node, and thus my 4
jobs should consume all 8 GPUs. So I am a bit surprised that the same
node on which my four jobs were running was already occupied with jobs by
some other user, which I think should not happen (maybe a slurm.conf admin
issue?). Either some of my jobs should have gone into the queue or run on
another node if one was free.

What to do: importantly, though, as an individual user I can submit a
-multidir job, but let's say (as is normally the case) there are many other
unknown users who submit one or two jobs; in that case performance will be
an issue (which is equivalent to my case when I submit many jobs without
-multi/-multidir). I think they will still need -pinoffset. Could you
please suggest what best can be done in such a case?


-Jiom




On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll 
wrote:

> Hi,
>
> First off, have you considered options 2) using multi-sim? That would
> allow you to not have to bother manually set offsets. Can you not
> submit your jobs such that you fill at least a node?
>
> How many threads/cores does you node have? Can you share log files?
>
> Cheers,
> --
> Szilárd
>
>
> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query 
> wrote:
> > Hi Szilárd,
> >
> > Sorry I was bit quick to say its working with pinoffset. I just submitted
> > four same jobs (2 gpus, 4 nprocs) on the same node with -pin on and
> > different -pinoffset to 0, 5, 10, 15 (numbers should be fine as there are
> > 40 cores on node). Still I don't get same performance (all variably less
> > than 50%) as expected from a single independent job. Now am wondering if
> > its still related to overlap of cores as pin on should lock the cores for
> > the same job.
> >
> > -J
> >
> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query 
> > wrote:
> >
> >> Hi Szilárd,
> >>
> >> Thanks, option 3 was in my mind but I need to figure out now how :)
> >> Manually fixing pinoffset as of now seems working with some quick test.
> >> I think option 1 would require to ask the admin but I can try option 3
> >> myself. As there are other users from different places who may not
> bother
> >> using option 3. I think I would need to ask the admin to force option 1
> but
> >> before that I will try option 3.
> >>
> >> JIom
> >>
> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll 
> >> wrote:
> >>
> >>> J,
> >>>
> >>> You have a few options:
> >>>
> >>> * Use SLURM to assign not only the set of GPUs, but also the correct
> >>> set of CPU cores to each mdrun process. If you do so, mdrun will
> >>> respect the affinity mask it will inherit and your two mdrun jobs
> >>> should be running on the right set of cores. This has the drawback
> >>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
> >>> application thread to a core/hardware thread (which is what mdrun
> >>> does), only a process to a group of cores/hw threads which can
> >>> sometimes lead to performance loss. (You might be able to compensate
> >>> using some OpenMP library environment variables, though.)
> >>>
> >>> * Run multiple jobs with mdrun "-multi"/"-multidir"  (either two per
> >>> node or mulitple across nodes) and benefit from the rank/thread to
> >>> core/hw thread assignment that's supported also across multiple
> >>> simulations part of a multi-run; e.g.:
> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir
> my_input_dir{1,2,3,4}
> >>> will launch 4 ranks and start 4 simulations in each of the four
> >>> directories passed.
> >>>
> >>> * Write a wrapper script around gmx mdrun which will be what you
> >>> launch with SLURM; you can then inspect the node and decide what
> >>> pinoffset value to pass to your mdrun launch command.
> >>>
> >>>
> >>> I hope one of these will deliver the desired results :)
> >>>
> >>> Cheers,
> >>> --
> >>> Szilárd
> >>>
> >>>
> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query  >
> >>> wrote:
> >>> > Hi Szilárd,
> >>> >
> >>> > Thanks for your reply. This is useful but now am thinking because the
> >>> slurm
> >>> > launches job in an automated way it is not really in my control to
> >>> choose
> >>> > the node. So following things can happen; say for two mdrun jobs I
> set
> >>> > -pinoffset 0 and -pinoffset 4;
> >>> >
> >>> > - if they are running on the same node this is good
> >>> > - if jobs run on different nodes (partially occupied or free) whether
> >>> these
> >>> > chosen pinoffsets will make sense or not as I don't know what
> 

Re: [gmx-users] performance

2017-09-13 Thread Szilárd Páll
Hi,

First off, have you considered option 2), using multi-sim? That would
allow you to not have to bother with manually setting offsets. Can you not
submit your jobs such that you fill at least a node?

How many threads/cores does your node have? Can you share log files?

Cheers,
--
Szilárd


On Wed, Sep 13, 2017 at 9:14 PM, gromacs query  wrote:
> Hi Szilárd,
>
> Sorry I was bit quick to say its working with pinoffset. I just submitted
> four same jobs (2 gpus, 4 nprocs) on the same node with -pin on and
> different -pinoffset to 0, 5, 10, 15 (numbers should be fine as there are
> 40 cores on node). Still I don't get same performance (all variably less
> than 50%) as expected from a single independent job. Now am wondering if
> its still related to overlap of cores as pin on should lock the cores for
> the same job.
>
> -J
>
> On Wed, Sep 13, 2017 at 7:33 PM, gromacs query 
> wrote:
>
>> Hi Szilárd,
>>
>> Thanks, option 3 was in my mind but I need to figure out now how :)
>> Manually fixing pinoffset as of now seems working with some quick test.
>> I think option 1 would require to ask the admin but I can try option 3
>> myself. As there are other users from different places who may not bother
>> using option 3. I think I would need to ask the admin to force option 1 but
>> before that I will try option 3.
>>
>> JIom
>>
>> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll 
>> wrote:
>>
>>> J,
>>>
>>> You have a few options:
>>>
>>> * Use SLURM to assign not only the set of GPUs, but also the correct
>>> set of CPU cores to each mdrun process. If you do so, mdrun will
>>> respect the affinity mask it will inherit and your two mdrun jobs
>>> should be running on the right set of cores. This has the drawback
>>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>>> application thread to a core/hardware thread (which is what mdrun
>>> does), only a process to a group of cores/hw threads which can
>>> sometimes lead to performance loss. (You might be able to compensate
>>> using some OpenMP library environment variables, though.)
>>>
>>> * Run multiple jobs with mdrun "-multi"/"-multidir"  (either two per
>>> node or mulitple across nodes) and benefit from the rank/thread to
>>> core/hw thread assignment that's supported also across multiple
>>> simulations part of a multi-run; e.g.:
>>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>>> will launch 4 ranks and start 4 simulations in each of the four
>>> directories passed.
>>>
>>> * Write a wrapper script around gmx mdrun which will be what you
>>> launch with SLURM; you can then inspect the node and decide what
>>> pinoffset value to pass to your mdrun launch command.
>>>
>>>
>>> I hope one of these will deliver the desired results :)
>>>
>>> Cheers,
>>> --
>>> Szilárd
>>>
>>>
>>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query 
>>> wrote:
>>> > Hi Szilárd,
>>> >
>>> > Thanks for your reply. This is useful but now am thinking because the
>>> slurm
>>> > launches job in an automated way it is not really in my control to
>>> choose
>>> > the node. So following things can happen; say for two mdrun jobs I set
>>> > -pinoffset 0 and -pinoffset 4;
>>> >
>>> > - if they are running on the same node this is good
>>> > - if jobs run on different nodes (partially occupied or free) whether
>>> these
>>> > chosen pinoffsets will make sense or not as I don't know what pinoffset
>>> I
>>> > would need to set
>>> > - if I have to submit many jobs together and slurm chooses
>>> different/same
>>> > node itself then I think it is difficult to define pinoffset.
>>> >
>>> > -
>>> > J
>>> >
>>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll 
>>> > wrote:
>>> >
>>> >> My guess is that the two jobs are using the same cores -- either all
>>> >> cores/threads or only half of them, but the same set.
>>> >>
>>> >> You should use -pinoffset; see:
>>> >>
>>> >> - Docs and example:
>>> >> http://manual.gromacs.org/documentation/2016/user-guide/
>>> >> mdrun-performance.html
>>> >>
>>> >> - More explanation on the thread pinning behavior on the old website:
>>> >> http://www.gromacs.org/Documentation/Acceleration_
>>> >> and_parallelization#Pinning_threads_to_physical_cores
>>> >>
>>> >> Cheers,
>>> >> --
>>> >> Szilárd
>>> >>
>>> >>
>>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query >> >
>>> >> wrote:
>>> >> > Sorry forgot to add; we thought the two jobs are using same GPU ids
>>> but
>>> >> > cuda visible devices show both jobs are using different ids (0,1 and
>>> 2,3)
>>> >> >
>>> >> > -
>>> >> > J
>>> >> >
>>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <
>>> gromacsqu...@gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> >> Hi All,
>>> >> >>
>>> >> >> I have some issues with gromacs performance. There are many nodes
>>> and
>>> >> each
>>> >> >> node has number of gpus and the batch process is controlled by

Re: [gmx-users] performance

2017-09-13 Thread gromacs query
Hi Szilárd,

Sorry, I was a bit quick to say it's working with pinoffset. I just submitted
four identical jobs (2 GPUs, 4 procs each) on the same node with -pin on and
different -pinoffset values of 0, 5, 10, 15 (the numbers should be fine as
there are 40 cores on the node). Still I don't get the same performance as
expected from a single independent job (all variably less than 50% of it).
Now I am wondering if it is still related to overlap of cores, as -pin on
should lock the cores for the same job.

-J

On Wed, Sep 13, 2017 at 7:33 PM, gromacs query 
wrote:

> Hi Szilárd,
>
> Thanks, option 3 was in my mind but I need to figure out now how :)
> Manually fixing pinoffset as of now seems working with some quick test.
> I think option 1 would require to ask the admin but I can try option 3
> myself. As there are other users from different places who may not bother
> using option 3. I think I would need to ask the admin to force option 1 but
> before that I will try option 3.
>
> JIom
>
> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll 
> wrote:
>
>> J,
>>
>> You have a few options:
>>
>> * Use SLURM to assign not only the set of GPUs, but also the correct
>> set of CPU cores to each mdrun process. If you do so, mdrun will
>> respect the affinity mask it will inherit and your two mdrun jobs
>> should be running on the right set of cores. This has the drawback
>> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
>> application thread to a core/hardware thread (which is what mdrun
>> does), only a process to a group of cores/hw threads which can
>> sometimes lead to performance loss. (You might be able to compensate
>> using some OpenMP library environment variables, though.)
>>
>> * Run multiple jobs with mdrun "-multi"/"-multidir"  (either two per
>> node or mulitple across nodes) and benefit from the rank/thread to
>> core/hw thread assignment that's supported also across multiple
>> simulations part of a multi-run; e.g.:
>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>> will launch 4 ranks and start 4 simulations in each of the four
>> directories passed.
>>
>> * Write a wrapper script around gmx mdrun which will be what you
>> launch with SLURM; you can then inspect the node and decide what
>> pinoffset value to pass to your mdrun launch command.
>>
>>
>> I hope one of these will deliver the desired results :)
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query 
>> wrote:
>> > Hi Szilárd,
>> >
>> > Thanks for your reply. This is useful but now am thinking because the
>> slurm
>> > launches job in an automated way it is not really in my control to
>> choose
>> > the node. So following things can happen; say for two mdrun jobs I set
>> > -pinoffset 0 and -pinoffset 4;
>> >
>> > - if they are running on the same node this is good
>> > - if jobs run on different nodes (partially occupied or free) whether
>> these
>> > chosen pinoffsets will make sense or not as I don't know what pinoffset
>> I
>> > would need to set
>> > - if I have to submit many jobs together and slurm chooses
>> different/same
>> > node itself then I think it is difficult to define pinoffset.
>> >
>> > -
>> > J
>> >
>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll 
>> > wrote:
>> >
>> >> My guess is that the two jobs are using the same cores -- either all
>> >> cores/threads or only half of them, but the same set.
>> >>
>> >> You should use -pinoffset; see:
>> >>
>> >> - Docs and example:
>> >> http://manual.gromacs.org/documentation/2016/user-guide/
>> >> mdrun-performance.html
>> >>
>> >> - More explanation on the thread pinning behavior on the old website:
>> >> http://www.gromacs.org/Documentation/Acceleration_
>> >> and_parallelization#Pinning_threads_to_physical_cores
>> >>
>> >> Cheers,
>> >> --
>> >> Szilárd
>> >>
>> >>
>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query > >
>> >> wrote:
>> >> > Sorry forgot to add; we thought the two jobs are using same GPU ids
>> but
>> >> > cuda visible devices show both jobs are using different ids (0,1 and
>> 2,3)
>> >> >
>> >> > -
>> >> > J
>> >> >
>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <
>> gromacsqu...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi All,
>> >> >>
>> >> >> I have some issues with gromacs performance. There are many nodes
>> and
>> >> each
>> >> >> node has number of gpus and the batch process is controlled by
>> slurm.
>> >> >> Although I get good performance with some settings of number of
>> gpus and
>> >> >> nprocs but when I submit same job twice on the same node then the
>> >> >> performance is reduced drastically. e.g
>> >> >>
>> >> >> For 2 GPUs I get 300 ns per day when there is no other job running
>> on
>> >> the
>> >> >> node. When I submit same job twice on the same node & at the same
>> time,
>> >> I
>> >> >> get only 17 ns/day for both the jobs. I am using this:
>> >> >>
>> >> >> mpirun 

Re: [gmx-users] performance

2017-09-13 Thread gromacs query
Hi Szilárd,

Thanks, option 3 was on my mind but I now need to figure out how :)
Manually fixing the pinoffset seems to be working for now in some quick tests.
I think option 1 would require asking the admin, but I can try option 3
myself. As there are other users from different places who may not bother
using option 3, I think I would need to ask the admin to force option 1,
but before that I will try option 3.

JIom

On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll 
wrote:

> J,
>
> You have a few options:
>
> * Use SLURM to assign not only the set of GPUs, but also the correct
> set of CPU cores to each mdrun process. If you do so, mdrun will
> respect the affinity mask it will inherit and your two mdrun jobs
> should be running on the right set of cores. This has the drawback
> that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
> application thread to a core/hardware thread (which is what mdrun
> does), only a process to a group of cores/hw threads which can
> sometimes lead to performance loss. (You might be able to compensate
> using some OpenMP library environment variables, though.)
>
> * Run multiple jobs with mdrun "-multi"/"-multidir"  (either two per
> node or mulitple across nodes) and benefit from the rank/thread to
> core/hw thread assignment that's supported also across multiple
> simulations part of a multi-run; e.g.:
> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
> will launch 4 ranks and start 4 simulations in each of the four
> directories passed.
>
> * Write a wrapper script around gmx mdrun which will be what you
> launch with SLURM; you can then inspect the node and decide what
> pinoffset value to pass to your mdrun launch command.
>
>
> I hope one of these will deliver the desired results :)
>
> Cheers,
> --
> Szilárd
>
>
> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query 
> wrote:
> > Hi Szilárd,
> >
> > Thanks for your reply. This is useful but now am thinking because the
> slurm
> > launches job in an automated way it is not really in my control to choose
> > the node. So following things can happen; say for two mdrun jobs I set
> > -pinoffset 0 and -pinoffset 4;
> >
> > - if they are running on the same node this is good
> > - if jobs run on different nodes (partially occupied or free) whether
> these
> > chosen pinoffsets will make sense or not as I don't know what pinoffset I
> > would need to set
> > - if I have to submit many jobs together and slurm chooses different/same
> > node itself then I think it is difficult to define pinoffset.
> >
> > -
> > J
> >
> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll 
> > wrote:
> >
> >> My guess is that the two jobs are using the same cores -- either all
> >> cores/threads or only half of them, but the same set.
> >>
> >> You should use -pinoffset; see:
> >>
> >> - Docs and example:
> >> http://manual.gromacs.org/documentation/2016/user-guide/
> >> mdrun-performance.html
> >>
> >> - More explanation on the thread pinning behavior on the old website:
> >> http://www.gromacs.org/Documentation/Acceleration_
> >> and_parallelization#Pinning_threads_to_physical_cores
> >>
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >>
> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query 
> >> wrote:
> >> > Sorry forgot to add; we thought the two jobs are using same GPU ids
> but
> >> > cuda visible devices show both jobs are using different ids (0,1 and
> 2,3)
> >> >
> >> > -
> >> > J
> >> >
> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <
> gromacsqu...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> I have some issues with gromacs performance. There are many nodes and
> >> each
> >> >> node has number of gpus and the batch process is controlled by slurm.
> >> >> Although I get good performance with some settings of number of gpus
> and
> >> >> nprocs but when I submit same job twice on the same node then the
> >> >> performance is reduced drastically. e.g
> >> >>
> >> >> For 2 GPUs I get 300 ns per day when there is no other job running on
> >> the
> >> >> node. When I submit same job twice on the same node & at the same
> time,
> >> I
> >> >> get only 17 ns/day for both the jobs. I am using this:
> >> >>
> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
> >> >>
> >> >> Any suggestions highly appreciated.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Jiom
> >> >>
> >> > --
> >> > Gromacs Users mailing list
> >> >
> >> > * Please search the archive at http://www.gromacs.org/
> >> Support/Mailing_Lists/GMX-Users_List before posting!
> >> >
> >> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> >
> >> > * For (un)subscribe requests visit
> >> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> >> send a mail to gmx-users-requ...@gromacs.org.
> >> --
> >> Gromacs Users mailing list
> >>
> >> * Please search the archive at http://www.gromacs.org/
> >> 

Re: [gmx-users] performance

2017-09-13 Thread Szilárd Páll
J,

You have a few options:

* Use SLURM to assign not only the set of GPUs, but also the correct
set of CPU cores to each mdrun process. If you do so, mdrun will
respect the affinity mask it will inherit and your two mdrun jobs
should be running on the right set of cores. This has the drawback
that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each
application thread to a core/hardware thread (which is what mdrun
does), only a process to a group of cores/hw threads which can
sometimes lead to performance loss. (You might be able to compensate
using some OpenMP library environment variables, though.)

* Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
node or multiple across nodes) and benefit from the rank/thread to
core/hw thread assignment that's supported also across multiple
simulations that are part of a multi-run; e.g.:
mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
will launch 4 ranks and start 4 simulations, one in each of the four
directories passed.

* Write a wrapper script around gmx mdrun which will be what you
launch with SLURM; you can then inspect the node and decide what
pinoffset value to pass to your mdrun launch command (a minimal sketch
follows below).

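To make that last option concrete, here is a minimal, untested sketch. It
assumes every GROMACS job on the node is started through this wrapper, that
each job pins NTHREADS consecutive hardware threads, and that a simple
per-node counter file in /tmp is acceptable; the file names, thread counts
and the mdrun command line are placeholders to adapt:

#!/bin/bash
# Hypothetical pinoffset wrapper -- adapt to your cluster setup.
NTHREADS=8                        # hw threads this job will pin (4 ranks x 2 OpenMP)
COUNTER=/tmp/gmx_next_pinoffset   # per-node counter shared by all wrapped jobs

exec 9>"$COUNTER.lock"            # open a lock file on fd 9
flock -x 9                        # serialize concurrent job starts on this node
OFFSET=$(cat "$COUNTER" 2>/dev/null)
OFFSET=${OFFSET:-0}
echo $((OFFSET + NTHREADS)) > "$COUNTER"
flock -u 9
exec 9>&-                         # close the lock file descriptor

mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 \
       -pin on -pinstride 1 -pinoffset "$OFFSET"

(Resetting the counter when the node drains, and checking that OFFSET plus
NTHREADS does not exceed the number of hardware threads, are left out for
brevity.)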

I hope one of these will deliver the desired results :)

Cheers,
--
Szilárd


On Wed, Sep 13, 2017 at 7:47 PM, gromacs query  wrote:
> Hi Szilárd,
>
> Thanks for your reply. This is useful but now am thinking because the slurm
> launches job in an automated way it is not really in my control to choose
> the node. So following things can happen; say for two mdrun jobs I set
> -pinoffset 0 and -pinoffset 4;
>
> - if they are running on the same node this is good
> - if jobs run on different nodes (partially occupied or free) whether these
> chosen pinoffsets will make sense or not as I don't know what pinoffset I
> would need to set
> - if I have to submit many jobs together and slurm chooses different/same
> node itself then I think it is difficult to define pinoffset.
>
> -
> J
>
> On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll 
> wrote:
>
>> My guess is that the two jobs are using the same cores -- either all
>> cores/threads or only half of them, but the same set.
>>
>> You should use -pinoffset; see:
>>
>> - Docs and example:
>> http://manual.gromacs.org/documentation/2016/user-guide/
>> mdrun-performance.html
>>
>> - More explanation on the thread pinning behavior on the old website:
>> http://www.gromacs.org/Documentation/Acceleration_
>> and_parallelization#Pinning_threads_to_physical_cores
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query 
>> wrote:
>> > Sorry forgot to add; we thought the two jobs are using same GPU ids but
>> > cuda visible devices show both jobs are using different ids (0,1 and 2,3)
>> >
>> > -
>> > J
>> >
>> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query 
>> > wrote:
>> >
>> >> Hi All,
>> >>
>> >> I have some issues with gromacs performance. There are many nodes and
>> each
>> >> node has number of gpus and the batch process is controlled by slurm.
>> >> Although I get good performance with some settings of number of gpus and
>> >> nprocs but when I submit same job twice on the same node then the
>> >> performance is reduced drastically. e.g
>> >>
>> >> For 2 GPUs I get 300 ns per day when there is no other job running on
>> the
>> >> node. When I submit same job twice on the same node & at the same time,
>> I
>> >> get only 17 ns/day for both the jobs. I am using this:
>> >>
>> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>> >>
>> >> Any suggestions highly appreciated.
>> >>
>> >> Thanks
>> >>
>> >> Jiom
>> >>
>> > --
>> > Gromacs Users mailing list
>> >
>> > * Please search the archive at http://www.gromacs.org/
>> Support/Mailing_Lists/GMX-Users_List before posting!
>> >
>> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> >
>> > * For (un)subscribe requests visit
>> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-requ...@gromacs.org.
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at http://www.gromacs.org/
>> Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.
-- 
Gromacs Users mailing list

* Please search the archive at 

Re: [gmx-users] performance

2017-09-13 Thread gromacs query
Hi Szilárd,

Thanks for your reply. This is useful, but now I am thinking: because slurm
launches jobs in an automated way, it is not really in my control to choose
the node. So the following things can happen; say for two mdrun jobs I set
-pinoffset 0 and -pinoffset 4:

- if they are running on the same node, this is good
- if jobs run on different nodes (partially occupied or free), these chosen
pinoffsets may or may not make sense, as I don't know what pinoffset I
would need to set
- if I have to submit many jobs together and slurm chooses a different/same
node itself, then I think it is difficult to define the pinoffset.

-
J

On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll 
wrote:

> My guess is that the two jobs are using the same cores -- either all
> cores/threads or only half of them, but the same set.
>
> You should use -pinoffset; see:
>
> - Docs and example:
> http://manual.gromacs.org/documentation/2016/user-guide/
> mdrun-performance.html
>
> - More explanation on the thread pinning behavior on the old website:
> http://www.gromacs.org/Documentation/Acceleration_
> and_parallelization#Pinning_threads_to_physical_cores
>
> Cheers,
> --
> Szilárd
>
>
> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query 
> wrote:
> > Sorry forgot to add; we thought the two jobs are using same GPU ids but
> > cuda visible devices show both jobs are using different ids (0,1 and 2,3)
> >
> > -
> > J
> >
> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query 
> > wrote:
> >
> >> Hi All,
> >>
> >> I have some issues with gromacs performance. There are many nodes and
> each
> >> node has number of gpus and the batch process is controlled by slurm.
> >> Although I get good performance with some settings of number of gpus and
> >> nprocs but when I submit same job twice on the same node then the
> >> performance is reduced drastically. e.g
> >>
> >> For 2 GPUs I get 300 ns per day when there is no other job running on
> the
> >> node. When I submit same job twice on the same node & at the same time,
> I
> >> get only 17 ns/day for both the jobs. I am using this:
> >>
> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
> >>
> >> Any suggestions highly appreciated.
> >>
> >> Thanks
> >>
> >> Jiom
> >>
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance

2017-09-13 Thread Szilárd Páll
My guess is that the two jobs are using the same cores -- either all
cores/threads or only half of them, but the same set.

You should use -pinoffset; see:

- Docs and example:
http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html

- More explanation on the thread pinning behavior on the old website:
http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores

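As a concrete (hypothetical) illustration for two otherwise identical
4-rank x 2-thread jobs sharing one node, where each job should occupy 8
consecutive hardware threads:

mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -pin on -pinstride 1 -pinoffset 0
mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -pin on -pinstride 1 -pinoffset 8

The second job then starts pinning at hardware thread 8, so the two jobs do
not overlap.
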
Cheers,
--
Szilárd


On Wed, Sep 13, 2017 at 6:35 PM, gromacs query  wrote:
> Sorry forgot to add; we thought the two jobs are using same GPU ids but
> cuda visible devices show both jobs are using different ids (0,1 and 2,3)
>
> -
> J
>
> On Wed, Sep 13, 2017 at 5:33 PM, gromacs query 
> wrote:
>
>> Hi All,
>>
>> I have some issues with gromacs performance. There are many nodes and each
>> node has number of gpus and the batch process is controlled by slurm.
>> Although I get good performance with some settings of number of gpus and
>> nprocs but when I submit same job twice on the same node then the
>> performance is reduced drastically. e.g
>>
>> For 2 GPUs I get 300 ns per day when there is no other job running on the
>> node. When I submit same job twice on the same node & at the same time, I
>> get only 17 ns/day for both the jobs. I am using this:
>>
>> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>>
>> Any suggestions highly appreciated.
>>
>> Thanks
>>
>> Jiom
>>
> --
> Gromacs Users mailing list
>
> * Please search the archive at 
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
> mail to gmx-users-requ...@gromacs.org.
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] performance

2017-09-13 Thread gromacs query
Sorry, forgot to add: we thought the two jobs were using the same GPU ids, but
CUDA_VISIBLE_DEVICES shows the two jobs are using different ids (0,1 and 2,3)
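
A quick way to double-check from inside each job, assuming the NVIDIA tools
are available on the node:

echo $CUDA_VISIBLE_DEVICES
nvidia-smi

The process table at the bottom of the nvidia-smi output lists which PIDs
are running on which GPU.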

-
J

On Wed, Sep 13, 2017 at 5:33 PM, gromacs query 
wrote:

> Hi All,
>
> I have some issues with gromacs performance. There are many nodes and each
> node has number of gpus and the batch process is controlled by slurm.
> Although I get good performance with some settings of number of gpus and
> nprocs but when I submit same job twice on the same node then the
> performance is reduced drastically. e.g
>
> For 2 GPUs I get 300 ns per day when there is no other job running on the
> node. When I submit same job twice on the same node & at the same time, I
> get only 17 ns/day for both the jobs. I am using this:
>
> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>
> Any suggestions highly appreciated.
>
> Thanks
>
> Jiom
>
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


[gmx-users] performance

2017-09-13 Thread gromacs query
Hi All,

I have some issues with GROMACS performance. There are many nodes, each
node has a number of GPUs, and the batch system is controlled by SLURM.
Although I get good performance with certain combinations of the number of
GPUs and nprocs, when I submit the same job twice on the same node the
performance is reduced drastically, e.g.:

For 2 GPUs I get 300 ns per day when there is no other job running on the
node. When I submit the same job twice on the same node at the same time, I
get only 17 ns/day for both jobs. I am using this:

mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12

Any suggestions highly appreciated.

Thanks

Jiom
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Performance values

2017-08-28 Thread Szilárd Páll
On Mon, Aug 7, 2017 at 6:10 PM, Maureen Chew <maureen.c...@oracle.com> wrote:
> Szilárd,
> Thank you so very much for the reply!You mention
> that time/step is important if trying to do an apples-to-apples
> comparison for any given simulation.
>
> I have a few questions - For specific example, use the RNAse reference here:
> (http://www.gromacs.org/gpu)
> Influence of box geometry and virtual interaction sites
> This is a simulation of the protein RNAse, which contains roughly 24,000 
> atoms in a cubic box.
> The image rnase.png <http://www.gromacs.org/@api/deki/files/224/=rnase.png>, 
> shows a 6 core baseline to be, roughly 50ns/day
>
> First,  assuming this is from rnase_cubic @ 
> ftp://ftp.gromacs.org/pub/benchmarks/rnase_bench_systems.tar.gz 
> which contains these files:
> rnase_cubic/rf_verlet.mdp
> rnase_cubic/conf.gro
> rnase_cubic/topol.top
> rnase_cubic/pme_verlet.mdp
> 3 questions:
> - What gmx grompp command was used to generate the tpr file for the result in 
> rnase.png?

Those are two sets of runs, the first four use the cubic RNAse setup
without virtual sites (i.e. the inputs you list above) and the "PME"
mdp settings file, so you need to do
gmx grompp -f pme_verlet

The latter four bars use the other input for which you'll need a
command line similar to the above to set up, but using the appropriate
input files from the "rnase_dodec_vsites" tarball.
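
Spelled out for the cubic case (using the files listed above; the -o output
name here is arbitrary):

gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top -o rnase_cubic.tpr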

> Apologies if its intuitively obvious which mdp  was used
> - Aside from -ntmpi and -ntomp parms, what gmx mdrun command was used to 
> obtain
> the 6 core result?

In all single-node (and single-GPU) runs OpenMP-only parallelization
is used, i.e. 1 MPI rank and as many OpenMP threads as hardware
threads, i.e. for the 6-core run "mdrun -ntmpi 1 -ntomp 12" (which is
also the default).

Note that the data you're looking at is several years old by now and
was collected on an Intel Sandy Bridge-E CPU (i7-3930K), so it is not
highly representative of the current performance of GROMACS.

> - From that 6 core run, what is the time/step that you refer to?

Have a look in the "dt" field in the mdp or log file; in this case
dt=0.002, i.e 2 fs. For the vsites case it is 5 fs.

> Is that
> real cycle  and time accounting for neighbor search, force, PME mesh 
> etc,
> and  time/step that you refer to is the wall time/call count?

Not sure if I understand the question, but I'll try to answer it:
that is simulated time divided by wall time, i.e. (number of time steps * dt)
/ (wall time in days).
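
As a worked example (using the 3.964 ns/day figure from your sample and
assuming dt = 2 fs, as in the cubic RNAse input):

3.964 ns/day / 0.002 ns per step = 1.982e6 steps per day
86400 s per day / 1.982e6 steps per day ~= 0.044 s per step

It is this per-step wall time that is directly comparable between runs of
the same system and settings.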


Cheers,
--
Szilárd


> Thanks in advance!
> —maureen
>
>
> Date: Mon, 7 Aug 2017 16:01:16 +0200
> From: Szilárd Páll <pall.szil...@gmail.com>
> To: Discussion list for GROMACS users <gmx-us...@gromacs.org>
> Subject: Re: [gmx-users] Performance values
>
>
> Indeed, "Wall t" is real application wall-time, nanoseconds/day is the
> typical molecular dynamics performance unit that corresponds to the
> effective amount of simulation throughput (note that this however
> depends on the time-step and without that specified it is not useful
> to compare to other runs), so often it is useful to use it convert it
> to time/step.
> --
> Szilárd
>
>
> On Fri, Jul 28, 2017 at 10:20 AM, Maureen Chew <maureen.c...@oracle.com> wrote:
>> You might find this reference handy - it has a really nice explanation for 
>> how to look
>> at a log file
>> Topology preparation, "What's in a log file", basic performance 
>> improvements: Mark Abraham, Session 1A 
>> <http://www.gromacs.org/Documentation/Tutorials/GROMACS_USA_Workshop_and_Conference_2013/Topology_preparation,_%22What's_in_a_log_file%22,_basic_performance_improvements:_Mark_Abraham,_Session_1A>
>>
>> The “Performance:” values are a throughput measure where both values 
>> represent
>> the same thing in different terms.  In your sample below, 3.964 is the
>> number of nanoseconds that can be simulated in 24 hours while it takes
>> 6.054 hours to simulate 1 ns
>>
>> HTH
>>
>>
>> On Jul 27, 2017, at 10:15 AM, Maureen Chew <maureen.c...@oracle.com> wrote:
>>> Where is it documented how the mdrun performance metrics are calculated?

Re: [gmx-users] Performance values

2017-08-07 Thread Maureen Chew
Szilárd,
Thank you so very much for the reply! You mention
that time/step is important if trying to do an apples-to-apples
comparison for any given simulation.

I have a few questions - For specific example, use the RNAse reference here:
(http://www.gromacs.org/gpu)
Influence of box geometry and virtual interaction sites
This is a simulation of the protein RNAse, which contains roughly 24,000 atoms 
in a cubic box.
The image rnase.png <http://www.gromacs.org/@api/deki/files/224/=rnase.png>
shows a 6-core baseline of roughly 50 ns/day.

First,  assuming this is from rnase_cubic @ 
ftp://ftp.gromacs.org/pub/benchmarks/rnase_bench_systems.tar.gz 
which contains these files:
rnase_cubic/rf_verlet.mdp
rnase_cubic/conf.gro
rnase_cubic/topol.top
rnase_cubic/pme_verlet.mdp

3 questions:
- What gmx grompp command was used to generate the tpr file for the result in 
rnase.png?
Apologies if it's intuitively obvious which mdp was used
- Aside from -ntmpi and -ntomp parms, what gmx mdrun command was used to obtain
the 6 core result?
- From that 6 core run, what is the time/step that you refer to?  Is that from
the "Real cycle and time accounting" table (neighbor search, force, PME mesh,
etc.), and is the time/step you refer to the wall time / call count?

Thanks in advance!
—maureen


Date: Mon, 7 Aug 2017 16:01:16 +0200
From: Szilárd Páll <pall.szil...@gmail.com>
To: Discussion list for GROMACS users <gmx-us...@gromacs.org>
Subject: Re: [gmx-users] Performance values


Indeed, "Wall t" is real application wall-time, nanoseconds/day is the
typical molecular dynamics performance unit that corresponds to the
effective amount of simulation throughput (note that this however
depends on the time-step and without that specified it is not useful
to compare to other runs), so often it is useful to use it convert it
to time/step.
--
Szilárd


On Fri, Jul 28, 2017 at 10:20 AM, Maureen Chew <maureen.c...@oracle.com> wrote:
> You might find this reference handy - it has a really nice explanation for 
> how to look
> at a log file
> Topology preparation, "What's in a log file", basic performance improvements: 
> Mark Abraham, Session 1A 
> <http://www.gromacs.org/Documentation/Tutorials/GROMACS_USA_Workshop_and_Conference_2013/Topology_preparation,_%22What's_in_a_log_file%22,_basic_performance_improvements:_Mark_Abraham,_Session_1A>
> 
> The “Performance:” values are a throughput measure where both values represent
> the same thing in different terms.  In your sample below, 3.964 is the
> number of nanoseconds that can be simulated in 24 hours while it takes
> 6.054 hours to simulate 1 ns
> 
> HTH
> 
> 
> On Jul 27, 2017, at 10:15 AM, Maureen Chew <maureen.c...@oracle.com> wrote:
>> Where is it documented how the mdrun performance metrics are calculated? I've
>> looked here
>> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>> and here
>> http://manual.gromacs.org/documentation/2016.3/manual-2016.3.pdf 
>> 
>> but seem to have missed  explanation.
>> 
>> Are the sample mdrun times below user time or real time?  Generally, wall is 
>> real time
>> I understand that “Performance:” is not a linear scale but what is the scale
>> in the 2016.3 sample below?
>> 
>>                Core t (s)   Wall t (s)      (%)
>>        Time:    69761.050      272.504  25600.0
>>                  (ns/day)    (hour/ns)
>> Performance:        3.964        6.054

Re: [gmx-users] Performance values

2017-08-07 Thread Szilárd Páll
Indeed, "Wall t" is real application wall-time, nanoseconds/day is the
typical molecular dynamics performance unit that corresponds to the
effective amount of simulation throughput (note that this however
depends on the time-step and without that specified it is not useful
to compare to other runs), so often it is useful to use it convert it
to time/step.
--
Szilárd


On Fri, Jul 28, 2017 at 10:20 AM, Maureen Chew  wrote:
> You might find this reference handy - it has a really nice explanation for 
> how to look
> at a log file
> Topology preparation, "What's in a log file", basic performance improvements: 
> Mark Abraham, Session 1A 
> 
>
> The “Performance:” values are a throughput measure where both values represent
> the same thing in different terms.  In your sample below, 3.964 is the
> number of nanoseconds that can be simulated in 24 hours while it takes
> 6.054 hours to simulate 1 ns
>
> HTH
>
>
> On Jul 27, 2017, at 10:15 AM, Maureen Chew  wrote:
>> Where is it documented how the mdrun performance metrics are calculated? I've
>> looked here
>> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>> and here
>> http://manual.gromacs.org/documentation/2016.3/manual-2016.3.pdf
>> but seem to have missed the explanation.
>>
>> Are the sample mdrun times below user time or real time? Generally, wall is
>> real time.
>> I understand that "Performance:" is not a linear scale, but what is the scale
>> in the 2016.3 sample below?
>>
>>                Core t (s)   Wall t (s)        (%)
>>        Time:    69761.050      272.504    25600.0
>>                  (ns/day)    (hour/ns)
>> Performance:        3.964        6.054
>>
>

Re: [gmx-users] Performance values

2017-07-28 Thread Maureen Chew
You might find this reference handy - it has a really nice explanation for how 
to look 
at a log file
Topology preparation, "What's in a log file", basic performance improvements: 
Mark Abraham, Session 1A 


The “Performance:” values are a throughput measure where both values represent
the same thing in different terms.  In your sample below, 3.964 is the
number of nanoseconds that can be simulated in 24 hours while it takes
6.054 hours to simulate 1 ns

HTH


On Jul 27, 2017, at 10:15 AM, Maureen Chew  wrote:
> Where is it documented how the mdrun performance metrics are calculated? I've
> looked here
> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
> and here
> http://manual.gromacs.org/documentation/2016.3/manual-2016.3.pdf
> but seem to have missed the explanation.
> 
> Are the sample mdrun times below user time or real time? Generally, wall is
> real time.
> I understand that "Performance:" is not a linear scale, but what is the scale
> in the 2016.3 sample below?
> 
>                Core t (s)   Wall t (s)        (%)
>        Time:    69761.050      272.504    25600.0
>                  (ns/day)    (hour/ns)
> Performance:        3.964        6.054
> 


[gmx-users] Performance values

2017-07-27 Thread Maureen Chew
Where is it documented how the mdrun performance metrics are calculated? I’ve
looked here
http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
and here
http://manual.gromacs.org/documentation/2016.3/manual-2016.3.pdf
but seem to have missed the explanation.

Are the sample mdrun times below user time or real time? Generally, wall is
real time.
I understand that “Performance:” is not a linear scale, but what is the scale
in the 2016.3 sample below?

               Core t (s)   Wall t (s)        (%)
       Time:    69761.050      272.504    25600.0
                 (ns/day)    (hour/ns)
Performance:        3.964        6.054

 

TIA
—maureen

[gmx-users] performance issue with many short MD runs

2017-03-28 Thread Michael Brunsteiner

Thanks Peter and Mark!
I'll try running on single cores ...
However, comparing the timings, I believe the bottleneck might be the time
spent in I/O (reading/writing to disk), and here running several jobs on a
single node with multiple cores might make things even worse.
Also funny: in the log files Gromacs reports Wall times for both machines
that are comparable: 0.613 (old machine) vs 0.525 (new machine), but the UNIX
time command tells a different story:
real    0m0.798s (old machine)
real    0m1.543s (new machine)
I wonder where the missing time goes ... ;)

Anyway, thanks again!
Regards,
Michael
=== Why be happy when you could be normal?

Re: [gmx-users] performance issue with many short MD runs

2017-03-27 Thread Mark Abraham
Hi,

As Peter notes, there are cases where the GPU won't be used for the rerun
(specifically, when you request more than one energy group, for which it
would likely be prohibitively slow, even if we'd write and run such a
kernel on the GPU; but that is not the case here). The reason things take a
long time is that a rerun has a wildly different execution profile from
normal mdrun. Each "step" has to get the positions from some cold part of
memory/disk, do a fresh neighbor search (since mdrun can't rely on the
usual assumption that you can re-use the last one quite a few times),
launch a GPU kernel, launch CPU OpenMP regions, compute forces that often
won't even be used for output, and write whatever should be output. Most of
that code is run very rarely in a normal production simulation, so isn't
heavily optimized. But your rerun is spending most of its time there. Since
you note that your compute load is a single small molecule, it would not be
at all surprising for the mdrun performance breakdown in the log file to
show that all the overheads take very much more time than the GPU kernel
that computes the energy that you want. Those can take wildly different
amounts of time on different machines for all sorts of reasons, including
CUDA API overhead (as Peter noted), Linux kernel configuration, OS version,
hard disk performance, machine load, whether the sysadmin showered lately,
the phase of the moon, etc. :-)

Compare the final sections of the log files to see what I mean. Try gmx
mdrun -rerun -nb cpu, as it might be faster to waste the GPU. If you really
are doing many machine-hours of such jobs and care about turn-around time,
invest human time in writing a script to break up your trajectory into
pieces, and give each piece to a single mdrun that you place on e.g. a
different single core (e.g. with tools like numactl or taskset) and run a
different gmx mdrun -rerun -nb cpu -ntmpi 1 -ntomp 1 on each single core.
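
For instance, a minimal sketch of that idea (the file names, the four-way
split and the -b/-e windows in ps are placeholders to adapt to your
trajectory; also shift the windows so boundary frames are not counted twice):

  #!/bin/bash
  # split the trajectory into four time windows and rerun each on its own core
  for i in 0 1 2 3; do
      # group 0 (System) is written out; -b/-e select a time window in ps
      echo 0 | gmx trjconv -f traj.xtc -s topol.tpr \
          -b $((i*250)) -e $(((i+1)*250)) -o piece$i.xtc
      # pin one single-threaded, CPU-only rerun to core $i
      taskset -c $i gmx mdrun -rerun piece$i.xtc -s topol.tpr \
          -nb cpu -ntmpi 1 -ntomp 1 -deffnm rerun$i &
  done
  wait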

Mark

On Mon, Mar 27, 2017 at 4:24 PM Peter Kroon  wrote:

> Hi,
>
>
> On the new machine your CUDA runtime and driver versions are lower than
> on the old machine. Maybe that could explain it? (is the GPU even used
> with -rerun?) You would need to recompile gromacs.
>
>
> Peter
>
>
> On 27-03-17 15:51, Michael Brunsteiner wrote:
> > Hi, I have to run a lot (many thousands) of very short MD reruns with
> > gmx. Using gmx-2016.3 it works without problems; however, what I see is
> > that the overall performance (in terms of REAL execution time as measured
> > with the unix time command) which I get on a relatively new computer is
> > poorer than what I get with a much older machine
> > (by a factor of about 2 - this in spite of gmx reporting a better
> > performance of the new machine in the log file).
> >
> > Both machines run linux (debian); the old has eight intel cores, the
> > newer one 12.
> > On the newer machine gmx uses a supposedly faster SIMD instruction set;
> > otherwise hardware (including hard drives) is comparable.
> >
> > Below is the output of a typical job (gmx mdrun -rerun with a trajectory
> > containing not more than a couple of thousand conformations of a single
> > small molecule) on both machines (mdp file content below).
> >
> > old machine:
> > prompt> time gmx mdrun ...
> > in the log file:
> >                Core t (s)   Wall t (s)        (%)
> >        Time:        4.527        0.566      800.0
> >                  (ns/day)    (hour/ns)
> > Performance:        1.527       15.719
> > on the command line:
> > real    2m45.562s  <
> > user    15m40.901s
> > sys     0m33.319s
> >
> > new machine:
> > prompt> time gmx mdrun ...
> > in the log file:
> >                Core t (s)   Wall t (s)        (%)
> >        Time:        6.030        0.502     1200.0
> >                  (ns/day)    (hour/ns)
> > Performance:        1.719       13.958
> >
> > on the command line:
> > real    5m30.962s  <
> > user    20m2.208s
> > sys     3m28.676s
> >
> > The specs of the two gmx installations are given below. I'd be grateful
> > if anyone could suggest ways to improve performance on the newer machine!
> > Cheers, Michael
> >
> >
> > the older machine (here the jobs run faster):  gmx --version
> >
> > GROMACS version:2016.3
> > Precision:  single
> > Memory model:   64 bit
> > MPI library:thread_mpi
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> > GPU support:CUDA
> > SIMD instructions:  SSE4.1
> > FFT library:fftw-3.3.5-sse2
> > RDTSCP usage:   enabled
> > TNG support:enabled
> > Hwloc support:  hwloc-1.8.0
> > Tracing support:disabled
> > Built on:   Tue Mar 21 11:24:42 CET 2017
> > Built by:   root@rcpetemp1 [CMAKE]
> > Build OS/arch:  Linux 3.13.0-79-generic x86_64
> > Build CPU vendor:   Intel
> > Build CPU brand:Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz
> > Build CPU family:   6   Model: 26   Stepping: 5
> > Build CPU features: apic 

Re: [gmx-users] performance issue with many short MD runs

2017-03-27 Thread Peter Kroon
Hi,


On the new machine your CUDA runtime and driver versions are lower than
on the old machine. Maybe that could explain it? (is the GPU even used
with -rerun?) You would need to recompile gromacs.


Peter


On 27-03-17 15:51, Michael Brunsteiner wrote:
> Hi, I have to run a lot (many thousands) of very short MD reruns with
> gmx. Using gmx-2016.3 it works without problems; however, what I see is
> that the overall performance (in terms of REAL execution time as measured
> with the unix time command) which I get on a relatively new computer is
> poorer than what I get with a much older machine
> (by a factor of about 2 - this in spite of gmx reporting a better
> performance of the new machine in the log file).
>
> Both machines run linux (debian); the old has eight intel cores, the
> newer one 12.
> On the newer machine gmx uses a supposedly faster SIMD instruction set;
> otherwise hardware (including hard drives) is comparable.
>
> Below is the output of a typical job (gmx mdrun -rerun with a trajectory
> containing not more than a couple of thousand conformations of a single
> small molecule) on both machines (mdp file content below).
>
> old machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        4.527        0.566      800.0
>                  (ns/day)    (hour/ns)
> Performance:        1.527       15.719
> on the command line:
> real    2m45.562s  <
> user    15m40.901s
> sys     0m33.319s
>
> new machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        6.030        0.502     1200.0
>                  (ns/day)    (hour/ns)
> Performance:        1.719       13.958
>
> on the command line:
> real    5m30.962s  <
> user    20m2.208s
> sys     3m28.676s
>
> The specs of the two gmx installations are given below. I'd be grateful if
> anyone could suggest ways to improve performance on the newer machine!
> Cheers, Michael
>
>
> the older machine (here the jobs run faster):  gmx --version
>
> GROMACS version:2016.3
> Precision:  single
> Memory model:   64 bit
> MPI library:thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:CUDA
> SIMD instructions:  SSE4.1
> FFT library:fftw-3.3.5-sse2
> RDTSCP usage:   enabled
> TNG support:enabled
> Hwloc support:  hwloc-1.8.0
> Tracing support:disabled
> Built on:   Tue Mar 21 11:24:42 CET 2017
> Built by:   root@rcpetemp1 [CMAKE]
> Build OS/arch:  Linux 3.13.0-79-generic x86_64
> Build CPU vendor:   Intel
> Build CPU brand:Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz
> Build CPU family:   6   Model: 26   Stepping: 5
> Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc 
> pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> C compiler: /usr/bin/cc GNU 4.8.4
> C compiler flags:-msse4.1 -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> C++ compiler:   /usr/bin/c++ GNU 4.8.4
> C++ compiler flags:  -msse4.1-std=c++0x   -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler 
> driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on 
> Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
> CUDA compiler 
> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
>  
> CUDA driver:7.50
> CUDA runtime:   7.50
>
>
>
> the newer machine (here execution is slower by a factor 2):  gmx --version
>
> GROMACS version:2016.3
> Precision:  single
> Memory model:   64 bit
> MPI library:thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:CUDA
> SIMD instructions:  AVX_256
> FFT library:fftw-3.3.5
> RDTSCP usage:   enabled
> TNG support:enabled
> Hwloc support:  hwloc-1.10.0
> Tracing support:disabled
> Built on:   Fri Mar 24 11:18:29 CET 2017
> Built by:   root@rcpe-sbd-node01 [CMAKE]
> Build OS/arch:  Linux 3.14-2-amd64 x86_64
> Build CPU vendor:   Intel
> Build CPU brand:Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
> Build CPU family:   6   Model: 62   Stepping: 4
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr 
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 
> sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler: /usr/bin/cc GNU 4.9.2
> C compiler flags:-mavx -O3 -DNDEBUG -funroll-all-loops 
> 

[gmx-users] performance issue with many short MD runs

2017-03-27 Thread Michael Brunsteiner

Hi, I have to run a lot (many thousands) of very short MD reruns with gmx.
Using gmx-2016.3 it works without problems; however, what I see is that the
overall performance (in terms of REAL execution time as measured with the
unix time command) which I get on a relatively new computer is poorer than
what I get with a much older machine
(by a factor of about 2 - this in spite of gmx reporting a better performance
of the new machine in the log file).

Both machines run linux (debian); the old has eight intel cores, the newer
one 12.
On the newer machine gmx uses a supposedly faster SIMD instruction set;
otherwise hardware (including hard drives) is comparable.

Below is the output of a typical job (gmx mdrun -rerun with a trajectory
containing not more than a couple of thousand conformations of a single small
molecule) on both machines (mdp file content below).

old machine:
prompt> time gmx mdrun ...
in the log file:
               Core t (s)   Wall t (s)        (%)
       Time:        4.527        0.566      800.0
                 (ns/day)    (hour/ns)
Performance:        1.527       15.719
on the command line:
real    2m45.562s  <
user    15m40.901s
sys     0m33.319s

new machine:
prompt> time gmx mdrun ...
in the log file:
               Core t (s)   Wall t (s)        (%)
       Time:        6.030        0.502     1200.0
                 (ns/day)    (hour/ns)
Performance:        1.719       13.958

on the command line:
real    5m30.962s  <
user    20m2.208s
sys     3m28.676s

The specs of the two gmx installations are given below. I'd be grateful if
anyone could suggest ways to improve performance on the newer machine!
Cheers, Michael


the older machine (here the jobs run faster):  gmx --version

GROMACS version:    2016.3
Precision:  single
Memory model:   64 bit
MPI library:    thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:    CUDA
SIMD instructions:  SSE4.1
FFT library:    fftw-3.3.5-sse2
RDTSCP usage:   enabled
TNG support:    enabled
Hwloc support:  hwloc-1.8.0
Tracing support:    disabled
Built on:   Tue Mar 21 11:24:42 CET 2017
Built by:   root@rcpetemp1 [CMAKE]
Build OS/arch:  Linux 3.13.0-79-generic x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Core(TM) i7 CPU 960  @ 3.20GHz
Build CPU family:   6   Model: 26   Stepping: 5
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc pdcm 
popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler: /usr/bin/cc GNU 4.8.4
C compiler flags:    -msse4.1 -O3 -DNDEBUG -funroll-all-loops 
-fexcess-precision=fast  
C++ compiler:   /usr/bin/c++ GNU 4.8.4
C++ compiler flags:  -msse4.1    -std=c++0x   -O3 -DNDEBUG -funroll-all-loops 
-fexcess-precision=fast  
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler 
driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on 
Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
CUDA compiler 
flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
 
CUDA driver:    7.50
CUDA runtime:   7.50



the newer machine (here execution is slower by a factor 2):  gmx --version

GROMACS version:    2016.3
Precision:  single
Memory model:   64 bit
MPI library:    thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:    CUDA
SIMD instructions:  AVX_256
FFT library:    fftw-3.3.5
RDTSCP usage:   enabled
TNG support:    enabled
Hwloc support:  hwloc-1.10.0
Tracing support:    disabled
Built on:   Fri Mar 24 11:18:29 CET 2017
Built by:   root@rcpe-sbd-node01 [CMAKE]
Build OS/arch:  Linux 3.14-2-amd64 x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Build CPU family:   6   Model: 62   Stepping: 4
Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr 
nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 
sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 4.9.2
C compiler flags:    -mavx -O3 -DNDEBUG -funroll-all-loops 
-fexcess-precision=fast  
C++ compiler:   /usr/bin/c++ GNU 4.9.2
C++ compiler flags:  -mavx    -std=c++0x   -O3 -DNDEBUG -funroll-all-loops 
-fexcess-precision=fast  
CUDA compiler:  /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler 
driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on 
Wed_Jul_17_18:36:13_PDT_2013;Cuda compilation tools, release 5.5, V5.5.0
CUDA compiler 

Re: [gmx-users] Performance advice for newest Pascal architecture

2017-03-16 Thread Szilárd Páll
Hi,

I'd recommend the Quadro P6000 or Tesla P40. These are the large chips
equivalent with the TITAN X Pascal with high single precision
throughput, slightly faster than the P100.

The P5000 is not bad, but it's slower. It has a chip similar to the
1080 (not sure about the clocks, those might be lower; FYI the 1080 is
~30-35% slower than the TITAN X-P IIRC).

I'd rule out K40s real quick, they're 2-gen older than Pascal, power
hogs with at least 2-3x lower performance than either of the Pascals.

Cheers,
--
Szilárd

PS: I assume you have considered buying 1080 or 1080Ti cards. You can
get 2-3 of them for the price of a P5000.


On Thu, Mar 9, 2017 at 11:58 AM, Téletchéa Stéphane
 wrote:
> Dear colleagues,
>
> We are willing to invest in nodes for GROMACS-specific calculations, and are
> trying to get the best for our bucks (as everyone).
>
> For now our decision comes close to nodes using the following
> configuration:
>
> 2 * Xeon E5-2630 v4
> 1 P100 or 2 * P5000 or 2 * K40
> Cluster node interconnection: Intel OmniPath
>
> Our systems will range from 50k to 200k atoms most of the time, using
> AMBER-99SB-ILDN, GROMACS 2016.1 and above.
>
> I am aware of various benchmarks and recommendations like "Best Bang for
> your Buck", but is there any reference (maybe internal) for the latest
> Pascal architecture, or any general advice for/against it?
>
> Thanks a lot in advance for the feedback; if we are able to benchmark on our
> systems using the different setups above, we'll share the results as far as
> the upstream vendor permits.
>
> Stéphane
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 Nantes
> cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Performance advice for newest Pascal architecture

2017-03-14 Thread Mark Abraham
Hi,

There's not a great deal of change that is specific to Pascal. It's more
and better and faster, and if running PME you'd want to improve your ratio
of CPU cores to GPUs (e.g. put fewer Pascal GPUs in a node than earlier
generation). We are working on PME running on GPUs, hopefully for this
year's release, but it won't yet be such that you'd get away with a cheap
CPU.

Mark

On Thu, Mar 9, 2017 at 11:59 AM Téletchéa Stéphane <
stephane.teletc...@univ-nantes.fr> wrote:

> Dear colleagues,
>
> We are willing to invest in nodes for GROMACS-specific calculations, and are
> trying to get the best for our bucks (as everyone).
>
> For now our decision comes close to nodes using the following
> configuration:
>
> 2 * Xeon E5-2630 v4
> 1 P100 or 2 * P5000 or 2 * K40
> Cluster node interconnection: Intel OmniPath
>
> Our systems will range from 50k to 200k atoms most of the time,
> using AMBER-99SB-ILDN, GROMACS 2016.1 and above.
>
> I am aware of various benchmarks and recommendations like "Best Bang for
> your Buck", but is there any reference (maybe internal) for the latest
> Pascal architecture, or any general advice for/against it?
>
> Thanks a lot in advance for the feedback; if we are able to benchmark on
> our systems using the different setups above, we'll share the results as
> far as the upstream vendor permits.
>
> Stéphane
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

[gmx-users] Performance advice for newest Pascal architecture

2017-03-09 Thread Téletchéa Stéphane

Dear colleagues,

We are willing to invest in nodes for GROMACS-specific calculations, and are
trying to get the best for our bucks (as everyone).

For now our decision comes close to nodes using the following
configuration:

2 * Xeon E5-2630 v4
1 P100 or 2 * P5000 or 2 * K40
Cluster node interconnection: Intel OmniPath

Our systems will range from 50k to 200k atoms most of the time,
using AMBER-99SB-ILDN, GROMACS 2016.1 and above.

I am aware of various benchmarks and recommendations like "Best Bang for
your Buck", but is there any reference (maybe internal) for the latest
Pascal architecture, or any general advice for/against it?

Thanks a lot in advance for the feedback; if we are able to benchmark on
our systems using the different setups above, we'll share the results as
far as the upstream vendor permits.


Stéphane

--
Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein 
Design In Silico
UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 
Nantes cedex 03, France

Tél : +33 251 125 636 / Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Performance loss during pulling simulation

2016-06-27 Thread Mark Abraham
Hi,

That sounds strange. The diagnostic information is all in the .log files,
so a side-by-side diff is often instructive. If you can't find anything
that points to a problem, please upload some log files to a file-sharing
service and share some links to them (the list can't take attachments).
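
For example (file names are placeholders), comparing just the timing
breakdowns at the end of the two logs is often enough to spot the difference:

  diff -y <(tail -n 60 md_300K.log) <(tail -n 60 md_400K.log) | less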

Mark

On Mon, Jun 27, 2016 at 12:23 PM Jan Meyer  wrote:

>
> Dear gromacs user,
>
> I am running a system consisting of two silica clusters (~3000 atoms)
> which are surrounded by 85 polymer chains. Each polymer chain contains
> 200 monomer units. The overall number of atoms is roughly 10^5. During
> the simulation I fix one of the clusters; the other one is pulled towards
> the fixed one. When I run this pulling simulation at T=300 K I get a
> performance of ~6.5 ns/day. Increasing the temperature to 400 K leads to
> a performance of ~3.0 ns/day. This effect seems to be systematic for all
> the simulations I am doing on these and similar systems.
>
> I wonder where this performance loss comes from. So far I think it has
> to do with the pulling, because I don't see this difference in
> performance in equilibrium simulations.
>
> I would be grateful for any explanations or suggestions to fix this.
>
> Best regards,
> Jan
>


[gmx-users] Performance loss during pulling simulation

2016-06-27 Thread Jan Meyer


Dear gromacs users,

I am running a system consisting of two silica clusters (~3000 atoms)
which are surrounded by 85 polymer chains. Each polymer chain contains
200 monomer units. The overall number of atoms is roughly 10^5. During
the simulation I fix one of the clusters; the other one is pulled towards
the fixed one. When I run this pulling simulation at T=300 K I get a
performance of ~6.5 ns/day. Increasing the temperature to 400 K leads to
a performance of ~3.0 ns/day. This effect seems to be systematic for all
the simulations I am doing on these and similar systems.

I wonder where this performance loss comes from. So far I think it has
to do with the pulling, because I don't see this difference in
performance in equilibrium simulations.

I would be grateful for any explanations or suggestions to fix this.

Best regards,
Jan



Re: [gmx-users] Performance on multiple GPUs per node

2015-12-11 Thread Szilárd Páll
Hi,

Without details of your benchmarks it's hard to comment on why you do not
see a performance improvement with multiple GPUs per node. Sharing some logs
would be helpful.

Are you comparing performance with N cores and a varying number of GPUs? The
balance of hardware resources is a key factor in scaling, and my guess is
that your runs are essentially CPU-bound, hence adding more GPUs does not
help.

Have a look at these papers:
https://doi.org/10.1002/jcc.24030
https://doi.org/10.1007/978-3-319-15976-8_1
Especially the former covers the topic quite well, and both show scaling of
a <100k-atom protein system to 32-64 nodes (dual socket/dual GPU).
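
As a minimal sketch of how the rank/thread/GPU mapping is typically
controlled (the counts below assume 4 K80 boards = 8 GPU devices and 24 cores
per node, which may not match your system):

  gmx mdrun -s topol.tpr -ntmpi 8 -ntomp 3 -gpu_id 01234567 -pin on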

Cheers,

--
Szilárd

On Fri, Dec 11, 2015 at 11:54 AM, Jens Krüger <
krue...@informatik.uni-tuebingen.de> wrote:

> Dear all,
>
> we are currently planning a new cluster at our university's compute
> centre. The big question on our side is how many and which GPUs we should
> put into the nodes.
>
> We have access to a test system with four Tesla K80s per node. Using one
> GPU on a node we can reach something like 23 ns/day for the ADH system (PME,
> cubic), which is pretty much in line with e.g.
> http://exxactcorp.com/index.php/solution/solu_list/84
>
> When trying to use 2 or more GPUs on one node, the performance plunges to
> below 10 ns/day no matter how we split the MPI/OMP threads. Does anybody of
> you have access to a comparable hardware setup? We would be interested in
> benchmark data answering the question: does GROMACS-5.1 scale on more than
> one GPU per node?
>
> Thanks and best wishes,
>
> Jens
>
>

[gmx-users] Performance on multiple GPUs per node

2015-12-11 Thread Jens Krüger

Dear all,

we are currently planning a new cluster at our university's compute
centre. The big question on our side is how many and which GPUs we
should put into the nodes.


We have access to a test system with four Tesla K80s per node. Using one
GPU on a node we can reach something like 23 ns/day for the ADH system (PME,
cubic), which is pretty much in line with e.g.
http://exxactcorp.com/index.php/solution/solu_list/84


When trying to use 2 or more GPUs on one node, the performance plunges
to below 10 ns/day no matter how we split the MPI/OMP threads. Does
anybody of you have access to a comparable hardware setup? We would be
interested in benchmark data answering the question: does GROMACS-5.1
scale on more than one GPU per node?


Thanks and best wishes,

Jens




Re: [gmx-users] performance of e5 2630 CPU with gtx-titan GPU

2015-07-29 Thread Kutzner, Carsten
Hi Netaly,

In this study, http://arxiv.org/abs/1507.00898, there are GROMACS performance
evaluations for many CPU/GPU combinations. Although your combination
is not among them, you could try to estimate its performance from
similar setups.

There is for example an E5-1620 CPU with a TITAN GPU. Although the E5-1620
has only 4 cores instead of the 6 of the E5-2630 (factor 1.5 difference), 
it is also clocked higher (3.6 GHz as compared to 2.3 GHz, about the same
factor).

Although there are other differences between the two CPUs, for an estimate
it should be OK. 
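
As a quick sanity check of that estimate (base clocks only, ignoring turbo
and microarchitectural differences):

  awk 'BEGIN { printf "E5-1620: %.1f core*GHz  E5-2630: %.1f core*GHz\n",
               4*3.6, 6*2.3 }'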

Best,
  Carsten


 On 28 Jul 2015, at 20:31, Netaly Khazanov neta...@gmail.com wrote:
 
 Hello All,
 Does anybody know what is the performance of this combination of CPU and
 GPU?
 
 Thanks in advance.
 
 
 
 -- 
 Netaly

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa



Re: [gmx-users] performance of e5 2630 CPU with gtx-titan GPU

2015-07-29 Thread Netaly Khazanov
Thanks a lot for your answer.
I will definitely take a look at this study.
Regards,
Netaly

On Wed, Jul 29, 2015 at 11:22 AM, Kutzner, Carsten ckut...@gwdg.de wrote:

 Hi Netaly,

 in this study http://arxiv.org/abs/1507.00898 are GROMACS performance
 evaluations for many CPU/GPU combinations. Although your combination
 is not among them, you could try to estimate its performance from
 similar setups.

 There is for example an E5-1620 CPU with a TITAN GPU. Although the E5-1620
 has only 4 cores instead of the 6 of the E5-2630 (factor 1.5 difference),
 it is also clocked higher (3.6 GHz as compared to 2.3 GHz, about the same
 factor).

 Although there are other differences between the two CPUs, for an estimate
 it should be OK.

 Best,
   Carsten


  On 28 Jul 2015, at 20:31, Netaly Khazanov neta...@gmail.com wrote:
 
  Hello All,
  Does anybody know what is the performance of this combination of CPU and
  GPU?
 
  Thanks in advance.
 
 
 
  --
  Netaly

 --
 Dr. Carsten Kutzner
 Max Planck Institute for Biophysical Chemistry
 Theoretical and Computational Biophysics
 Am Fassberg 11, 37077 Goettingen, Germany
 Tel. +49-551-2012313, Fax: +49-551-2012302
 http://www.mpibpc.mpg.de/grubmueller/kutzner
 http://www.mpibpc.mpg.de/grubmueller/sppexa





-- 
Netaly


[gmx-users] performance of e5 2630 CPU with gtx-titan GPU

2015-07-28 Thread Netaly Khazanov
Hello All,
Does anybody know what is the performance of this combination of CPU and
GPU?

Thanks in advance.



-- 
Netaly


[gmx-users] performance of e5 2630 CPU with gtx-titan GPU

2015-07-27 Thread Netaly Khazanov
Hello All,
Does anybody know what is the performance of this combination of CPU and
GPU?

Thanks in advance.

Netaly Khazanov
-- 
Netaly


Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-15 Thread Mirco Wahab

On 11.06.2015 14:07, David McGiven wrote:

Your 1-3% claim is based on the webpage you linked ?

Is it reliable to compare GPU performances for gromacs with those of 3D
videogames ?


OK, you got me on this. As much as I'd wish I cannot
really back up my claim of comparability. I have been
out of office for one week but did some tests here
for myself today.

The system is Haswell/E (i7-5820K), GPU is single GTX-980
(the normal Gigabyte Model), the test run is ADH-cubic-vsites
(reaction field) from the bottom of the Gromacs acceleration page
(http://www.gromacs.org/GPU_acceleration).

I can explicitly set the PCIe-x16 slots to 1.0, 2.0, and 3.0
(which I did). Theoretically (and practically), PCIe-x16 2.0
should be relatively  close in bandwidth to PCIe-x8 3.0, so this
should give some hints as what to expect.

adh-cubic-vsites/rf - ns/day:
  PCIE-x16/1.0  54.46   
  PCIE-x16/2.0  61.81   
  PCIE-x16/3.0  64.52   

percentage related to PCIE-x16/3.0
  PCIE-x16/1.0  84.4
  PCIE-x16/2.0  95.8
  PCIE-x16/3.0  100

(each value = avg. of three runs)

Therefore one could support the hypothesis
that using one card in x16 and one in x8
would probably show a performance penalty
of around  5% on the x8 card.

Regards

M.



Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-15 Thread Szilárd Páll
Good data Mirco, but let me emphasize that your measurements only
reflect the case of heavily GPU-bound workloads!

5-6% performance improvement with PCI-E 3.0 vs 2.0 is about the
maximum you'll see when, as in RF runs, there is not enough
CPU work to fully overlap with the GPU computation (indicated by more
than a few % in the Wait for GPU counter). However, if there is PME
CPU work that balances well with the GPU work, while you will still get
shorter CPU-GPU transfer times, the impact of this on the total
runtime will be smaller.

Cheers,
--
Szilárd


On Mon, Jun 15, 2015 at 4:10 PM, Mirco Wahab
mirco.wa...@chemie.tu-freiberg.de wrote:
 On 11.06.2015 14:07, David McGiven wrote:

 Your 1-3% claim is based on the webpage you linked ?

 Is it reliable to compare GPU performances for gromacs with those of 3D
 videogames ?


 OK, you got me on this. As much as I'd wish I cannot
 really back up my claim of comparability. I have been
 out of office for one week but did some tests here
 for myself today.

 The system is Haswell/E (i7-5820K), GPU is single GTX-980
 (the normal Gigabyte Model), the test run is ADH-cubic-vsites
 (reaction field) from the bottom of the Gromacs acceleration page
 (http://www.gromacs.org/GPU_acceleration).

 I can explicitly set the PCIe-x16 slots to 1.0, 2.0, and 3.0
 (which I did). Theoretically (and practically), PCIe-x16 2.0
 should be relatively  close in bandwidth to PCIe-x8 3.0, so this
 should give some hints as what to expect.

 adh-cubic-vsites/rf - ns/day:
   PCIE-x16/1.0  54.46
   PCIE-x16/2.0  61.81
   PCIE-x16/3.0  64.52

 percentage related to PCIE-x16/3.0
   PCIE-x16/1.0  84.4
   PCIE-x16/2.0  95.8
   PCIE-x16/3.0  100

 (each value = avg. of three runs)

 Therefore one could support the hypothesis
 that using one card in x16 and one in x8
 would probably show a performance penalty
 of around  5% on the x8 card.

 Regards

 M.



[gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-11 Thread David McGiven
Dear Gromacs Users,

We're finally buying some Intel E5-2650 servers + NVIDIA GTX980 cards.

However, some servers come with only PCI-e 3.0 x8 slots and
others with x16 slots.

Do you think this is relevant for gromacs performance? And if so, how
relevant?

Thanks in advance.

Cheers,
David.


Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-11 Thread Mirco Wahab

On 11.06.2015 13:08, David McGiven wrote:

We're finally buying some Intel E52650 servers + NVIDIA GTX980 cards.
However, there's some servers that come with only PCI-e 3.0 x8 slots and
others with x16 slots.
Do you think this is relevant for gromacs performance ? And if so, how much
relevant ?


It's more important to select PCIe 3.0 mode. Then, the difference
between 16x and 8x is imho very low (1-3%).
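
A quick way to check which link a card has actually negotiated (assuming a
reasonably recent nvidia-smi):

  nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv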

M.


P.S.: http://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/



Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-11 Thread David McGiven
Hey Mirco,

Your 1-3% claim is based on the webpage you linked ?

Is it reliable to compare GPU performances for gromacs with those of 3D
videogames ?

Thanks!

2015-06-11 13:21 GMT+02:00 Mirco Wahab mirco.wa...@chemie.tu-freiberg.de:

 On 11.06.2015 13:08, David McGiven wrote:

 We're finally buying some Intel E52650 servers + NVIDIA GTX980 cards.
 However, there's some servers that come with only PCI-e 3.0 x8 slots and
 others with x16 slots.
 Do you think this is relevant for gromacs performance ? And if so, how
 much
 relevant ?


 It's more important to select PCIe 3.0 mode. Then, the difference
 between 16x and 8x is imho very low (1-3%).

 M.


 P.S.:
 http://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/



Re: [gmx-users] Performance drops when simulating protein with small ligands

2015-03-20 Thread Justin Lemkul



On 3/20/15 1:13 PM, Yunlong Liu wrote:

Hi,

I am running my protein with two ligands. Both ligands are small molecules like
ATP. However, my simulation performance drops a lot when adding these two ligands
with the same set of other parameters.

Previously, without the ligands, I got 30 ns/day with 64 CPUs and 4 GPUs. But now
I can only get 17 ns/day with the same settings. I want to know whether this is a
common phenomenon or whether I am doing something stupid.



Probably some other process is using resources and degrading your performance, 
or you're using different run settings (the .log file is definitive here).  The 
mere addition of ligands does not degrade performance.


-Justin

--
==

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalem...@outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==

