Re: mdrun on 8-core AMD + GTX TITAN (was: Re: [gmx-users] Re: Gromacs-4.6 on two Titans GPUs)

2013-11-12 Thread Dwey Kauffman
Hi Mark and Szilard

Thanks to both of you for your suggestions. They are very helpful.

>
> Neither run had a PP-PME work distribution suitable for the hardware it
> was
> running on (and fixing that for each run requires opposite changes).
> Adding
> a GPU and hoping to see scaling requires that there be proportionately
> more
> GPU work available to do, *and* enough absolute work to do. mdrun tries to
> do this, and reports early in the log file, which is one of the reasons
> Szilard asked to see whole log files - please use a file sharing service
> to
> do that.
>

This task involves GPU calculation, so I had not looked at the PP-PME work
distribution; that is a good hint. I had guessed that the two GPUs finish
their calculations quickly / that there is not enough work for them, which is
in line with your explanation.
 
Please see logs below again.

#### ONE GPU ####

http://pastebin.com/B6bRUVSa

#### TWO GPUs ####
http://pastebin.com/SLAYnejP
 
>
> As you can see, this test was made on the same node regardless of
> > networking.  Can the performance be improved  say 50% more when 2 GPUs
> are
> > used on a general task ?  If yes, how ?
> >
> > >Indeed, as Richard pointed out, I was asking for *full* logs, these
> > >summaries can't tell much, the table above the summary entitled "R E A
> > >L   C Y C L E   A N D   T I M E   A C C O U N T I N G" as well as
> > >other reported information across the log file is what I need to make
> > >an assessment of your simulations' performance.
> >
> > Please see below.
> >
> > >>However, in your case I suspect that the
> > >>bottleneck is multi-threaded scaling on the AMD CPUs and you should
> > >>probably decrease the number of threads per MPI rank and share GPUs
> > >>between 2-4 ranks.
> >
> > After I test all three clusters, I found it may NOT be an issue of AMD
> > cpus.
> > Intel cpus has the SAME scaling issue.
> >
> > However, I am curious as to how you justify the setup of 2-4 ranks
> sharing
> > GPUs ? Can you please explain it a bit more ?
> >
>
> NUMA effects on multi-socket AMD processors are particularly severe; the
> way GROMACS uses OpenMP is not well suited to them. Using a rank (or two)
> per socket will greatly reduce those effects, but introduces different
> algorithmic overhead from the need to do DD and explicitly communicate
> between ranks. (You can see the latter in your .log file snippets below.)
> Also, that means the parcel of PP work available from a rank to give to
> the
> GPU is smaller, which is the opposite of what you'd like for GPU
> performance and/or scaling. We are working on a general solution for this
> and lots of related issues in the post-5.0 space, but there is a very hard
> limitation imposed by the need to amortize the cost of CPU-GPU transfer by
> having lots of PP work available to do.
>

Is this the reason why scaling to two GPUs does not happen - because the PP
workload per rank becomes smaller?
From that implication, I am wondering whether we can increase the PP workload
through parameters in the .mdp file. Which parameters most affect the PP
workload? Would you please give more specific suggestions?


>
> > NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> >   performance loss, consider using a shorter cut-off and a finer PME
> > grid.
> >
>
> This note needs to be addressed before maximum throughput is achieved and
> the question of scaling is worth considering. Ideally, "Wait GPU local"
> should be nearly zero, achieved as suggested above. Note that
> launch+force+mesh+wait is the sum of gpu total! But much of the
> information
> needed is higher up the log file, and the whole question is constrained by
> things like rvdw.
>

From the note, it clearly suggests a shorter cut-off and a finer PME grid. I
am not sure how to set up a finer PME grid, but I am able to set shorter
cut-offs. However, doing so seems risky based on others' reports.
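If I understand the note correctly, the kind of change it suggests would look
something like this in the .mdp file (my guess only; the values are
illustrative, not tuned):

; shift work from the GPU to the CPU: shorter cut-off, finer PME grid
rcoulomb        = 1.0     ; shorter cut-off -> less short-range (GPU) work
rvdw            = 1.0     ; with the Verlet scheme rvdw is kept equal to rcoulomb
fourierspacing  = 0.10    ; smaller spacing -> finer PME grid -> more PME (CPU) work

Is that the sort of adjustment that is meant?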
 
Indeed, I see differences among the tests with one GPU.
Here "cut-offs" refers to rlist, rvdw and rcoulomb.

I found that the smaller the cut-off values, the faster the computation.
The question is how small they can go, because it is interesting to know
whether these different cut-offs generate equally good simulations.

As for two GPUs, when I set larger cut-offs, the two GPUs in the same node
were kept very busy, but the outcome of that configuration was worse in terms
of ns/day and wall time.

So what does "a finer PME grid" mean with respect to the GPU workload?

You mention that the GPU total is the sum of launch + force + mesh + wait. I
thought the PME mesh is carried out by the CPU instead of the GPU. Am I
missing something here?
I thought the GPU is responsible for calculating the short-ranged non-bonded
forces, whereas the CPU is responsible for the bonded and PME long-ranged
forces. Can you clarify this?

Also, does rvdw play an important role in improving the performance of the
GPU calculation?


> >
> Unfortunately you didn't copy the GPU timing stuff here! Roug

Re: mdrun on 8-core AMD + GTX TITAN (was: Re: [gmx-users] Re: Gromacs-4.6 on two Titans GPUs)

2013-11-09 Thread Dwey Kauffman
Hi Szilard,

 Thank you very much for your suggestions.

>Actually, I was jumping to conclusions too early, as you mentioned AMD
>"cluster", I assumed you must have 12-16-core Opteron CPUs. If you
>have an 8-core (desktop?) AMD CPU, than you may not need to run more
>than one rank per GPU.

Yes, we have independent clusters of AMD, AMD Opteron and Intel Core i7
machines. All nodes of the three clusters have (at least) one GPU card
installed. I have run the same test on all three clusters.

Let's focus on a basic scaling issue: one GPU vs. two GPUs within the same
8-core AMD node.
Using one GPU we get a performance of ~32 ns/day. Using two GPUs we gain
little more (~38.5 ns/day), i.e. about 20% more performance. Even that is not
consistent: in some tests I saw only 2-5% more, which really surprised me.

As you can see, this test was made on a single node, so networking is not
involved. Can the performance be improved by, say, 50% when two GPUs are used
on a typical task? If yes, how?

>Indeed, as Richard pointed out, I was asking for *full* logs, these
>summaries can't tell much, the table above the summary entitled "R E A
>L   C Y C L E   A N D   T I M E   A C C O U N T I N G" as well as
>other reported information across the log file is what I need to make
>an assessment of your simulations' performance.

Please see below.

>>However, in your case I suspect that the
>>bottleneck is multi-threaded scaling on the AMD CPUs and you should
>>probably decrease the number of threads per MPI rank and share GPUs
>>between 2-4 ranks.

After testing all three clusters, I found it may NOT be an issue with the AMD
CPUs: the Intel CPUs show the SAME scaling issue.

However, I am curious how you arrive at the suggestion of 2-4 ranks sharing a
GPU. Can you please explain it a bit more?


>You could try running
>mpirun -np 4 mdrun -ntomp 2 -gpu_id 0011
>but I suspect this won't help because your scaling issue

Your guess is correct, but why is that? It is worse: the more nodes are
involved in a task, the worse the performance.


>> in my
>>experience even reaction field runs don't scale across nodes with 10G
>>ethernet if you have more than 4-6 ranks per node trying to
>>communicate (let alone with PME). 

What does "let alone with PME" mean? How would I do that - with mdrun?
I do know that "mdrun -npme" specifies the number of PME processes.

Thank you.

Dwey



### One GPU 

 R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:            Nodes  Th.  Count   Wall t (s)     G-Cycles       %
---------------------------------------------------------------------------
 Neighbor search           1    8     11      431.817    13863.390     1.6
 Launch GPU ops.           1    8    501      472.906    15182.556     1.7
 Force                     1    8    501     1328.611    42654.785     4.9
 PME mesh                  1    8    501    11561.327   371174.090    42.8
 Wait GPU local            1    8    501     6888.008   221138.111    25.5
 NB X/F buffer ops.        1    8    991     1216.499    39055.455     4.5
 Write traj.               1    8   1030       12.741      409.039     0.0
 Update                    1    8    501     1696.358    54461.226     6.3
 Constraints               1    8    501     1969.726    63237.647     7.3
 Rest                      1                 1458.820    46835.133     5.4
---------------------------------------------------------------------------
 Total                     1                27036.812   868011.431   100.0
---------------------------------------------------------------------------
 PME spread/gather         1    8   1002     6975.086   223933.739    25.8
 PME 3D-FFT                1    8   1002     3928.259   126115.976    14.5
 PME solve                 1    8    501      636.488    20434.327     2.4
---------------------------------------------------------------------------

 GPU timings
---------------------------------------------------------------------------
 Computing:                      Count   Wall t (s)    ms/step       %
---------------------------------------------------------------------------
 Pair list H2D                      11       43.435      0.434      0.2
 X / q H2D                         501      567.168      0.113      2.8
 Nonbonded F kernel                400    14174.316      3.544     70.8
 Nonbonded F+ene k.                 90     4314.438      4.794     21.5
 Nonbonded F+ene+prune k.           11      572.370      5.724      2.9
 F D2H                             501      358.120      0.072      1.8
---------------------------------------------------------------------------
 Total                                    20029.846      4.006    100.0
---------------------------------------------------------------------------

[gmx-users] Re: Hardware for best gromacs performance?

2013-11-05 Thread Dwey Kauffman
Hi Szilard,

 Thanks.

From Timo's benchmark:
 1 node            142 ns/day
 2 nodes (FDR14)   218 ns/day
 4 nodes (FDR14)   257 ns/day
 8 nodes (FDR14)   326 ns/day


It looks like an InfiniBand network is "required" in order to scale up when
running a task across nodes. Is that correct?


Dwey




[gmx-users] Re: Gromacs-4.6 on two Titans GPUs

2013-11-05 Thread Dwey Kauffman
Hi Szilard,

   Thanks for your suggestions. I am indeed aware of that page. On an 8-core
AMD node with one GPU, I am very happy with the performance; see below. My
intention is to obtain an even better one because we have multiple nodes.

### 8 core AMD with  1 GPU,
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >20% more load than the CPU. This imbalance causes
  performance loss, consider using a shorter cut-off and a finer PME
grid.

               Core t (s)   Wall t (s)      (%)
       Time:   216205.510    27036.812    799.7
                         7h30:36
               (ns/day)     (hour/ns)
Performance:       31.956        0.751

### 8 core AMD with 2 GPUs

               Core t (s)   Wall t (s)      (%)
       Time:   178961.450    22398.880    799.0
                         6h13:18
               (ns/day)     (hour/ns)
Performance:       38.573        0.622
Finished mdrun on node 0 Sat Jul 13 09:24:39 2013


>However, in your case I suspect that the 
>bottleneck is multi-threaded scaling on the AMD CPUs and you should 
>probably decrease the number of threads per MPI rank and share GPUs 
>between 2-4 ranks.


OK, but can you give an example of the mdrun command for an 8-core AMD node
with 2 GPUs? I will try to run it again.
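For instance, would it be something like the following (my guess based on
your description, untested; the -deffnm file name is a placeholder)?

mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011 -deffnm md

i.e. 4 thread-MPI ranks with 2 OpenMP threads each, where ranks 0 and 1 share
GPU 0 and ranks 2 and 3 share GPU 1; or, with only two ranks,

mdrun -ntmpi 2 -ntomp 4 -gpu_id 01 -deffnm md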


>Regarding scaling across nodes, you can't expect much from gigabit 
>ethernet - especially not from the cheaper cards/switches, in my 
>experience even reaction field runs don't scale across nodes with 10G 
>ethernet if you have more than 4-6 ranks per node trying to 
>communicate (let alone with PME). However, on infiniband clusters we 
>have seen scaling to 100 atoms/core (at peak). 

From your comments, it sounds like a cluster of AMD CPUs is difficult to
scale across nodes in our current setup.

Let's assume we install InfiniBand (20 or 40 Gb/s) in the same system of 16
nodes, each an 8-core AMD with one GPU. For that AMD system, what is a good
way to obtain better performance when we run a task across nodes? In other
words, what would the mdrun_mpi command line look like?
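For example, with four of those nodes I imagine something like the following
(untested; the file name is a placeholder):

mpirun -np 32 mdrun_mpi -gpu_id 00000000 -deffnm md

where each node runs 8 PP ranks that all share its single GPU (id 0) - but
please correct me if that mapping is wrong.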

Thanks,
Dwey






[gmx-users] Re: Hardware for best gromacs performance?

2013-11-05 Thread Dwey Kauffman
Hi Timo,

  Can you provide a benchmark with "1" Xeon E5-2680 and "1" Nvidia K20X GPGPU
on the same 29420-atom test?

Are these two GPU cards (within the same node) connected by SLI (Scalable
Link Interface)?

Thanks,
Dwey



[gmx-users] Re: Gromacs-4.6 on two Titans GPUs

2013-11-05 Thread Dwey
Hi Mike,


I have a similar configuration, except that it is a cluster of AMD-based
Linux machines with 2 GPU cards per node.

Your suggestion works. However, the performance with 2 GPUs discourages me
because, for example, with 1 GPU our compute node can easily reach 31 ns/day
for a protein of 300 amino acids, but with 2 GPUs it only goes as far as
38 ns/day. I am very curious why the performance of 2 GPUs is below
expectation. Is there any overhead we should pay attention to? Note that
these 2 GPU cards are linked by an SLI bridge within the same node.

Since the compute nodes of our cluster each have at least one GPU but are
connected by slow network cards (1 Gb/s), I reasonably doubt that the
performance will be proportional to the total number of GPU cards. I am
wondering if you have any suggestions for a cluster of GPU nodes. For
example, would InfiniBand networking help the final performance when we
execute an MPI task? What else might? Or should we forget about MPI and use a
single GPU per run instead?

Any suggestion is highly appreciated.
Thanks.

Dwey

> Date: Tue, 5 Nov 2013 16:20:39 +0100
> From: Mark Abraham 
> Subject: Re: [gmx-users] Gromacs-4.6 on two Titans GPUs
> To: Discussion list for GROMACS users 
> Message-ID:
> 
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Tue, Nov 5, 2013 at 12:55 PM, James Starlight 
> wrote:
>
>> Dear Richard,
>>
>>
>> 1)  mdrun -ntmpi 1 -ntomp 12 -gpu_id 0 -v  -deffnm md_CaM_test
>> gave me a performance of about 25 ns/day for the explicitly solvated system
>> consisting of 68k atoms (CHARMM ff, 1.0 cut-offs)
>>
>> gave slightly worse performance in comparison to 1)
>>
>>
> Richard suggested
>
> mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v  -deffnm md_CaM_test,
>
> which looks correct to me. -ntomp 6 is probably superfluous
>
> Mark
>
>
>> finally
>>
>> 3) mdrun -deffnm md_CaM_test
>> running in the same regime as in the 2) so its also gave me 22ns/day for
>> the same system.
>>
>> How the efficacy of using of dual-GPUs could be increased?
>>
>> James
>>
>>
>> 2013/11/5 Richard Broadbent 
>>
>> > Dear James,
>> >
>> >
>> > On 05/11/13 11:16, James Starlight wrote:
>> >
>> >> My suggestions:
>> >>
>> >> 1) During compilation using -march=corei7-avx-i I obtained an error that
>> >> something was not found (sorry, I didn't save the log), so I compiled
>> >> gromacs without this flag
>> >>
>> >> 2) I have twice as better performance using just 1 gpu by means of
>> >>
>> >> mdrun -ntmpi 1 -ntomp 12 -gpu_id 0 -v  -deffnm md_CaM_test
>> >>
>> >> than using of both gpus
>> >>
>> >> mdrun -ntmpi 2 -ntomp 12 -gpu_id 01 -v  -deffnm md_CaM_test
>> >>
>> >> in the last case I have obtained warning
>> >>
>> >> WARNING: Oversubscribing the available 12 logical CPU cores with 24
>> >> threads.
>> >>   This will cause considerable performance loss!
>> >>
>> > here you are requesting 2 thread-MPI processes, each with 12 OpenMP
>> > threads, hence a total of 24 threads. However, even with hyper-threading
>> > enabled there are only 12 hardware threads on your machine. Therefore,
>> > only allocate 12. Try
>> >
>> > mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v  -deffnm md_CaM_test
>> >
>> > or even
>> >
>> > mdrun -v  -deffnm md_CaM_test
>> >
>> > I believe it should autodetect the GPUs and run accordingly for details
>> of
>> > how to use gromacs with mpi/thread mpi openmp and GPUs see
>> >
>> > http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>> >
>> > Which describes how to use these systems
>> >
>> > Richard
>> >
>> >
>> >  How it could be fixed?
>> >> All gpu are recognized correctly
>> >>
>> >>
>> >> 2 GPUs detected:
>> >>#0: NVIDIA GeForce GTX TITAN, compute cap.: 3.5, ECC:  no, stat:
>> >> compatible
>> >>#1: NVIDIA GeForce GTX TITAN, compute cap.: 3.5, ECC:  no, stat:
>> >> compatible
>> >>
>> >>
>> >> James
>> >>
>> >>
>> >> 2013/11/4 Szilárd Páll 
>> >>
>> >>  You can use the "-march=native" flag with gcc to optimize for the CPU
>> >>> your are building on or e.g. -march=corei7-avx-i for Intel Ivy Bridge
>>

[gmx-users] RE: average pressure of a system

2013-09-12 Thread Dwey Kauffman
>> I carried out independent NPT processes with different tau_p values =
>> 1.5, 1.0 and 0.5
>>
>> ## tau_p 1.5
>> Energy       Average    Err.Est.    RMSD      Tot-Drift
>> --------------------------------------------------------
>> Pressure     2.62859    2.6         185.68    2.67572   (bar)
>>
>> ## tau_p 1.0
>> Energy       Average    Err.Est.    RMSD      Tot-Drift
>> --------------------------------------------------------
>> Pressure     0.886769   1.7         187.737   0.739     (bar)
>>
>> ## tau_p 0.5
>> Energy       Average    Err.Est.    RMSD      Tot-Drift
>> --------------------------------------------------------
>> Pressure     2.39911    2.2         185.708   6.8189    (bar)
>>
>> ##
>>
>> It is clear that when tau_p =1.0,  average pressure of the system
>> (=0.89
>> bar) is close to ref_p =1.0 bar
>> However, it is unclear to me as to how to assign a good value to tau_p
>> in
>> order to reach at a close value of ref_p. As shown above, both of the
>> average pressures  as  tau_p =1.5 and 0.5 are much higher than that as
>> tau_p
>> =1.0.  A smaller tau_p may or may not help.
> As has been mentioned a number of times 0.9 +- 190 and 2.3 +- 190 are not
> statistically different.  If you use that in a publication then any
> conclusions based on that will be rejected.

Statistically, I understand that the resulting average pressures are
indistinguishable. Here, I altered the tau_p values to determine whether
tau_p helps stabilize the average pressure at the desired value.

>
> To demonstrate to yourself how variable the pressure is, the tau_p=1 run,
> run the pressure analysis again using g_analyze, but using only the first
> half and the last half of the trajectory.  You will find that the average
> values for both parts of the trajectory are not the same.
>

Thank you for the suggestion of applying g_analyze to the trajectory.
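For the record, I understand that to mean something like the following for my
~30 ns run (file names and times are placeholders; Pressure is selected at
the g_energy prompt):

g_energy -f npt.edr -b 0     -e 15000 -o p_first_half.xvg
g_energy -f npt.edr -b 15000 -e 30000 -o p_second_half.xvg
g_analyze -f p_first_half.xvg
g_analyze -f p_second_half.xvg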

>> Another issue caused by system pressure  is about pbc box size. Since I
>> use
>> pressure coupling, the box size is not fixed such that protein moved
>> away
>> the center of membrane for a long simulation like 30 ns. Box size
>
> That is not due to the pressure coupling. 

The changing box size is problematic because I see that molecules are split.
During the NPT run, the box dimensions changed over time from
(7.12158, 7.14945, 9.0) to (6.43804, 6.46323, 8.28666) at the end. This is
because of pressure coupling. See also the note at
http://www.gromacs.org/Documentation/Errors#The_cut-off_length_is_longer_than_half_the_shortest_box_vector_or_longer_than_the_smallest_box_diagonal_element._Increase_the_box_size_or_decrease_rlist
 


> Motion of the protein within the
> box is simply due to diffusion etc.  Also remember, that you have in
> effect
> an infinite repeating box in all directions, so the "center" of the box is
> arbitrary. 

If so, how can a membrane protein be kept relatively fixed (embedded) in the
bilayer without escaping during the simulation? In fact, this protein was
embedded in the membrane with g_membed. Due to diffusion(?), the protein
moved away from the bilayer and escaped toward the extracellular space.

Is there a way to fix it, or to allow the protein to diffuse only in the xy
plane and not in the z direction?


> If you want the protein to remain in the center for
> visualisation purposes, then you do post processing on the box using
> trjconv.
>
Thanks, but this does not change the fact that the protein moved away from
the bilayer during a long simulation.
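For reference, my understanding of the suggested post-processing is something
like this (file names are placeholders; trjconv prompts for a centering
group, e.g. Protein, and an output group):

trjconv -s md.tpr -f md.xtc -pbc mol -center -o md_centered.xtc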

>> changes
>> significantly during production MD. Is there a way to fix the box size
>> at
>> the very beginning ? although turning off pressure coupling will make
>> box
>> size fixed.
>
> If you want fixed box dimensions / volume then you perform NVT.  But that
> will not help with either issues above.
>
Right. The box dimensions remain unchanged if pressure coupling is removed in
the production MD. However, can that be justified for a membrane-protein
system, given that the purpose of pressure coupling is to stabilize the
pressure and density? For example, over a 10 ns simulation the average
pressure of this system is -5.55 bar, which is less convincing.

Energy       Average     Err.Est.    RMSD       Tot-Drift
----------------------------------------------------------
Pressure     -5.55572    2.6         155.552    0.846162  (bar)


Thanks.

[gmx-users] RE: average pressure of a system

2013-09-11 Thread Dwey Kauffman
Justin Lemkul wrote
> On 9/11/13 12:12 AM, Dwey Kauffman wrote:
>>> True, but thermostats allow temperatures to oscillate on the order of a
>>> few
>> K,
>>> and that doesn't happen on the macroscopic level either.  Hence the
>>> small
>>> disconnect between a system that has thousands of atoms and one that has
>>> millions or trillions.  Pressure fluctuations decrease on the order of
>> sqrt(N),
>>> so the system size itself is a determining factor for the pressure
>> fluctuations.
>>>   As previous discussions have rightly concluded, pressure is a somewhat
>>> ill-defined quantity in molecular systems like these.
>>
>> Dose it also imply that it is not a good idea to study the relationship
>> between dimer (multimer) dissociation and  macroscopic pressure in this
>> case
>> ?  (due to the ill defined pressure).
>>
> 
> I would simply think it would be very hard to draw any meaningful
> conclusions if 
> they depend on a microscopic quantity that varies so strongly.
> 
>> It is hard to be justified if I assign a set of various ref_p= 0.7, 0.8,
>> 0.9, 1.0, 1.1, 1.2 , perform independent simulations, and then obtain
>> outcomes of targeted quantities for comparison.
>>
> 
> As with the original issue, I would find it hard to believe that any of
> the 
> differences observed in such a setup would be meaningful.  Is 0.7 ± 100
> actually 
> different from 1.2 ± 100?
> 
>>>
>>> You could try altering tau_p, but I doubt there is any value in doing
>>> so.
>>
>> I would give it a try.
>>
> 
> This will really only change the relaxation time.  Smaller values of tau_p
> may 
> improve the average slightly, but may also (more likely) lead to
> instability, 
> especially with Parrinello-Rahman.

I carried out independent NPT processes with different tau_p values = 1.5,
1.0 and 0.5



## tau_p 1.5
Energy       Average    Err.Est.    RMSD      Tot-Drift
--------------------------------------------------------
Pressure     2.62859    2.6         185.68    2.67572   (bar)


## tau_p 1.0
Energy       Average    Err.Est.    RMSD      Tot-Drift
--------------------------------------------------------
Pressure     0.886769   1.7         187.737   0.739     (bar)


## tau_p 0.5
Energy       Average    Err.Est.    RMSD      Tot-Drift
--------------------------------------------------------
Pressure     2.39911    2.2         185.708   6.8189    (bar)

##

It is clear that with tau_p = 1.0 the average pressure of the system
(0.89 bar) is close to ref_p = 1.0 bar.
However, it is unclear to me how to choose a good value of tau_p so that the
average pressure ends up close to ref_p. As shown above, the average
pressures for tau_p = 1.5 and 0.5 are both much higher than for tau_p = 1.0,
so a smaller tau_p may or may not help.


Another issue related to the system pressure is the PBC box size. Since I use
pressure coupling, the box size is not fixed, and the protein moved away from
the center of the membrane over a long simulation (~30 ns). The box size
changes significantly during production MD. Is there a way to fix the box
size from the very beginning, other than turning off pressure coupling (which
would fix the box size)?

Best regards,

Dwey






[gmx-users] RE: average pressure of a system

2013-09-10 Thread Dwey Kauffman
>True, but thermostats allow temperatures to oscillate on the order of a few
K, 
>and that doesn't happen on the macroscopic level either.  Hence the small 
>disconnect between a system that has thousands of atoms and one that has 
>millions or trillions.  Pressure fluctuations decrease on the order of
sqrt(N), 
>so the system size itself is a determining factor for the pressure
fluctuations. 
>  As previous discussions have rightly concluded, pressure is a somewhat 
>ill-defined quantity in molecular systems like these.

Does it also imply that it is not a good idea to study the relationship
between dimer (multimer) dissociation and macroscopic pressure in this case,
due to the ill-defined pressure?

It would be hard to justify assigning a set of ref_p values (0.7, 0.8, 0.9,
1.0, 1.1, 1.2), performing independent simulations, and then comparing the
resulting target quantities.

>
>You could try altering tau_p, but I doubt there is any value in doing so.

I would give it a try.

Thanks for the hint.

Dwey




[gmx-users] RE: average pressure of a system

2013-09-10 Thread Dwey Kauffman
Hi Dallas and Justin,

  Thanks for the reply. Yes, I did plot the pressure over time with g_energy,
and I am aware of the note at
http://www.gromacs.org/Documentation/Terminology/Pressure

The reason I am concerned about the average pressure is that our experiments
show that our target membrane protein is a hexamer, and our observation is
that variation of the system pressure seems to cause hexamer or dimer
dissociation. It is also quite sensitive to pressure fluctuations. Such
pressure fluctuations certainly draw my attention in this specific case,
because life does not exist under large variations of system pressure.
If it were not for the multimer dissociation that is likely caused by
pressure fluctuations, I would agree with both of you.

I also ran longer simulations of 20 ns and 30 ns:

### 20 ns

Energy       Average    Err.Est.    RMSD       Tot-Drift
---------------------------------------------------------
Pressure     0.886396   0.84        162.655    1.38476   (bar)

## 30 ns

Energy       Average    Err.Est.    RMSD       Tot-Drift
---------------------------------------------------------
Pressure     1.69086    0.58        162.879    3.35668   (bar)


Running longer simulations does not seem to improve the average pressure
much.
If I need to modify the mdp file, what should the change be?



Many thanks,

Dwey


The mdp file I use for the NPT simulation is:

define  = -DPOSRES  

integrator  = md
nsteps  = 50
dt  = 0.002 

nstxout = 100   
nstvout = 100   
nstenergy   = 100   
nstlog  = 100   

continuation= yes   
constraint_algorithm = lincs
constraints = all-bonds 
lincs_iter  = 1 
lincs_order = 4 

ns_type = grid  
nstlist = 5 
rlist   = 1.2   
rcoulomb= 1.2   
rvdw= 1.2   

coulombtype = PME   
pme_order   = 4 
fourierspacing  = 0.16  

tcoupl  = Nose-Hoover   
tc-grps = Protein DPPC  SOL_CL  
tau_t   = 0.5   0.5 0.5 
ref_t   = 323   323 323 

pcoupl  = Parrinello-Rahman 
pcoupltype  = semiisotropic 
tau_p   = 5.0   
ref_p   = 1.0   1.0 
compressibility = 4.5e-5    4.5e-5

pbc = xyz   

DispCorr= EnerPres  

gen_vel = no


nstcomm = 1
comm-mode   = Linear
comm-grps   = Protein_DPPC SOL_CL

refcoord_scaling = com
cutoff-scheme = Verlet




 



  
  



  





[gmx-users] average pressure of a system

2013-09-10 Thread Dwey
Hi All.

   I am working on a simulation of a membrane protein in a model membrane
(DPPC) with a total of 26,859 atoms. In the NPT equilibration step, my mdp
file specifies a 100 ps NPT run with a reference pressure of 1 bar.

 At the end of the simulation, I obtained an average system pressure of
about -0.90 bar:

Energy       Average    Err.Est.    RMSD       Tot-Drift
---------------------------------------------------------
Pressure     -0.9049    2.6         186.304    13.8503   (bar)


I am wondering if  I should keep running this NPT process until the
average pressure of the system reaches ~ 1.0 bar.

If so, how long ( how many steps) ?

Or should I modify the mdp file? Can anyone provide some suggestions?


Likewise, at the end of a 1 ns production MD run, I obtained an average
system pressure of ~2.23 bar:

Energy       Average    Err.Est.    RMSD      Tot-Drift
--------------------------------------------------------
Pressure     2.23349    2.1         164.97    10.9381   (bar)

Should I run a longer simulation until the average pressure reaches ~1.0 bar,
even though the average energy, average temperature (323 K) and average
density (1022 kg/m^3) are already at the desired values?

What should I do to stabilize the average pressure at the desired value
(~1 bar)?
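For reference, the averages above come from g_energy, run along these lines
(the energy-file name is a placeholder; Pressure is selected at the prompt):

g_energy -f npt.edr -o pressure.xvg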


Thanks for any input.
Dwey


[gmx-users] Re: GPU version of Gromacs

2013-08-19 Thread Dwey Kauffman
Hi Grita,

   Yes, it is. You need to compile a GPU version of Gromacs from the source
code. You also need to use the Verlet cut-off scheme; that is, place a new
line like

cutoff-scheme = Verlet

in your mdp file.

Finally, run the GPU version of mdrun, adding the parameter -gpu_id 0 if you
have one GPU in your box.
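A minimal sketch of the whole procedure (directory and file names are
placeholders; it assumes a CUDA toolkit is already installed):

cd gromacs-4.6.x
mkdir build && cd build
cmake .. -DGMX_GPU=ON -DGMX_BUILD_OWN_FFTW=ON
make && make install
mdrun -deffnm md -gpu_id 0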

hope this helps.

Dwey







  





[gmx-users] GPU / CPU load imblance

2013-06-25 Thread Dwey
Hi gmx-users,

I used an 8-core AMD CPU with a GTX 680 GPU (1536 CUDA cores) to run the
umbrella sampling example provided by Justin.
I am happy that GPU acceleration indeed reduces the computation time
significantly (from 34 hours to 7 hours) in this example.
However, I found the following NOTE on the screen:

++
 The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid
 ++

Given the >20% load imbalance, I wonder if someone can give suggestions on
how to avoid the performance loss, either through hardware (GPU/CPU) changes
or through modification of the mdp file (see below).

In terms of hardware, does this NOTE suggest that I should use a
higher-capacity GPU such as a GTX 780 (2304 CUDA cores) to balance the load
or catch up in speed?
If so, would it help to add another GTX 680 card in the same box? Or would
that cause a GPU/CPU load imbalance again, with the two GPUs waiting for the
8-core CPU?

Second,

++
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1
++

I have no idea how the 4.006 ms and 2.578 ms for the GPU and CPU times,
respectively, are evaluated.
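My only guess, from the one-GPU cycle-accounting table posted elsewhere in
this thread and assuming the run is 5,000,000 steps, is that the per-step
times are obtained roughly as follows - please correct me if this is not how
mdrun derives them:

GPU:   20029.846 s total GPU time / 5,000,000 steps            = 4.006 ms/step
CPU:   (1328.611 s Force + 11561.327 s PME mesh) / 5,000,000   = 2.578 ms/step
ratio: 4.006 / 2.578                                           = 1.554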

It would be very helpful to know how to modify the attached mdp file for a
better load balance between the GPU and the CPU.

I appreciate kind advice and hints to improve this mdp file.

Thanks,

Dwey

### courtesy of Justin ###

title   = Umbrella pulling simulation
define  = -DPOSRES_B
; Run parameters
integrator  = md
dt  = 0.002
tinit   = 0
nsteps  = 500   ; 10 ns
nstcomm = 10
; Output parameters
nstxout = 5 ; every 100 ps
nstvout = 5
nstfout = 5000
nstxtcout   = 5000  ; every 10 ps
nstenergy   = 5000
; Bond parameters
constraint_algorithm= lincs
constraints = all-bonds
continuation= yes
; Single-range cutoff scheme
nstlist = 5
ns_type = grid
rlist   = 1.4
rcoulomb= 1.4
rvdw= 1.4
; PME electrostatics parameters
coulombtype = PME
fourierspacing  = 0.12
fourier_nx  = 0
fourier_ny  = 0
fourier_nz  = 0
pme_order   = 4
ewald_rtol  = 1e-5
optimize_fft= yes
; Berendsen temperature coupling is on in two groups
Tcoupl  = Nose-Hoover
tc_grps = Protein   Non-Protein
tau_t   = 0.5   0.5
ref_t   = 310   310
; Pressure coupling is on
Pcoupl  = Parrinello-Rahman
pcoupltype  = isotropic
tau_p   = 1.0
compressibility = 4.5e-5
ref_p   = 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel = no
; Periodic boundary conditions are on in all directions
pbc = xyz
; Long-range dispersion correction
DispCorr= EnerPres
cutoff-scheme   = Verlet
; Pull code
pull= umbrella
pull_geometry   = distance
pull_dim= N N Y
pull_start  = yes
pull_ngroups= 1
pull_group0 = Chain_B
pull_group1 = Chain_A
pull_init1  = 0
pull_rate1  = 0.0
pull_k1 = 1000  ; kJ mol^-1 nm^-2
pull_nstxout= 1000  ; every 2 ps
pull_nstfout= 1000  ; every 2 ps


[gmx-users] Re: free energy calculations of methane in water computed by GMX ver 4.5.7 and ver 4.6.2

2013-06-22 Thread Dwey
Hi Justin,

Thank you for sharing your experience with me.

As suggested, Gromacs 4.5.5 was compiled on the same Linux box, and I am able
to reproduce a similar result (DG = -9.30 kJ mol-1).
Gromacs 4.5.5 and 4.6.2 are both compiled from source, while Gromacs 4.5.7,
as reported earlier, was installed from pre-compiled binaries.
Gromacs 4.5.7 has therefore now been re-compiled by myself. The results are
shown below.

In addition, the L-BFGS mdp file works well for all versions after it is
modified by adding "define = -DFLEXIBLE".
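For reference, the top of the modified em_l-bfgs.mdp now reads roughly as
follows (a sketch; the rest of the file is unchanged from Justin's original):

define      = -DFLEXIBLE    ; flexible water, so L-BFGS runs without constraints
integrator  = l-bfgs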



Gromacs ver 4.5.5 (compiled from source codes)

point  0.050 -  0.100,   DG -0.04 +/-  0.00
point  0.100 -  0.150,   DG -0.08 +/-  0.00
point  0.150 -  0.200,   DG -0.13 +/-  0.01
point  0.200 -  0.250,   DG -0.19 +/-  0.00
point  0.250 -  0.300,   DG -0.27 +/-  0.00
point  0.300 -  0.350,   DG -0.35 +/-  0.00
point  0.350 -  0.400,   DG -0.43 +/-  0.01
point  0.400 -  0.450,   DG -0.55 +/-  0.01
point  0.450 -  0.500,   DG -0.71 +/-  0.01
point  0.500 -  0.550,   DG -0.94 +/-  0.00
point  0.550 -  0.600,   DG -1.25 +/-  0.00
point  0.600 -  0.650,   DG -1.40 +/-  0.01
point  0.650 -  0.700,   DG -1.29 +/-  0.01
point  0.700 -  0.750,   DG -1.01 +/-  0.00
point  0.750 -  0.800,   DG -0.67 +/-  0.00
point  0.800 -  0.850,   DG -0.36 +/-  0.00
point  0.850 -  0.900,   DG -0.09 +/-  0.00
point  0.900 -  0.950,   DG  0.14 +/-  0.00
point  0.950 -  1.000,   DG  0.33 +/-  0.00

total  0.050 -  1.000,   DG -9.30 +/-  0.03




Gromacs ver 4.5.7 (compiled from source codes)

lambda  0.000 -  0.050,   DG  0.05 +/-  0.00
lambda  0.050 -  0.100,   DG  0.02 +/-  0.00
lambda  0.100 -  0.150,   DG -0.04 +/-  0.00
lambda  0.150 -  0.200,   DG -0.09 +/-  0.00
lambda  0.200 -  0.250,   DG -0.14 +/-  0.01
lambda  0.250 -  0.300,   DG -0.21 +/-  0.01
lambda  0.300 -  0.350,   DG -0.29 +/-  0.01
lambda  0.350 -  0.400,   DG -0.37 +/-  0.00
lambda  0.400 -  0.450,   DG -0.48 +/-  0.01
lambda  0.450 -  0.500,   DG -0.65 +/-  0.01
lambda  0.500 -  0.550,   DG -0.89 +/-  0.01
lambda  0.550 -  0.600,   DG -1.19 +/-  0.01
lambda  0.600 -  0.650,   DG -1.34 +/-  0.01
lambda  0.650 -  0.700,   DG -1.23 +/-  0.00
lambda  0.700 -  0.750,   DG -0.95 +/-  0.01
lambda  0.750 -  0.800,   DG -0.62 +/-  0.00
lambda  0.800 -  0.850,   DG -0.31 +/-  0.00
lambda  0.850 -  0.900,   DG -0.03 +/-  0.00
lambda  0.900 -  0.950,   DG  0.19 +/-  0.00
lambda  0.950 -  1.000,   DG  0.38 +/-  0.00

total   0.000 -  1.000,   DG -8.19 +/-  0.03



After comparing the output from 4.5.5 with that from 4.5.7, I do find
something quirky in the g_bar output of 4.5.7.
For example, 4.5.7 does not report dH/dl information (see below).
Moreover, I also tried the g_bar from 4.5.5 and 4.6.2 to process the output
data (the md*.xvg files generated by 4.5.7). The result is unchanged, and
DG (-8.19 kJ mol-1) remains incorrect.

Thanks,
Dwey


++
g_bar ver 4.5.7,

md0.05.xvg: 0.0 - 5000.0; lambda = 0.050
foreign lambdas: 0.050 (250001 pts) 0.000 (250001 pts) 0.100 (250001 pts)

md0.15.xvg: 0.0 - 5000.0; lambda = 0.150
foreign lambdas: 0.150 (250001 pts) 0.100 (250001 pts) 0.200 (250001 pts)

md0.1.xvg: 0.0 - 5000.0; lambda = 0.100
foreign lambdas: 0.100 (250001 pts) 0.050 (250001 pts) 0.150 (250001 pts)
.
.
.


g_bar ver 4.5.5 or  4.6.2,

md0.05.xvg: 0.0 - 5000.0; lambda = 0.05
dH/dl & foreign lambdas:
dH/dl (250001 pts)
delta H to 0 (250001 pts)
delta H to 0.1 (250001 pts)


md0.15.xvg: 0.0 - 5000.0; lambda = 0.15
dH/dl & foreign lambdas:
dH/dl (250001 pts)
delta H to 0.1 (250001 pts)
delta H to 0.2 (250001 pts)


md0.1.xvg: 0.0 - 5000.0; lambda = 0.1
dH/dl & foreign lambdas:
dH/dl (250001 pts)
delta H to 0.05 (250001 pts)
delta H to 0.15 (250001 pts)
.
.
.










> On 6/21/13 11:07 AM, Dwey wrote:
>> Hi gmx-users,
>>
>>   I almost  reproduced  free energy calculations of methane in water on
>> Justin's website. First of all, I am able to follow the workflow of
>> computing solvation free energy  for several times with Gromacs version
>> 4.5.7 and version 4.6.2 installed in two identical Linux boxes.
>>
>> However.  the output results of GMX ver 4.5.7 and ver 4.6.2 show different
>> values of dG
>>
>> ##
>> GMX Ver. 4.5.7:
>>
>> lambda  0.000 -  0.050,   DG  0.05 +/-  0.00
>> lambda  0.050 -  0.100,   DG  0.01 +/-  0.00
>> lambda  0.100 -  0.150,   DG -0.03 +/-  0.01
>> lambda  0.150 -  0.200,   DG -0.08 +/-  0.00
>> lambda  0.200 -  0.250,   DG

[gmx-users] free energy calculations of methane in water computed by GMX ver 4.5.7 and ver 4.6.2

2013-06-21 Thread Dwey
Hi gmx-users,

 I have almost reproduced the free-energy calculations of methane in water
from Justin's website. First of all, I was able to follow the workflow for
computing the solvation free energy several times, with Gromacs versions
4.5.7 and 4.6.2 installed on two identical Linux boxes.

However, the outputs of GMX 4.5.7 and 4.6.2 show different values of dG:

##
GMX Ver. 4.5.7:

lambda  0.000 -  0.050,   DG  0.05 +/-  0.00
lambda  0.050 -  0.100,   DG  0.01 +/-  0.00
lambda  0.100 -  0.150,   DG -0.03 +/-  0.01
lambda  0.150 -  0.200,   DG -0.08 +/-  0.00
lambda  0.200 -  0.250,   DG -0.15 +/-  0.00
lambda  0.250 -  0.300,   DG -0.21 +/-  0.01
lambda  0.300 -  0.350,   DG -0.28 +/-  0.00
lambda  0.350 -  0.400,   DG -0.38 +/-  0.00
lambda  0.400 -  0.450,   DG -0.50 +/-  0.01
lambda  0.450 -  0.500,   DG -0.66 +/-  0.01
lambda  0.500 -  0.550,   DG -0.90 +/-  0.01
lambda  0.550 -  0.600,   DG -1.21 +/-  0.01
lambda  0.600 -  0.650,   DG -1.37 +/-  0.01
lambda  0.650 -  0.700,   DG -1.25 +/-  0.01
lambda  0.700 -  0.750,   DG -0.96 +/-  0.00
lambda  0.750 -  0.800,   DG -0.62 +/-  0.00
lambda  0.800 -  0.850,   DG -0.31 +/-  0.00
lambda  0.850 -  0.900,   DG -0.03 +/-  0.00
lambda  0.900 -  0.950,   DG  0.20 +/-  0.00
lambda  0.950 -  1.000,   DG  0.38 +/-  0.00

total   0.000 -  1.000,   DG -8.31 +/-  0.04

##

GMX ver. 4.6.2:

point  0.000 -  0.050,   DG  0.00 +/-  0.00
point  0.050 -  0.100,   DG -0.03 +/-  0.00
point  0.100 -  0.150,   DG -0.08 +/-  0.00
point  0.150 -  0.200,   DG -0.14 +/-  0.00
point  0.200 -  0.250,   DG -0.20 +/-  0.00
point  0.250 -  0.300,   DG -0.27 +/-  0.00
point  0.300 -  0.350,   DG -0.34 +/-  0.00
point  0.350 -  0.400,   DG -0.43 +/-  0.01
point  0.400 -  0.450,   DG -0.54 +/-  0.01
point  0.450 -  0.500,   DG -0.71 +/-  0.01
point  0.500 -  0.550,   DG -0.94 +/-  0.01
point  0.550 -  0.600,   DG -1.24 +/-  0.02
point  0.600 -  0.650,   DG -1.39 +/-  0.02
point  0.650 -  0.700,   DG -1.28 +/-  0.01
point  0.700 -  0.750,   DG -1.00 +/-  0.00
point  0.750 -  0.800,   DG -0.67 +/-  0.00
point  0.800 -  0.850,   DG -0.36 +/-  0.00
point  0.850 -  0.900,   DG -0.09 +/-  0.00
point  0.900 -  0.950,   DG  0.14 +/-  0.00
point  0.950 -  1.000,   DG  0.33 +/-  0.00

total  0.000 -  1.000,   DG -9.23 +/-  0.03

##


The value of DG (-9.23 kJ mol-1) from GMX 4.6.2 is very close to the value
from Justin and from Shirts et al. of 2.24 kcal mol-1 (~ -9.36 kJ mol-1),
while that from GMX 4.5.7 (-8.31 kJ mol-1) is far off.

I wonder whether someone has had a similar experience and can explain the
inconsistency between the outputs of 4.5.7 (~ -8.31 kJ mol-1) and 4.6.2
(-9.23 kJ mol-1), independently of the dG values computed by others.


Second, the reason why I only almost reproduced dG is that I removed the
L-BFGS minimization step. I was not able to get past this step with either
GMX version. Here is the error:



Fatal error:
The combination of constraints and L-BFGS minimization is not implemented.
Either do not use constraints, or use another minimizer (e.g. steepest
descent).



L-BFGS mdp file can be found  at
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/free_energy/Files/em_l-bfgs.mdp

Again, I appreciate any advice or hints.

Thanks,

Dwey