Re: [gmx-users] GPU crashes

2012-06-07 Thread lloyd riggs
Did you play with the time step?  Just curious, but I wondered what happens
with 0.0008, 0.0005 or 0.0002.  I found that if I had a well-behaved protein, as
soon as I added a small (non-protein) molecule which rotated wildly while
attached to the protein, it would crash unless I reduced the time step to the
values above once constraints were removed after EQ.  It always seemed to me
that it didn't like the rotation or bond angles, seeing them as a violation,
but treated the molecule like an amino acid (the same bond type, but with wider
rotation as one end wasn't fixed to a chain).  If your loop moves via the
backbone, the calculated angles, bonds or whatever might appear to the computer
to be violating the parameter settings (triggering problems, errors, etc.), as
it can't track them fast enough over the time step.  I.e. atoms 1-2-3 and then
delta 1-2-3 with xyz parameters, but then the particular set has additional
rotation, etc. and may include the chain atoms which bend wildly (N-Ca-Cb-Cg,
maybe a dihedral), but probably not this.

Just a thought, and probably not the right answer either: it might be the way
the calculation is broken down (as above) over GPUs, which convert everything
to matrices (non-standard ones, just for basic math operations, not real
matrices per se) for execution, and then some library problem which would not
account for long-range rapid (0.0005) movements at the chain (Ca, N, O to
something else) and then tries to apply these to Cb-Cg-O-H, etc. using the
initial points while looking at the parameters for, say, a single amino acid.
Maybe the constraints would cause this, which would make it a pain to EQ.
Constraints allowed me to increase the time step, but would have ruined the
experiment I was working on, as I needed it unconstrained to show it didn't
float away when proteins were pulled, etc.  I was using a different integrator
though, just normal MD.

And your cutoffs for vdW, etc.: why are they 0?  I don't know if this means a
default set is then used, but if not?  Would it still integrate using both
types of formula, or would it just use Coulomb, or vice versa?  (I don't know
what that would do to the code, but I assume it means no vdW and all Coulomb,
but then zeros are always a problem for computers.)

Those are my thoughts on that.  Probably something else though.

Good luck,

Stephan

 -------- Original Message --------
 Date: Wed, 06 Jun 2012 18:42:45 -0400
 From: Justin A. Lemkul jalem...@vt.edu
 To: Discussion list for GROMACS users gmx-users@gromacs.org
 Subject: [gmx-users] GPU crashes

 
 Hi All,
 
 I'm wondering if anyone has experienced what I'm seeing with Gromacs 4.5.5 on
 GPU.  It seems that certain systems fail inexplicably.  The system I am working
 with is a heterodimeric protein complex bound to DNA.  After about 1 ns of
 simulation time using mdrun-gpu, all the energies become NaN.  The simulations
 don't stop, they just carry on merrily producing nonsense.  I would love to see
 some action regarding http://redmine.gromacs.org/issues/941 for this reason ;)
 
 I ran simulations of each of the components of the system individually - each
 protein alone, and DNA - to try to track down what might be causing this
 problem.  The DNA simulation is perfectly stable out to 10 ns, but each protein
 fails within 2 ns.  Each protein has two domains with a flexible linker, and it
 seems that as soon as the linker flexes a bit, the simulations go poof.
 Well-behaved proteins like lysozyme and DHFR (from the benchmark set) seem
 fine, but anything that twitches even a small amount fails.  This is very
 unfortunate for us, as we are hoping to see domain motions on a feasible time
 scale using implicit solvent on GPU hardware.
 
 Has anyone seen anything like this?  Our Gromacs implementation is being run on
 an x86_64 Linux system with Tesla S2050 GPU cards.  The CUDA version is 3.1 and
 Gromacs is linked against OpenMM-2.0.  An .mdp file is appended below.  I have
 also tested finite values for cutoffs, but the results were worse (failures
 occurred more quickly).
 
 I have not been able to use the latest git version of Gromacs to test whether
 anything has been fixed, but will post separately to gmx-developers regarding
 the reasons for that soon.
 
 -Justin
 
 === md.mdp ===
 
 title                = Implicit solvent test
 ; Run parameters
 integrator           = sd
 dt                   = 0.002
 nsteps               = 500       ; 1 ps (10 ns)
 nstcomm              = 1
 comm_mode            = angular   ; non-periodic system
 ; Output parameters
 nstxout              = 0
 nstvout              = 0
 nstfout              = 0
 nstxtcout            = 1000      ; every 2 ps
 nstlog               = 5000      ; every 10 ps
 nstenergy            = 1000      ; every 2 ps
 ; Bond parameters
 constraint_algorithm = lincs
 constraints          = all-bonds
 continuation         = no        ; starting up
 ; required cutoffs for implicit
 nstlist              = 0
 ns_type              = grid
 rlist                = 0
 rcoulomb             = 0
 rvdw                 = 0
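
Since mdrun-gpu reportedly keeps running after the energies turn NaN, one way to locate the point of failure after the fact is to scan an exported energy time series for the first non-finite value.  Below is a minimal sketch in Python, assuming the energies have been written as .xvg-style text (header lines starting with # or @, then whitespace-separated columns with time first).  The function name and the sample data are made up for illustration and are not from the runs described above.

```python
import math

def first_nonfinite_time(xvg_lines):
    """Return the time of the first row containing a NaN/inf value, or None."""
    for line in xvg_lines:
        stripped = line.strip()
        # Skip blank lines and xvg header/comment lines.
        if not stripped or stripped[0] in "#@":
            continue
        fields = [float(tok) for tok in stripped.split()]
        time, values = fields[0], fields[1:]
        if any(not math.isfinite(v) for v in values):
            return time
    return None

# Made-up sample: time (ps) and one energy column.
sample = [
    "# hypothetical energy export",
    '@ title "Potential"',
    "0.0   -1523.4",
    "2.0   -1519.8",
    "4.0   nan",
    "6.0   nan",
]
print(first_nonfinite_time(sample))  # prints 4.0
```

Knowing the first bad frame makes it easier to go back to the trajectory and see what the linker was doing just before the energies blew up.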
 

Re: [gmx-users] GPU crashes

2012-06-07 Thread Justin A. Lemkul



On 6/7/12 3:57 AM, lloyd riggs wrote:

 Did you play with the time step?  Just curious, but I wondered what happens
 with 0.0008, 0.0005 or 0.0002.  I found that if I had a well-behaved protein,
 as soon as I added a small (non-protein) molecule which rotated wildly while
 attached to the protein, it would crash unless I reduced the time step to the
 values above once constraints were removed after EQ.  It always seemed to me
 that it didn't like the rotation or bond angles, seeing them as a violation,
 but treated the molecule like an amino acid (the same bond type, but with
 wider rotation as one end wasn't fixed to a chain).  If your loop moves via
 the backbone, the calculated angles, bonds or whatever might appear to the
 computer to be violating the parameter settings (triggering problems, errors,
 etc.), as it can't track them fast enough over the time step.  I.e. atoms
 1-2-3 and then delta 1-2-3 with xyz parameters, but then the particular set
 has additional rotation, etc. and may include the chain atoms which bend
 wildly (N-Ca-Cb-Cg, maybe a dihedral), but probably not this.

 Just a thought, and probably not the right answer either: it might be the way
 the calculation is broken down (as above) over GPUs, which convert everything
 to matrices (non-standard ones, just for basic math operations, not real
 matrices per se) for execution, and then some library problem which would not
 account for long-range rapid (0.0005) movements at the chain (Ca, N, O to
 something else) and then tries to apply these to Cb-Cg-O-H, etc. using the
 initial points while looking at the parameters for, say, a single amino acid.
 Maybe the constraints would cause this, which would make it a pain to EQ.
 Constraints allowed me to increase the time step, but would have ruined the
 experiment I was working on, as I needed it unconstrained to show it didn't
 float away when proteins were pulled, etc.  I was using a different
 integrator though, just normal MD.



I have long wondered if constraints were properly handled by the OpenMM
library.  I am constraining all bonds, so in principle, dt of 0.002 should not
be a problem.  The note printed indicates that the constraint algorithm is
changed from the one selected (LINCS) to whatever OpenMM uses (SHAKE and a few
others in combination).  Perhaps I can try running without constraints and a
reduced dt, but I'd like to avoid it.
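
For context on why constraining all bonds should make dt = 0.002 ps safe: the fastest motions in an unconstrained system are bond stretches involving hydrogen.  Below is a rough back-of-the-envelope sketch in Python, assuming a typical C-H stretch near 3000 cm^-1 (a textbook value, not something measured on this system).  The function name is made up for illustration.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def vibration_period_fs(wavenumber_cm):
    """Period in femtoseconds of a vibration given its wavenumber in cm^-1."""
    frequency_hz = C_CM_PER_S * wavenumber_cm
    return 1.0e15 / frequency_hz

period = vibration_period_fs(3000.0)   # assumed C-H stretch
print(f"period ~ {period:.1f} fs")
for dt_fs in (2.0, 0.8, 0.5, 0.2):     # time steps mentioned in this thread
    print(f"dt = {dt_fs} fs -> {period / dt_fs:.1f} steps per period")
```

Under that assumption the period is about 11 fs, so a 2 fs step samples an unconstrained C-H stretch fewer than six times per oscillation, while the sub-femtosecond steps suggested earlier give well over ten.  With all bonds constrained those stretches are removed, which is why a crash at 2 fs points toward the constraint handling and not the step size itself.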


I wish I could efficiently test to see if this behavior was GPU-specific, but
unfortunately the non-GPU implementation of the implicit code can currently only
be run in serial or on 2 CPUs due to an existing bug.  I can certainly test it,
but due to the large number of atoms, it will take several days to even approach
1 ns.



 And your cutoffs for vdW, etc.: why are they 0?  I don't know if this means a
 default set is then used, but if not?  Would it still integrate using both
 types of formula, or would it just use Coulomb, or vice versa?  (I don't know
 what that would do to the code, but I assume it means no vdW and all Coulomb,
 but then zeros are always a problem for computers.)



The setup is for the all-vs-all kernels.  Setting cutoffs equal to zero and 
using a fixed neighbor list triggers these special optimized kernels.  I have 
also noticed that long, finite cutoffs (on the order of 4.0 nm) lead to 
unacceptable energy drift and structural instability in well-behaved systems 
(even the benchmarks).  For instance, the backbone RMSD of lysozyme is twice as 
large in the case of a 4.0-nm cutoff relative to the all-vs-all setup, and the 
energy drift is quite substantial.
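
The energy drift comparison above can be put on a number by fitting a straight line to an energy time series and reporting the slope.  Below is a minimal sketch in Python using an ordinary least-squares slope.  The function name and the sample values are made up for illustration and are not from these runs.

```python
def drift_per_ns(times_ps, energies):
    """Least-squares slope of energy vs. time, converted to units per ns."""
    n = len(times_ps)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(times_ps) / n
    mean_e = sum(energies) / n
    sxx = sum((t - mean_t) ** 2 for t in times_ps)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_ps, energies))
    return 1000.0 * sxy / sxx  # slope per ps -> per ns

# Made-up series: energy rising by 1 unit every 2 ps.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
energies = [-1500.0, -1499.0, -1498.0, -1497.0, -1496.0]
print(drift_per_ns(times, energies))  # prints 500.0
```

Running the same fit on the all-vs-all and the 4.0-nm cutoff runs would give two slopes that can be compared directly, ideally after dividing by the number of atoms.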


-Justin

--


Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

