Sarah Witzke wrote:
Dear gromacs users,
First, Justin, thank you for your reply!
Second, I have a question about how to use a checkpoint file to rerun my broken .trr file (dmpclim1-870.trr). As I previously described, I have divided my simulation into smaller simulations, each of 200 ps duration. I did that in order not to lose too much data if the simulation should crash. I have further divided my simulations into five folders. Each folder contains a little over 200 small .trr files (and the corresponding .gro, .log, .tpr, and .out files for each of these small .trr files). This division is because of a limit of max. 200 hours of simulation time per job on the cluster I'm using. Each time the 200 hours have been used, a new folder is created, from which the simulation is continued.
I would think dealing with 200 files would be a major headache :) Checkpointing
makes such practice really obsolete. If the system goes down and your run
crashes, you never lose more than -cpt amount of time (default of 15 minutes).
Otherwise, specifying nstxout, etc might be your friend :)
In each of these five folders I only found these two .cpt files: "state.cpt" and
"state_prev.cpt".
My commands are:
tpbconv -f dmpclim1-XX.trr -s dmpclim1-XX.tpr -e dmpclim1-XX.edr -extend 200
dmpclim1-YY.tpr
mdrun_mpi -np 4 -v -s dmpclim1-YY.tpr -o dmpclim1-YY.trr -c dmpclim1-YY.gro -e dmpclim1-YY.edr -g dmpclim1-YY.log >& dmpclim1-YY.out
Using tpbconv in this way is also obsolete and introduces small (probably
negligible) errors. To get a binary identical continuation, you need to make
use of the checkpoint file:
http://wiki.gromacs.org/index.php/Extending_Simulations
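
[Editor's sketch] The checkpoint-based continuation described above might look roughly like the following. It is a minimal sketch, not the exact procedure from the wiki page: it assumes the GROMACS 4.0 options tpbconv -extend/-o and mdrun -cpi, reuses the dmpclim1-XX naming from this thread, and leaves out the MPI launcher. The helper next_segment_cmds is hypothetical, and it only prints the commands so they can be inspected before anything is run.

```shell
#!/bin/sh
# Print (do not run) the commands for extending a run by 200 ps and
# continuing it from the last checkpoint.
# next_segment_cmds is a hypothetical helper, not part of GROMACS.
next_segment_cmds() {
    prev=$1   # previous segment name, e.g. dmpclim1-870
    next=$2   # next segment name, e.g. dmpclim1-871

    # tpbconv only changes the run length stored in the run input file.
    echo "tpbconv -s ${prev}.tpr -extend 200 -o ${next}.tpr"

    # mdrun takes the full state (positions, velocities, coupling
    # variables) from the checkpoint passed with -cpi.
    echo "mdrun -v -s ${next}.tpr -cpi state.cpt -o ${next}.trr -c ${next}.gro -e ${next}.edr -g ${next}.log"
}

next_segment_cmds dmpclim1-870 dmpclim1-871
```

The point of the design is that the state comes from state.cpt rather than from the .trr and .edr files, which is what avoids the small errors mentioned above.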
I guess that since a new checkpoint file is written every 15 minutes, it will overwrite
the previous one. Is that correct? It seems unfortunate to me that it does not make new .cpt
files for each small .trr file, as is done for e.g. .log files (naming them something like
"#state.cpt.1#" and so on). If I have understood correctly, I'm not able to use my
checkpoint file, because my simulation continued without errors, thus overwriting the needed
.cpt files several times. To learn from my mistakes: next time I run simulations, will an option
like "-cpo dmpclim1-YY.cpt" create a checkpoint file for each small .trr file?
That would just clog up disk space, really. If your simulation has proceeded
from the previous checkpoint with no problem, then really all that *should* be
necessary in most cases is the most recent checkpoint.
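
[Editor's sketch] If a per-segment checkpoint is wanted anyway, despite the disk-space cost noted above, one option that does not depend on any mdrun flag is to copy the final state.cpt once each segment has finished. The helper archive_checkpoint below is hypothetical, and the segment naming is taken from this thread.

```shell
#!/bin/sh
# Keep one checkpoint per finished segment by copying the file that
# mdrun leaves behind. archive_checkpoint is a hypothetical helper.
archive_checkpoint() {
    seg=$1                # segment name, e.g. dmpclim1-870
    src=${2:-state.cpt}   # checkpoint to copy; defaults to state.cpt

    # Refuse to continue if the checkpoint is missing.
    [ -f "$src" ] || { echo "no checkpoint found: $src" >&2; return 1; }

    # -p preserves the timestamp, which helps when matching files later.
    cp -p "$src" "${seg}.cpt"
}

# Usage, after segment 870 has finished:
#   archive_checkpoint dmpclim1-870
```

Each copy is a full snapshot of the system, so with over a thousand segments it may be worth keeping only every tenth or so.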
I think your problem likely stems from a file system blip. I've experienced
similar behavior when our NFS server acts up, and an incomplete frame is
written, so the Gromacs tools detect massive coordinates/velocities/forces or
whatever when processing the output.
-Justin
Best,
Sarah
-----Original Message-----
From: gmx-users-boun...@gromacs.org on behalf of Justin A. Lemkul
Sent: Sat 21-03-2009 13:20
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] One more broken .trr file
Sarah Witzke wrote:
Dear gromacs users,
I would very much appreciate it if anyone could give me advice on the
following situation:
I have run a simulation of a small molecule diffusing into a lipid membrane (gromacs
version 4.0). The simulation was run for ~220 ns and stored in small individual .trr
files, each of ~0.2 ns (giving a total of 1098 small .trr files). There were no errors or
otherwise "suspicious" behavior during the simulation.
After the simulation I concatenated all the small .trr files into one big .trr
file (version 4.0.2 to correspond with other simulations):
trjcat -f *.trr -o dmpclim1-all.trr
trjcat gave no error message, the last line output to the screen was:
"last frame written was 219600.015625 ps"
After the concatenation I checked the big .trr file with gmxcheck:
gmxcheck -f dmpclim1-all.trr
The result was:
Checking file dmpclim1-all.trr
trn version: GMX_trn_file (single precision)
Reading frame 0 time 0.000
# Atoms 35508
Reading frame 17000 time 170000.016
Warning at frame 17379: coordinates for atom 10917 are large (-2.99061e+19)
Warning at frame 17379: coordinates for atom 10921 are large (1.42767e+31)
Warning at frame 17379: coordinates for atom 10925 are large (-1.29194e+13)
Warning at frame 17379: coordinates for atom 10925 are large (1.51714e+34)
Reading frame 21000 time 210000.016
Item        #frames  Timestep (ps)
Step          21961  10
Time          21961  10
Lambda        21961  10
Coords        21961  10
Velocities    21961  10
Forces            0
Box           21961  10
Frame 17379 is located in the small .trr file number 870. .trr file 870
consists of 22 frames and the error is in frame 20.
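
[Editor's sketch] With over a thousand small files, picking the affected frame numbers out of the gmxcheck report by eye is tedious. The filter below is a minimal sketch that extracts them from the "Warning at frame N:" lines shown above. The helper name bad_frames is hypothetical, and it assumes the warning wording is exactly as in this report.

```shell
#!/bin/sh
# Read gmxcheck output on stdin and print each frame number that
# triggered a warning, once, in ascending order.
# bad_frames is a hypothetical helper, not part of GROMACS.
bad_frames() {
    # grep -o keeps only the matching part, so it also works when the
    # warning shares a line with a "Reading frame" progress message.
    grep -o 'Warning at frame [0-9][0-9]*' | awk '{ print $4 }' | sort -un
}

# Usage (assumes gmxcheck is on the PATH; 2>&1 merges both output
# streams, so it does not matter which one carries the warnings):
#   gmxcheck -f dmpclim1-all.trr 2>&1 | bad_frames
#   for f in dmpclim1-*.trr; do
#       echo "$f: $(gmxcheck -f "$f" 2>&1 | bad_frames | tr '\n' ' ')"
#   done
```

Run over the small files one at a time, this would give the frame number within each file rather than within the concatenated trajectory.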
Looking at dmpclim1-870.trr in VMD reveals that two water molecules are far,
far away (as noted by gmxcheck) in frame 20. Both in frame 19 and in frame 21
the two waters are placed nicely in the box.
The dmpclim1-870.log and the screen output from 870 are both normal (i.e. they
look similar to all the other steps), so my guess is that something happened
during writing to file?
I remember a similar problem posted very recently:
http://www.gromacs.org/component/option,com_wrapper/Itemid,165/
Reading these emails I understand that there is no way to delete just a single
frame - is that correct?
When posting links, right-click the frame and open it in a new window/tab. Then
you will have the link that actually points to the message you found. This link
is just the search page :)
I have thought about two possible options for me now:
1) Use the suggestion given by Justin Lemkul in the email mentioned:
trjconv -f dmpclim1-870.trr -b 0 -e 19 -o xxx.trr
This, I guess, would make me lose 3 frames, corresponding to (0.2 ns/22)*3 = 0.027 ns. Having 0.027 ns less of simulation is not a problem, but will it matter later on when I concatenate the small .trr files, convert them to an .xtc file, and then use that to calculate e.g. area/lipid or membrane thickness? Will there be a time mismatch?
Yes, you will likely get complaints from all the Gromacs tools in such a case.
The other option is to uniformly cut out frames from all your .trr files
(trjconv -skip), such that the bad frame would never appear, and you would have
uniformly-spaced frames in all of your .trr files. That may be sacrificing
quite a bit of data, however.
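
[Editor's sketch] The estimate for option 1, (0.2 ns/22)*3, can be reproduced on the command line, which is handy when weighing how many frames to give up. The helper lost_ns is hypothetical. Its three arguments are the numbers quoted in the question: segment length in ns, frames per segment, and frames dropped.

```shell
#!/bin/sh
# Simulation time lost when dropping frames from one segment.
# lost_ns is a hypothetical helper; arguments are:
#   $1 segment length in ns, $2 frames per segment, $3 frames dropped
lost_ns() {
    awk -v seg="$1" -v nframes="$2" -v dropped="$3" \
        'BEGIN { printf "%.3f\n", seg / nframes * dropped }'
}

lost_ns 0.2 22 3   # prints 0.027
```

Against a total of ~220 ns this is a negligible amount of sampling. The concern raised above is the resulting gap in frame times, not the lost data.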
2) Redo step 870. I'm able to redo step 870 quite easily, but what will then happen when I try to
concatenate all the small .trr files? I fear that the "old" -870.trr wouldn't be exactly
identical to the "new" -870.trr (due to round-off) and that this would make a mismatch
with -871.trr?
If you have a checkpoint file, you should get a binary identical continuation.
-Justin
I'm very sorry to ask this kind of question again, but I hope you'll bear with
me and have the patience to help me!
Best regards,
Sarah
_______________________________________________
gmx-users mailing list gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
--
========================================
Justin A. Lemkul
Graduate Research Assistant
ICTAS Doctoral Scholar
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================