Re: [gmx-users] Long trajectory split
Dear Dr. Chaban,

Which details or files do you need? I would be very happy to solve this by posting any files that you request.

2014-02-23 22:21 GMT+01:00 Dr. Vitaly Chaban vvcha...@gmail.com:

You do not provide all the details. As was pointed out at the very beginning, most likely you have incorrect parallelism in this case. Can you post all the files you obtain for people to inspect?

Dr. Vitaly V. Chaban
Re: [gmx-users] Long trajectory split
The only real way to troubleshoot this kind of problem is for someone here to run your system on their local PC and see the problem with their own eyes. Since no one has confirmed the same issue as yours, the cause most likely lies outside the GROMACS code: either something is wrong with your operating environment, or you are interpreting your observations incorrectly.

Dr. Vitaly V. Chaban

On Thu, Feb 27, 2014 at 1:33 PM, Marcelo Depólo marcelodep...@gmail.com wrote:

Dear Dr. Chaban, which details or files do you need? I would be very happy to solve this by posting any files that you request.
Re: [gmx-users] Long trajectory split
On 2/23/14, 10:43 AM, Marcelo Depólo wrote:

Hey, I am running a 1000 ns simulation, but for some reason mdrun is backing up the data into multiple files (.edr.1# - .edr.9#, for instance). Is this normal behavior?

No, that means that rather than launching a parallel mdrun process, you're running multiple instances of mdrun that are each producing their own output files. GROMACS tools don't overwrite existing files, so they back up existing files of the same name.

-Justin

--
==
Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 601
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalem...@outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul
==
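To make that failure mode concrete, here is a rough sketch of what it looks like on disk (the prt.* names simply follow the command quoted later in this thread; the exact backup names depend on which outputs each instance opens):

    # A serial (non-MPI) mdrun launched under mpirun starts 24 independent
    # simulations instead of one parallel run:
    mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr
    # Each instance that finds prt.edr already present backs it up first,
    # so the directory gradually fills with
    #   #prt.edr.1#  #prt.edr.2#  ...  #prt.edr.9#
    # A proper MPI build (usually installed as mdrun_mpi) instead runs as a
    # single simulation across the 24 ranks and writes one set of files.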
Re: [gmx-users] Long trajectory split
But it is not quite happening simultaneously, Justin. It is producing one file after another and, consequently, backing up the earlier ones.
Re: [gmx-users] Long trajectory split
On 2/23/14, 11:00 AM, Marcelo Depólo wrote:

But it is not quite happening simultaneously, Justin. It is producing one file after another and, consequently, backing up the earlier ones.

You'll have to provide the exact commands you're issuing. Likely you're leaving the output names at the default, which causes them to be backed up rather than overwritten.

-Justin
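As an aside, and only as a sketch (nothing in this thread prescribes it): if several runs are ever started intentionally in the same directory, giving each one its own file-name prefix with -deffnm avoids these backup chains altogether:

    # hypothetical names; each run reads runN.tpr and writes runN.trr, runN.edr, runN.log
    mpirun -np 24 mdrun_mpi -deffnm run1
    mpirun -np 24 mdrun_mpi -deffnm run2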
Re: [gmx-users] Long trajectory split
Are you sure that your binary is parallel? How many frames do those trajectory files contain?

Dr. Vitaly V. Chaban

On Sun, Feb 23, 2014 at 5:32 PM, Marcelo Depólo marcelodep...@gmail.com wrote:

Maybe I should explain it better. I am using *mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr*, pretty much a standard command line. This job, run in a batch system, creates the outputs and, after some (random) time, a backup is made and new files are written, but the job itself does not finish.
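A quick way to answer the frame-count question, assuming the GROMACS 4.6 tool names (later versions call it gmx check):

    # reports the number of frames and the time of the first/last frame
    gmxcheck -f prt.trr
    gmxcheck -f '#prt.trr.1#'    # quote backup names so the shell keeps the '#'
    # the same can be done for the energy files with gmxcheck -e prt.edr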
Re: [gmx-users] Long trajectory split
Pretty sure. I ran other simulations on the same system and they worked just fine. As for the frames, each file contains a different number of frames, apparently at random (one file contains 400 ns of data and another contains 10 ns).

2014-02-23 17:54 GMT+01:00 Dr. Vitaly Chaban vvcha...@gmail.com:

Are you sure that your binary is parallel? How many frames do those trajectory files contain?

--
Marcelo Depólo Polêto
Uppsala Universitet - Sweden
Science without Borders - CAPES
Phone: +46 76 581 67 49
Re: [gmx-users] Long trajectory split
On 2/23/14, 11:32 AM, Marcelo Depólo wrote:

Maybe I should explain it better. I am using *mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr*, pretty much a standard command line. This job, run in a batch system, creates the outputs and, after some (random) time, a backup is made and new files are written, but the job itself does not finish.

It would help if you could post the .log file from one of the runs to see the information regarding mdrun's parallel capabilities. This still sounds like a case of an incorrectly compiled binary. Do other runs with the same binary produce the same problem?

-Justin
Re: [gmx-users] Long trajectory split
Normally an MPI-enabled mdrun would be named mdrun_mpi, and running a non-MPI mdrun would produce symptoms like yours, depending on exactly how your filesystem chooses to do things, so Justin and Vitaly's theory is sound. Look at the top section of your .log file for what mdrun thinks about MPI!

Mark
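A quick way to do what Mark suggests without opening the whole file (md.log below is just a stand-in for whatever log name the run produced):

    # the build summary sits in the first few dozen lines of the .log file
    head -n 30 md.log
    # or pull out the one relevant line:
    grep 'MPI library' md.log
    # a serial build reports 'none' (or 'thread_mpi' for a thread-MPI build);
    # a real MPI build reports 'MPI', as in the excerpt posted further down.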
Re: [gmx-users] Long trajectory split
On 2/23/14, 12:10 PM, Marcelo Depólo wrote:

Pretty sure. I ran other simulations on the same system and they worked just fine. As for the frames, each file contains a different number of frames, apparently at random (one file contains 400 ns of data and another contains 10 ns).

What are the starting and ending points of those data? Is the run restarting, or just writing successive time intervals to new files when it shouldn't be? Do you have some limitation on file size that is being reached, causing the new files to be generated?

-Justin
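Two quick checks that would answer Justin's questions, sketched with placeholder file names:

    # 1. Does each backed-up piece start again at t = 0, or continue in time?
    gmxcheck -f prt.trr
    gmxcheck -f '#prt.trr.1#'
    # 2. Is a per-process file-size limit imposed by the shell or batch system?
    ulimit -f    # 'unlimited' means no limit is set on that side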
Re: [gmx-users] Long trajectory split
Justin, as far as I can tell, the next log file starts at 0 ps, which would mean that it is restarting for some reason. At first I imagined that it was only splitting the data among files due to some kind of size limit, as you said, but when I tried to concatenate the trajectories it gave me a nonsensical output, with a lot of 'beginnings'. I will check with the cluster experts whether there is some kind of size limit; it seems the most logical source of the problem to me.

Mark, the only difference this time is the time scale set from the beginning. Apart from the protein itself, even the .mdp files were copied from a successful folder. But thank you both for the support.

2014-02-23 20:20 GMT+01:00 Mark Abraham mark.j.abra...@gmail.com:

On Sun, Feb 23, 2014 at 6:48 PM, Marcelo Depólo marcelodep...@gmail.com wrote:

Justin, the other runs with the very same binary do not produce the same problem. Mark, I just omitted the _mpi from the command line here, but it was compiled as _mpi.

OK, that rules that problem out, but please don't simplify and approximate. Computers are exact, and troubleshooting problems with them requires all the information. If we all understood perfectly we wouldn't be having problems ;-) Those files do get closed at checkpoint intervals, so they can be hashed for the hash value to be saved in the checkpoint. It is conceivable some file system would not close-and-re-open them properly. The .log files would comment about at least some such conditions. But the real question is what you are doing differently from the times when you have observed normal behaviour!

Mark

My log file top:

Gromacs version:    VERSION 4.6.1
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     disabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   SSE4.1
FFT library:        fftw-3.3.2-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Sex Nov 29 16:08:45 BRST 2013
Built by:           root@jupiter [CMAKE]
Build OS/arch:      Linux 2.6.32.13-0.4-default x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Build CPU family:   6   Model: 44   Stepping: 2
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
(...)

Initializing Domain Decomposition on 24 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.621 nm, LJ-14, atoms 3801 3812
  multi-body bonded interactions: 0.621 nm, G96Angle, atoms 3802 3812
Minimum cell size due to bonded interactions: 0.683 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.26
Will use 18 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
Using 6 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 18 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 8 Y 8 Z 8
Domain decomposition grid 3 x 2 x 3, separate PME nodes 6
PME domain decomposition: 3 x 2 x 1
Interleaving PP and PME nodes
This is a particle-particle only node
Domain decomposition nodeid 0, coordinates 0 0 0
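For completeness: when every piece starts again at t = 0, a plain trjcat will indeed produce the 'lot of beginnings' described above. If the pieces really were successive segments, they could still be joined by shifting their start times, for example (a sketch with placeholder names, using the GROMACS 4.6 tool names):

    # -settime asks interactively for the starting time of each input file,
    # so later pieces can be offset instead of all overlapping at t = 0
    trjcat -f prt.trr '#prt.trr.1#' '#prt.trr.2#' -o prt_full.trr -settime

If, on the other hand, the job is genuinely re-initializing each time, the pieces are independent short runs of the same starting structure, and no amount of concatenation will recover a single long trajectory.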