Hi all, I got the following error message when I tried to restart a GROMACS simulation from a checkpoint file. I restarted the simulation using fewer nodes and processes, and I also excluded one node with the '--exclude=' option (in Slurm) for experimental purposes.
I'm sure that using fewer nodes and processes is not the cause of this error, as I have already tested that. I have confirmed that the cause is the use of '--exclude='. I excluded one node named 'compute-node' when restarting from the checkpoint (in the first run, I used all nodes, including 'compute-node'). It seems that in the first run, the submitted job script was built on compute-node. So, at restart, the build user mismatch appeared because compute-node was not found (excluded).

Am I right? Is this behavior normal? Or is there a way to avoid this, so that I can freely restart from a checkpoint on any nodes without limitation?

Thank you in advance.

Regards,
Husen

==========================restart script=================
#!/bin/bash
#SBATCH -J ayo
#SBATCH -o md%j.out
#SBATCH -A necis
#SBATCH -N 2
#SBATCH -n 16
#SBATCH --exclude=compute-node
#SBATCH --time=144:00:00
#SBATCH --mail-user=hus...@gmail.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end

mpirun gmx_mpi mdrun -cpi md_test.cpt -deffnm md_test
=====================================================

==================================output error========================
Reading checkpoint file md_test.cpt generated: Wed Jun 15 16:30:44 2016

  Build time mismatch,
    current program: Sel Apr 5 13:37:32 WIB 2016
    checkpoint file: Rab Apr 6 09:44:51 WIB 2016

  Build user mismatch,
    current program: pro@head-node [CMAKE]
    checkpoint file: pro@compute-node [CMAKE]

  #ranks mismatch,
    current program: 16
    checkpoint file: 24

  #PME-ranks mismatch,
    current program: -1
    checkpoint file: 6

GROMACS patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.

-------------------------------------------------------
Program gmx mdrun, VERSION 5.1.2
Source code file: /home/pro/gromacs-5.1.2/src/gromacs/gmxlib/checkpoint.cpp, line: 2216

Fatal error:
Truncation of file md_test.xtc failed. Cannot do appending because of this failure.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
================================================================
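
Since the fatal error is about truncating md_test.xtc and not about the mismatch notes themselves, it may be worth checking, on the nodes actually used for the restart, that the output files from the first run are present and writable before mdrun tries to append to them. The following is only a minimal sketch under that assumption. The function name check_restart_outputs is hypothetical (it is not part of GROMACS or Slurm), and the list of files to pass is an assumption that depends on what the first run actually wrote.

```shell
#!/bin/bash
# Hypothetical pre-flight helper (not part of GROMACS or Slurm).
# For every file name given as an argument, report whether it is
# missing or not writable from the node where this script runs.
# Returns 0 if all files are present and writable, 1 otherwise.
check_restart_outputs() {
    local status=0
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        elif [ ! -w "$f" ]; then
            echo "not writable: $f"
            status=1
        fi
    done
    return "$status"
}
```

In the restart script this could be called just before the mpirun line, for example as `check_restart_outputs md_test.log md_test.edr md_test.xtc || exit 1` (the file names here are assumed from the -deffnm prefix). If the files look fine and appending still fails, mdrun's -noappend option, which writes new part-numbered output files instead of truncating the existing ones, may be an alternative to try.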