Hello, I am trying to use DMTCP to checkpoint an MPI program but got stuck at the hellompi example. I was following the instructions at https://github.com/dmtcp/dmtcp/blob/master/QUICK-START.md. The sequential demo works well, and I can checkpoint the counting example. However, when I ran the hellompi example using dmtcp_launch -i 5 mpirun -np 2 ./hellompi, it failed to checkpoint. I tried this on my school's cluster in interactive mode and on an AWS instance, and it failed in both places. Should I use a Slurm or Torque script to submit the jobs? Also, what is the latest environment (versions of DMTCP, MPI, etc.) that you have tested?
— Lihao Zhang
_______________________________________________ Dmtcp-forum mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
