I'm new to DMTCP and am trying to run an MPI application under it. My
environment consists of virtual machines running under the KVM hypervisor:
Master/coordinator:
OS: RHEL Server 7.6
CPUs: 2
RAM: 2048
Node1 and Node2 (each):
OS: RHEL Server 7.6
CPUs: 1
RAM: 1024
Running DMTCP without Open MPI works fine, and so does running Open MPI
without DMTCP. Only when I run DMTCP together with Open MPI do I get the
following error:
Command used:
dmtcp_launch mpirun -host 192.168.122.225,192.168.122.163 gmx mdrun -s
ion_channel.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g logfile
"[48000] WARNING at signalwrappers.cpp:141 in sigaction;
REASON='JWARNING(false) failed'
"Application trying to use DMTCP's signal for it's own use.\n" " You
should employ a different signal by setting the\n" " environment variable
DMTCP_SIGCKPT to the number\n" " of the signal that DMTCP should use for
checkpointing." = Application trying to use DMTCP's signal for it's own use.
You should employ a different signal by setting the
environment variable DMTCP_SIGCKPT to the number
of the signal that DMTCP should use for checkpointing.
stopSignal = 12
[48000] WARNING at socketconnection.cpp:219 in TcpConnection;
REASON='JWARNING(false) failed'
type = 2
Message: Datagram Sockets not supported. Hopefully, this is a short lived
connection!
[49000] NOTE at ssh.cpp:423 in prepareForExec; REASON='New ssh command'
newCommand = /usr/local/bin/dmtcp_ssh --ssh-slave
/usr/local/bin/dmtcp_nocheckpoint /usr/bin/ssh -x 192.168.122.225
/usr/local/bin/dmtcp_launch --coord-host 0.0.0.0 --coord-port 7779 --ckptdir
/home/roribeir /usr/local/bin/dmtcp_sshd --ssh-slave orted --hnp-topo-sig
0N:2S:2L3:2L2:2L1:2C:2H:x86_64 -mca ess "env" -mca orte_ess_jobid "1796145152"
-mca orte_ess_vpid 1 -mca orte_ess_num_procs "2" -mca orte_hnp_uri
"1796145152.0;tcp://192.168.122.158:50099" --tree-spawn -mca plm "rsh"
--tree-spawn
I tried changing the checkpoint signal to 16 and to 21 with the --ckpt-signal
option, but it still doesn't work; only the output changed:
Command:
dmtcp_launch --ckpt-signal 21 mpirun -host 192.168.122.225 gmx mdrun -s
ion_channel.tpr -maxh 0.50 -resethway -noconfout -nsteps 10000 -g logfile
Error:
[51000] WARNING at socketconnection.cpp:219 in TcpConnection;
REASON='JWARNING(false) failed'
type = 2
Message: Datagram Sockets not supported. Hopefully, this is a short lived
connection!
[52000] NOTE at ssh.cpp:423 in prepareForExec; REASON='New ssh command'
newCommand = /usr/local/bin/dmtcp_ssh --ssh-slave
/usr/local/bin/dmtcp_nocheckpoint /usr/bin/ssh -x 192.168.122.225
/usr/local/bin/dmtcp_launch --coord-host 0.0.0.0 --coord-port 7779
--ckpt-signal 21 --ckptdir /home/roribeir /usr/local/bin/dmtcp_sshd
--ssh-slave orted --hnp-topo-sig 0N:2S:2L3:2L2:2L1:2C:2H:x86_64 -mca ess "env"
-mca orte_ess_jobid "397869056" -mca orte_ess_vpid 1 -mca orte_ess_num_procs
"2" -mca orte_hnp_uri "397869056.0;tcp://192.168.122.158:43551" --tree-spawn
-mca plm "rsh" --tree-spawn
--
Best regards,
Rodrigo Vitor Ribeiro
Intern
Red Hat <https://www.redhat.com>
3900 Brigadeiro Faria Lima Ave.
Sao Paulo, SP 04538 BR
[email protected] M: +55-11-981537326
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum