Hi,

I am now trying to use DMTCP on two nodes, each with 4 cores, running
Debian jessie amd64, OpenMPI 1.6.5 and DMTCP 2.3.1 (the latest trunk from
git shows the same problem).

When I launch the program, it hangs, showing this:

hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_launch mpirun -np 8 -hostfile hosts lu.B.8
[45000] WARNING at socketconnection.cpp:187 in TcpConnection; REASON='JWARNING(false) failed'
     type = 2
Message: Datagram Sockets not supported. Hopefully, this is a short lived connection!
[46000] NOTE at ssh.cpp:348 in prepareForExec; REASON='New ssh command'
     newCommand = /home/hpcpro/dmtcp-trunk/bin/dmtcp_ssh /home/hpcpro/dmtcp-trunk/bin/dmtcp_nocheckpoint /usr/bin/ssh -x 10.0.2.21 /home/hpcpro/dmtcp-trunk/bin/dmtcp_launch --ssh-slave --host m112a --ckptdir /home/hpcpro/NPB3.3/NPB3.3-MPI/bin /home/hpcpro/dmtcp-trunk/bin/dmtcp_sshd  orted --daemonize -mca ess env -mca orte_ess_jobid 758841344 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "758841344.0;tcp://10.0.2.22:59106" -mca plm rsh

And the coordinator shows:

[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
     hello_remote.from = 1310c956110-8088-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating process Information after exec()'
     progname = orterun
     msg.from = 1310c956110-51000-544eef91
     client->identity() = 1310c956110-8088-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
     hello_remote.from = 1310c956110-51000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating process Information after fork()'
     client->hostname() = m112a
     client->progname() = orterun_(forked)
     msg.from = 1310c956110-52000-544eef91
     client->identity() = 1310c956110-51000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
     hello_remote.from = 1310c956110-52000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating process Information after fork()'
     client->hostname() = m112a
     client->progname() = dmtcp_ssh_(forked)
     msg.from = 1310c956110-53000-544eef91
     client->identity() = 1310c956110-52000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
     client->identity() = 1310c956110-53000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating process Information after exec()'
     progname = dmtcp_ssh
     msg.from = 1310c956110-52000-544eef91
     client->identity() = 1310c956110-52000-544eef91
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
     client->identity() = 1310c956110-52000-544eef91
l
Client List:
#, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
32, orterun[51000:8088]@m112a, 1310c956110-51000-544eef91, RUNNING


Any suggestions?

Thanks in advance!
Marina

------------------------------------------------------------------------------
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
