Hi Jiajun,

Can you take a look at this one?

Kapil

On Tue, Oct 28, 2014 at 8:50 AM, Marina Moran <[email protected]>
wrote:

> Hi,
>
> I am now trying to use DMTCP with two nodes, each with 4 cores, running
> Debian jessie amd64, OpenMPI 1.6.5, and DMTCP 2.3.1 (the latest trunk
> from git shows the same problem).
>
> When I launch the program, it hangs, showing this:
>
> hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_launch
> mpirun -np 8 -hostfile hosts lu.B.8
> [45000] WARNING at socketconnection.cpp:187 in TcpConnection;
> REASON='JWARNING(false) failed'
>      type = 2
> Message: Datagram Sockets not supported. Hopefully, this is a short
> lived connection!
> [46000] NOTE at ssh.cpp:348 in prepareForExec; REASON='New ssh command'
>      newCommand = /home/hpcpro/dmtcp-trunk/bin/dmtcp_ssh
> /home/hpcpro/dmtcp-trunk/bin/dmtcp_nocheckpoint /usr/bin/ssh -x
> 10.0.2.21 /home/hpcpro/dmtcp-trunk/bin/dmtcp_launch --ssh-slave --host
> m112a --ckptdir /home/hpcpro/NPB3.3/NPB3.3-MPI/bin
> /home/hpcpro/dmtcp-trunk/bin/dmtcp_sshd  orted --daemonize -mca ess
> env -mca orte_ess_jobid 758841344 -mca orte_ess_vpid 1 -mca
> orte_ess_num_procs 2 --hnp-uri "758841344.0;tcp://10.0.2.22:59106"
> -mca plm rsh
>
>
>
> and the coordinator shows:
>
> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> connected'
>      hello_remote.from = 1310c956110-8088-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
> process Information after exec()'
>      progname = orterun
>      msg.from = 1310c956110-51000-544eef91
>      client->identity() = 1310c956110-8088-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> connected'
>      hello_remote.from = 1310c956110-51000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
> process Information after fork()'
>      client->hostname() = m112a
>      client->progname() = orterun_(forked)
>      msg.from = 1310c956110-52000-544eef91
>      client->identity() = 1310c956110-51000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> connected'
>      hello_remote.from = 1310c956110-52000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
> process Information after fork()'
>      client->hostname() = m112a
>      client->progname() = dmtcp_ssh_(forked)
>      msg.from = 1310c956110-53000-544eef91
>      client->identity() = 1310c956110-52000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
> REASON='client disconnected'
>      client->identity() = 1310c956110-53000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
> process Information after exec()'
>      progname = dmtcp_ssh
>      msg.from = 1310c956110-52000-544eef91
>      client->identity() = 1310c956110-52000-544eef91
> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
> REASON='client disconnected'
>      client->identity() = 1310c956110-52000-544eef91
> l
> Client List:
> #, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
> 32, orterun[51000:8088]@m112a, 1310c956110-51000-544eef91, RUNNING
>
>
> Any suggestions?
>
> Thanks in advance!
> Marina
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
