Hi Jiajun!

It is Ethernet (100 Mbps), and I am *not* using Torque or any batch queue system.

Regards,
Marina

On 10/28/14, Jiajun Cao <[email protected]> wrote:
> Hi Marina,
>
> What network are you using? Is it Ethernet or InfiniBand?
>
> On Tue, Oct 28, 2014 at 12:26 PM, Kapil Arya <[email protected]>
> wrote:
>
>> Hi Jiajun,
>>
>> Can you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 28, 2014 at 8:50 AM, Marina Moran <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am now trying to use DMTCP with two nodes, each with 4 cores, running
>>> Debian jessie amd64,
>>> OpenMPI 1.6.5, DMTCP: 2.3.1 (the latest trunk from git shows the same
>>> problem).
>>>
>>> When I launch the program, it hangs, showing this:
>>>
>>> hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_launch
>>> mpirun -np 8 -hostfile hosts lu.B.8
>>> [45000] WARNING at socketconnection.cpp:187 in TcpConnection;
>>> REASON='JWARNING(false) failed'
>>> type = 2
>>> Message: Datagram Sockets not supported. Hopefully, this is a short
>>> lived connection!
>>> [46000] NOTE at ssh.cpp:348 in prepareForExec; REASON='New ssh command'
>>> newCommand = /home/hpcpro/dmtcp-trunk/bin/dmtcp_ssh
>>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_nocheckpoint /usr/bin/ssh -x
>>> 10.0.2.21 /home/hpcpro/dmtcp-trunk/bin/dmtcp_launch --ssh-slave --host
>>> m112a --ckptdir /home/hpcpro/NPB3.3/NPB3.3-MPI/bin
>>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_sshd orted --daemonize -mca ess
>>> env -mca orte_ess_jobid 758841344 -mca orte_ess_vpid 1 -mca
>>> orte_ess_num_procs 2 --hnp-uri "758841344.0;tcp://10.0.2.22:59106"
>>> -mca plm rsh
>>>
>>>
>>>
>>> and the coordinator shows:
>>>
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>> hello_remote.from = 1310c956110-8088-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
>>> process Information after exec()'
>>> progname = orterun
>>> msg.from = 1310c956110-51000-544eef91
>>> client->identity() = 1310c956110-8088-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>> hello_remote.from = 1310c956110-51000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
>>> process Information after fork()'
>>> client->hostname() = m112a
>>> client->progname() = orterun_(forked)
>>> msg.from = 1310c956110-52000-544eef91
>>> client->identity() = 1310c956110-51000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>> hello_remote.from = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
>>> process Information after fork()'
>>> client->hostname() = m112a
>>> client->progname() = dmtcp_ssh_(forked)
>>> msg.from = 1310c956110-53000-544eef91
>>> client->identity() = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>>> REASON='client disconnected'
>>> client->identity() = 1310c956110-53000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
>>> process Information after exec()'
>>> progname = dmtcp_ssh
>>> msg.from = 1310c956110-52000-544eef91
>>> client->identity() = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>>> REASON='client disconnected'
>>> client->identity() = 1310c956110-52000-544eef91
>>> l
>>> Client List:
>>> #, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
>>> 32, orterun[51000:8088]@m112a, 1310c956110-51000-544eef91, RUNNING
>>>
>>>
>>> Any suggestions?
>>>
>>> Thanks in advance!
>>> Marina
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>
>>
>>
>
