Hi Marina,

Is it possible to access the nodes from outside? If so, can I have access to the machines? It'll be more convenient if we can diagnose the bug locally.

Best,
Jiajun

On Tue, Oct 28, 2014 at 3:52 PM, Marina Moran <[email protected]> wrote:
> Hi Jiajun!
>
> It is Ethernet (100Mbps), and I am *not* using Torque or any batch
> queue system.
>
> Regards
> Marina
>
> On 10/28/14, Jiajun Cao <[email protected]> wrote:
> > Hi Marina,
> >
> > What network are you using? Is it Ethernet or InfiniBand?
> >
> > On Tue, Oct 28, 2014 at 12:26 PM, Kapil Arya <[email protected]>
> > wrote:
> >
> >> Hi Jiajun,
> >>
> >> Can you take a look at this one?
> >>
> >> Kapil
> >>
> >> On Tue, Oct 28, 2014 at 8:50 AM, Marina Moran <
> >> [email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am now trying to use DMTCP with two nodes, each one with 4 cores, and
> >>> Debian jessie amd64,
> >>> OpenMPI 1.6.5, DMTCP: 2.3.1 (using the latest trunk from git gives the
> >>> same problem).
> >>>
> >>> When I launch the program, it hangs, showing this:
> >>>
> >>> hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_launch
> >>> mpirun -np 8 -hostfile hosts lu.B.8
> >>> [45000] WARNING at socketconnection.cpp:187 in TcpConnection;
> >>> REASON='JWARNING(false) failed'
> >>> type = 2
> >>> Message: Datagram Sockets not supported. Hopefully, this is a short
> >>> lived connection!
> >>> [46000] NOTE at ssh.cpp:348 in prepareForExec; REASON='New ssh command'
> >>> newCommand = /home/hpcpro/dmtcp-trunk/bin/dmtcp_ssh
> >>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_nocheckpoint /usr/bin/ssh -x
> >>> 10.0.2.21 /home/hpcpro/dmtcp-trunk/bin/dmtcp_launch --ssh-slave --host
> >>> m112a --ckptdir /home/hpcpro/NPB3.3/NPB3.3-MPI/bin
> >>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_sshd orted --daemonize -mca ess
> >>> env -mca orte_ess_jobid 758841344 -mca orte_ess_vpid 1 -mca
> >>> orte_ess_num_procs 2 --hnp-uri "758841344.0;tcp://10.0.2.22:59106"
> >>> -mca plm rsh
> >>>
> >>>
> >>>
> >>> and the coordinator shows:
> >>>
> >>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> >>> connected'
> >>> hello_remote.from = 1310c956110-8088-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
> >>> process Information after exec()'
> >>> progname = orterun
> >>> msg.from = 1310c956110-51000-544eef91
> >>> client->identity() = 1310c956110-8088-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> >>> connected'
> >>> hello_remote.from = 1310c956110-51000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
> >>> process Information after fork()'
> >>> client->hostname() = m112a
> >>> client->progname() = orterun_(forked)
> >>> msg.from = 1310c956110-52000-544eef91
> >>> client->identity() = 1310c956110-51000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
> >>> connected'
> >>> hello_remote.from = 1310c956110-52000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
> >>> process Information after fork()'
> >>> client->hostname() = m112a
> >>> client->progname() = dmtcp_ssh_(forked)
> >>> msg.from = 1310c956110-53000-544eef91
> >>> client->identity() = 1310c956110-52000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
> >>> REASON='client disconnected'
> >>> client->identity() = 1310c956110-53000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
> >>> process Information after exec()'
> >>> progname = dmtcp_ssh
> >>> msg.from = 1310c956110-52000-544eef91
> >>> client->identity() = 1310c956110-52000-544eef91
> >>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
> >>> REASON='client disconnected'
> >>> client->identity() = 1310c956110-52000-544eef91
> >>> l
> >>> Client List:
> >>> #, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
> >>> 32, orterun[51000:8088]@m112a, 1310c956110-51000-544eef91, RUNNING
> >>>
> >>>
> >>> Any suggestions?
> >>>
> >>> Thanks in advance!
> >>> Marina
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> _______________________________________________
> >>> Dmtcp-forum mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> >>>
> >>
> >>
> >
>
