Hi Jiajun!

It is Ethernet (100 Mbps), and I am *not* using Torque or any batch
queue system.

Regards
Marina

On 10/28/14, Jiajun Cao <[email protected]> wrote:
> Hi Marina,
>
>   What network are you using? Is it Ethernet or InfiniBand?
>
> On Tue, Oct 28, 2014 at 12:26 PM, Kapil Arya <[email protected]>
> wrote:
>
>> Hi Jiajun,
>>
>> Can you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 28, 2014 at 8:50 AM, Marina Moran <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am now trying to use DMTCP with two nodes, each with 4 cores,
>>> running Debian jessie amd64, OpenMPI 1.6.5 and DMTCP 2.3.1 (the
>>> latest trunk from git shows the same problem).
>>>
>>> When I launch the program, it hangs, showing this:
>>>
>>> hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_launch mpirun -np 8 -hostfile hosts lu.B.8
>>> [45000] WARNING at socketconnection.cpp:187 in TcpConnection;
>>> REASON='JWARNING(false) failed'
>>>      type = 2
>>> Message: Datagram Sockets not supported. Hopefully, this is a short
>>> lived connection!
>>> [46000] NOTE at ssh.cpp:348 in prepareForExec; REASON='New ssh command'
>>>      newCommand = /home/hpcpro/dmtcp-trunk/bin/dmtcp_ssh
>>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_nocheckpoint /usr/bin/ssh -x
>>> 10.0.2.21 /home/hpcpro/dmtcp-trunk/bin/dmtcp_launch --ssh-slave --host
>>> m112a --ckptdir /home/hpcpro/NPB3.3/NPB3.3-MPI/bin
>>> /home/hpcpro/dmtcp-trunk/bin/dmtcp_sshd  orted --daemonize -mca ess
>>> env -mca orte_ess_jobid 758841344 -mca orte_ess_vpid 1 -mca
>>> orte_ess_num_procs 2 --hnp-uri "758841344.0;tcp://10.0.2.22:59106"
>>> -mca plm rsh
>>>
>>>
>>>
>>> And the coordinator shows:
>>>
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>>      hello_remote.from = 1310c956110-8088-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
>>> process Information after exec()'
>>>      progname = orterun
>>>      msg.from = 1310c956110-51000-544eef91
>>>      client->identity() = 1310c956110-8088-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>>      hello_remote.from = 1310c956110-51000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
>>> process Information after fork()'
>>>      client->hostname() = m112a
>>>      client->progname() = orterun_(forked)
>>>      msg.from = 1310c956110-52000-544eef91
>>>      client->identity() = 1310c956110-51000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>>> connected'
>>>      hello_remote.from = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:816 in onData; REASON='Updating
>>> process Information after fork()'
>>>      client->hostname() = m112a
>>>      client->progname() = dmtcp_ssh_(forked)
>>>      msg.from = 1310c956110-53000-544eef91
>>>      client->identity() = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>>> REASON='client disconnected'
>>>      client->identity() = 1310c956110-53000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:825 in onData; REASON='Updating
>>> process Information after exec()'
>>>      progname = dmtcp_ssh
>>>      msg.from = 1310c956110-52000-544eef91
>>>      client->identity() = 1310c956110-52000-544eef91
>>> [7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>>> REASON='client disconnected'
>>>      client->identity() = 1310c956110-52000-544eef91
>>> l
>>> Client List:
>>> #, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
>>> 32, orterun[51000:8088]@m112a, 1310c956110-51000-544eef91, RUNNING
>>>
>>>
>>> Any suggestions?
>>>
>>> Thanks in advance!
>>> Marina
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Dmtcp-forum mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>
>>
>>
>