Hi Kapil!

I get the same error as before with the trunk build:

hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ~/dmtcp-trunk/bin/dmtcp_restart ckpt_*.dmtcp
[7894] mtcp_restart.c:1310 open_shared_file:
  unable to create file
/tmp/openmpi-sessions-hpcpro@m112a_0/7859/1/shared_mem_pool.m112a: 2
[7892] mtcp_restart.c:1310 open_shared_file:
  unable to create file
/tmp/openmpi-sessions-hpcpro@m112a_0/7859/1/shared_mem_pool.m112a: 2
[7895] mtcp_restart.c:1310 open_shared_file:
  unable to create file
/tmp/openmpi-sessions-hpcpro@m112a_0/7859/1/shared_mem_pool.m112a: 2
[7893] mtcp_restart.c:1310 open_shared_file:
  unable to create file
/tmp/openmpi-sessions-hpcpro@m112a_0/7859/1/shared_mem_pool.m112a: 2
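
One thing that might help narrow this down: if the trailing "2" in the messages above is an errno value (an assumption on my part, the log does not say so), it would be ENOENT ("No such file or directory") on Linux, which would suggest that part of the session directory under /tmp no longer exists at restart time. A minimal sketch to find the first missing path component; `first_missing` is a hypothetical helper name, not part of DMTCP or Open MPI:

```shell
#!/bin/sh
# Hypothetical helper (not part of DMTCP or Open MPI): print the
# highest-level component of a path that does not exist. Prints an
# empty line if the whole path exists.
first_missing() {
    path=$1
    missing=
    # Walk up from the full path until an existing ancestor is found,
    # remembering the last component that was missing.
    while [ -n "$path" ] && [ "$path" != "/" ] && [ ! -e "$path" ]; do
        missing=$path
        path=$(dirname "$path")
    done
    printf '%s\n' "$missing"
}

# Path copied from the restart error messages above.
first_missing "/tmp/openmpi-sessions-hpcpro@m112a_0/7859/1/shared_mem_pool.m112a"
```

If this prints the top-level openmpi-sessions directory, the whole session tree is gone; if it prints only the shared_mem_pool file, the directories are still there and the cause is probably something else.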


And at the coordinator:
[7832] NOTE at dmtcp_coordinator.cpp:1096 in
validateRestartingWorkerProcess; REASON='FIRST dmtcp_restart
connection.  Set numPeers. Generate timestamp'
     numPeers = 5
     curTimeStamp = 22631325571
     compId = 1310c956110-40000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
connected'
     hello_remote.from = 1310c956110-40000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
connected'
     hello_remote.from = 1310c956110-41000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
connected'
     hello_remote.from = 1310c956110-42000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
connected'
     hello_remote.from = 1310c956110-43000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
connected'
     hello_remote.from = 1310c956110-44000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
REASON='client disconnected'
     client->identity() = 1310c956110-43000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
REASON='client disconnected'
     client->identity() = 1310c956110-41000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
REASON='client disconnected'
     client->identity() = 1310c956110-44000-544ee9ab
[7832] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
REASON='client disconnected'
     client->identity() = 1310c956110-42000-544ee9ab
l
Client List:
#, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
11, orterun[40000:7881]@m112a, 1310c956110-40000-544ee9ab, CHECKPOINTED


On 10/27/14, Kapil Arya <[email protected]> wrote:
> Hi Marina,
>
> Could you do the following and then reproduce the error and send us the
> output:
>
>     git clone https://github.com/dmtcp/dmtcp.git dmtcp-trunk
>     cd dmtcp-trunk
>     ./configure
>     make
>
> Now use this code to run your tests.
>
> This will pull the latest trunk to allow us to diagnose the error.
>
> Kapil
>
> On Mon, Oct 27, 2014 at 8:24 PM, Marina Moran
> <[email protected]>
> wrote:
>
>> Hi everyone:
>>
>> I have a node (Intel i5, 4 cores) with:
>> Debian jessie amd64
>> OpenMPI 1.6.5
>> DMTCP: 2.3.1
>> NAS benchmarks
>>
>> My first try is using one node (four processes):
>>
>> I started the coordinator in one terminal:
>>
>>     hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_coordinator
>>
>>
>> In another terminal I launch the program:
>>
>>     hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_launch mpirun -np 4 lu.A.4
>>
>>
>> In another terminal I request a checkpoint:
>>
>>     hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_command --checkpoint
>>
>>
>> Then I call the restart script, which hangs:
>>
>>     hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ./dmtcp_restart_script.sh
>> [1057] mtcp_restart.c:1303 open_shared_file:
>>   unable to create file
>> /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
>> [1058] mtcp_restart.c:1303 open_shared_file:
>>   unable to create file
>> /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
>> [1060] mtcp_restart.c:1303 open_shared_file:
>>   unable to create file
>> /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
>> [1059] mtcp_restart.c:1303 open_shared_file:
>>   unable to create file
>> /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
>>
>>
>> Meanwhile, the coordinator window shows this:
>>
>> [964] NOTE at dmtcp_coordinator.cpp:1096 in
>> validateRestartingWorkerProcess; REASON='FIRST dmtcp_restart
>> connection.  Set numPeers. Generate timestamp'
>>      numPeers = 5
>>      curTimeStamp = 22631250933
>>      compId = 1310c956110-60000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>> connected'
>>      hello_remote.from = 1310c956110-60000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>> connected'
>>      hello_remote.from = 1310c956110-61000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>> connected'
>>      hello_remote.from = 1310c956110-62000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>> connected'
>>      hello_remote.from = 1310c956110-63000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker
>> connected'
>>      hello_remote.from = 1310c956110-64000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>> REASON='client disconnected'
>>      client->identity() = 1310c956110-63000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>> REASON='client disconnected'
>>      client->identity() = 1310c956110-62000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>> REASON='client disconnected'
>>      client->identity() = 1310c956110-64000-544ed77d
>> [964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect;
>> REASON='client disconnected'
>>      client->identity() = 1310c956110-61000-544ed77d
>> l
>> Client List:
>> #, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
>> 41, orterun[60000:1405]@m112a, 1310c956110-60000-544ed77d, CHECKPOINTED
>>
>>
>> I searched this forum and the internet for this error but had no luck.
>> Any help would be much appreciated!
>>
>> Regards,
>> Marina
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Dmtcp-forum mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>
>
