Hi everyone:
I have a single node (Intel i5, 4 cores) with the following setup:
Debian jessie amd64
OpenMPI 1.6.5
DMTCP 2.3.1
NAS benchmarks
My first attempt uses one node with four processes.
I start the coordinator in one terminal:
hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_coordinator
In another terminal I launch the program:
hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_launch mpirun -np 4 lu.A.4
In a third terminal I request a checkpoint:
hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ dmtcp_command --checkpoint
Then I run the restart script, which hangs:
hpcpro@m112a:~/NPB3.3/NPB3.3-MPI/bin$ ./dmtcp_restart_script.sh
[1057] mtcp_restart.c:1303 open_shared_file: unable to create file /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
[1058] mtcp_restart.c:1303 open_shared_file: unable to create file /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
[1060] mtcp_restart.c:1303 open_shared_file: unable to create file /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
[1059] mtcp_restart.c:1303 open_shared_file: unable to create file /tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a
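One thing that can be checked by hand is whether the directory named in the error still exists and is writable at restart time. The following is only a minimal sketch: the function name check_pool_dir is made up for illustration and is not part of DMTCP or Open MPI, and it assumes the failure is related to the parent directory, which is not confirmed.

```shell
#!/bin/sh
# Hypothetical helper (not part of DMTCP or Open MPI): report whether the
# directory that should contain a given file exists and is writable.
check_pool_dir() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
    else
        echo "ok: $dir"
    fi
}

# Path copied from the mtcp_restart error message.
check_pool_dir "/tmp/openmpi-sessions-hpcpro@m112a_0/16803/1/shared_mem_pool.m112a"
```

A "missing" result would suggest the session directory was removed after the original mpirun exited. An "ok" result would point to some other cause.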
The coordinator window shows this during the restart:
[964] NOTE at dmtcp_coordinator.cpp:1096 in validateRestartingWorkerProcess; REASON='FIRST dmtcp_restart connection. Set numPeers. Generate timestamp'
numPeers = 5
curTimeStamp = 22631250933
compId = 1310c956110-60000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
hello_remote.from = 1310c956110-60000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
hello_remote.from = 1310c956110-61000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
hello_remote.from = 1310c956110-62000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
hello_remote.from = 1310c956110-63000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:1040 in onConnect; REASON='worker connected'
hello_remote.from = 1310c956110-64000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
client->identity() = 1310c956110-63000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
client->identity() = 1310c956110-62000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
client->identity() = 1310c956110-64000-544ed77d
[964] NOTE at dmtcp_coordinator.cpp:875 in onDisconnect; REASON='client disconnected'
client->identity() = 1310c956110-61000-544ed77d
l
Client List:
#, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
41, orterun[60000:1405]@m112a, 1310c956110-60000-544ed77d, CHECKPOINTED
I searched this forum and the internet for this error but had no luck.
Any help would be greatly appreciated!
Regards,
Marina
_______________________________________________
Dmtcp-forum mailing list
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum