Bear with me, this example might be a bit long-winded. I have a situation
where running with the trunk results in a failure during checkpointing.
It appears to be related to sub-processes not running in the same
directory as the launching process. With 1.2.6, checkpointing
works but restore fails. The high-level flow looks like this:
Checkpoint and restore both work, with both 1.2.6 and trunk:
DMTCP run.csh --> run_sleep.csh --> sleep_ckpt
Fails in either checkpoint or restore, depending on the DMTCP version:
DMTCP run.csh --> cd run_dir --> run_sleep.csh --> ../sleep_ckpt
Failure signature during restore (1.2.6). It looks like we end up running the
program fresh:
[28002] WARNING at connection.cpp:1160 in restore; REASON='JWARNING(false)
failed'
Message: Size of file smaller than what we expected
[28002] WARNING at connection.cpp:1183 in restore; REASON='JWARNING(false)
failed'
_path = <my_path_removed>/run_sleep.csh
_offset = 26
_stat.st_size = 26
buf.st_size = 25
Message: No lseek done: offset is larger than min of old and new size.
Failure signature during checkpoint (trunk):
[40000] ERROR at fileconnection.cpp:522 in handleUnlinkedFile;
REASON='JASSERT(_type == FILE_DELETED) failed'
_path = ./run.csh
currPath = <my_path_removed>/run.csh
Message: File not found on disk and yet the filename doesn't contain the suffix
'(deleted)'
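One possible reading of the trunk log above, offered as an assumption rather than a confirmed diagnosis: _path looks like the relative name the script was opened with, while currPath looks like the absolute location of the still-open file. If the relative name is re-resolved after the "cd ./run_dir", it no longer points at anything, even though the file was never deleted. The sketch below reproduces only that generic effect in a scratch directory. It does not use or imitate DMTCP code; the function names, the /tmp template, and the use of /proc/self/fd (Linux-only) are all choices made for this illustration.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` resolves from the current working directory. */
static int path_exists(const char *path) {
    struct stat st;
    return stat(path, &st) == 0;
}

/* Asks the kernel where `fd` points right now (Linux-only, via /proc).
 * Returns 0 on success, -1 on failure. */
static int fd_current_path(int fd, char *out, size_t len) {
    char link[64];
    ssize_t n;
    assert(len > 0);
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, out, len - 1);
    if (n < 0) return -1;
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: mimics the run.csh pattern (open "./run.csh",
 * then cd into run_dir) inside a scratch directory under /tmp.
 * On success returns 0 and reports whether the relative name and the
 * kernel-reported path still resolve. The scratch directory is left
 * behind and the process stays in run_dir. */
int stale_relative_path_demo(int *rel_ok, int *abs_ok) {
    char tmpl[] = "/tmp/relpath_demoXXXXXX";
    char cur[PATH_MAX];
    int fd;

    if (mkdtemp(tmpl) == NULL || chdir(tmpl) != 0) return -1;
    fd = open("./run.csh", O_CREAT | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (mkdir("run_dir", 0700) != 0 || chdir("run_dir") != 0) return -1;

    /* Same relative name, now looked up from inside run_dir */
    *rel_ok = path_exists("./run.csh");
    /* Absolute location of the file that is still open */
    if (fd_current_path(fd, cur, sizeof cur) != 0) return -1;
    *abs_ok = path_exists(cur);

    close(fd);
    return 0;
}
```

Calling stale_relative_path_demo(&rel_ok, &abs_ok) should leave rel_ok at 0 and abs_ok at 1: the file is missing under its relative name yet is not deleted, which is the combination the JASSERT message complains about. Whether DMTCP actually resolves the path this way is not something the log alone can confirm.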
How to build sleep_ckpt.c:
setenv DMTCP_INSTALLATION <Your DMTCP here>
gcc sleep_ckpt.c -I$DMTCP_INSTALLATION/include \
-L$DMTCP_INSTALLATION/lib -ldmtcpaware \
-Xlinker -rpath -Xlinker $DMTCP_INSTALLATION/lib -g -o sleep_ckpt
How to checkpoint:
Run the program using run.csh, keeping the "cd ./run_dir" line in place
Checkpoint by doing "kill -s ALRM <pid of sleep_ckpt>"
Sources:
/*** sleep_ckpt.c ***/
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>   /* sleep() */
#include <dmtcpaware.h>
#define SLEEP_SEC 10
/* Written from the signal handler, so it must be volatile sig_atomic_t */
static volatile sig_atomic_t ckpt_requested = 0;
void ckpt_handler(int signum) {
  (void)signum;
  ckpt_requested = 1;
}
int main(int argc, char *argv[]) {
  int i;
  /* Set up the signal handler that requests a checkpoint */
  signal(SIGALRM, ckpt_handler);
  printf("Sleeping for %d seconds\n", SLEEP_SEC);
  for (i = 0; i < SLEEP_SEC; i++) {
    sleep(1);
    if (ckpt_requested && dmtcpIsEnabled()) {
      printf("Checkpointing at %d seconds\n", i);
      dmtcpCheckpoint();
      ckpt_requested = 0;
    }
  }
  return 0;
}
/*** run.csh ***/
#!/bin/csh
# With this cd in place the problem occurs; running the script from the local directory is fine
cd ./run_dir
./run_sleep.csh
/*** run_sleep.csh ***/
#!/bin/csh
./sleep_ckpt
/*** run_dir/run_sleep.csh ***/
#!/bin/csh
../sleep_ckpt
Joshua Louie