The branch, 1.0.112 has been updated
       via  16a5cad37fa9093beb3ab5e4c24bbd61056c89f8 (commit)
       via  35b719c8e2d97ec7014401a132937a01a1f2da7f (commit)
      from  d0c57b915d225bcf4c924ff57df7abb99b3ebfd1 (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.0.112


- Log -----------------------------------------------------------------
commit 16a5cad37fa9093beb3ab5e4c24bbd61056c89f8
Author: Ronnie Sahlberg <ronniesahlb...@gmail.com>
Date:   Fri Sep 3 11:58:27 2010 +1000

    When memory allocations for recovery fail, don't dereference a null pointer
    while trying to print the log message for the failure.
    Also shut down ctdb with ctdb_fatal().

commit 35b719c8e2d97ec7014401a132937a01a1f2da7f
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Thu Sep 2 12:44:21 2010 +0930

    eventscript: make sure we die when we timeout.

    Volker noticed that system() can hang on a futex: we do this inside a
    signal handler simply to dump extra diagnostics when we timeout, which
    is very questionable but usually works.

    Add a timeout of 90 seconds: after that, commit suicide.

    (This is a workaround for this branch: master does this correctly).

    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

-----------------------------------------------------------------------

Summary of changes:
 server/ctdb_recover.c |    6 ++----
 server/eventscript.c  |   13 +++++++++++++
 2 files changed, 15 insertions(+), 4 deletions(-)


Changeset truncated at 500 lines:

diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index f61b6e7..b48b4e7 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -340,10 +340,8 @@ static int traverse_pulldb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data,
 	}
 	params->pulldata = talloc_realloc_size(NULL, params->pulldata, rec->length + params->len);
 	if (params->pulldata == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to expand pulldb_data to %u (%u records)\n",
-			 rec->length + params->len, params->pulldata->count));
-		params->failed = true;
-		return -1;
+		DEBUG(DEBUG_CRIT,(__location__ " Failed to expand pulldb_data to %u\n", rec->length + params->len));
+		ctdb_fatal(params->ctdb, "failed to allocate memory for recovery. shutting down\n");
 	}
 	params->pulldata->count++;
 	memcpy(params->len+(uint8_t *)params->pulldata, rec, rec->length);
diff --git a/server/eventscript.c b/server/eventscript.c
index c403772..37306db 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -34,6 +34,13 @@ static struct {

 static void ctdb_event_script_timeout(struct event_context *ev, struct timed_event *te, struct timeval t, void *p);

+static void sigalarm(int sig)
+{
+	/* all the child processes will be running in the same process group */
+	kill(-getpgrp(), SIGKILL);
+	_exit(1);
+}
+
 /*
   ctdbd sends us a SIGTERM when we should time out the current script
  */
@@ -42,6 +49,12 @@ static void sigterm(int sig)
 	char tbuf[100], buf[200];
 	time_t t;

+	/* Calling system() inside a signal handler can do strange things:
+	 * it usually works, and that's enough for us: it's only for debugging.
+	 * But make sure we terminate. */
+	signal(SIGALRM, sigalarm);
+	alarm(90);
+
 	DEBUG(DEBUG_ERR,("Timed out running script '%s' after %.1f seconds pid :%d\n",
 		 child_state.script_running, timeval_elapsed(&child_state.start), getpid()));


-- 
CTDB repository
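
For anyone who wants to try the eventscript watchdog idea outside of ctdb, here is a minimal standalone sketch. It is not ctdb code: run_hanging_worker() and on_alarm() are hypothetical names made up for this illustration, the pause() loop stands in for a system() call that never returns, and the timeout is a parameter rather than the 90 seconds used in the commit. The worker moves into its own process group first, so the group-wide SIGKILL cannot reach the caller.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Watchdog handler: uses only async-signal-safe calls. */
static void on_alarm(int sig)
{
	(void)sig;
	/* A negative pid addresses every process in that process group. */
	kill(-getpgrp(), SIGKILL);
	_exit(1); /* fallback in case SIGKILL has not been delivered yet */
}

/*
 * Hypothetical helper: fork a worker that hangs forever, guarded by a
 * watchdog of `seconds`. Returns the worker's wait status, or -1 on error.
 */
static int run_hanging_worker(unsigned int seconds)
{
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		/* Own process group, so the kill above cannot hit the parent. */
		setpgid(0, 0);
		signal(SIGALRM, on_alarm);
		alarm(seconds);
		for (;;) {
			pause(); /* stand-in for a call that never returns */
		}
	}
	if (waitpid(pid, &status, 0) != pid) {
		return -1;
	}
	return status;
}
```

Calling run_hanging_worker(1) should return after roughly one second with a status that reports either termination by SIGKILL or, if the fallback path ran first, an exit code of 1. Either way the worker and anything else in its process group is gone, which is the property the commit is after.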