[SCM] CTDB repository - branch 1.0.112 updated - ctdb-1.0.111-142-g16a5cad

Ronnie Sahlberg Thu, 02 Sep 2010 19:01:07 -0700

The branch, 1.0.112 has been updated
       via  16a5cad37fa9093beb3ab5e4c24bbd61056c89f8 (commit)
       via  35b719c8e2d97ec7014401a132937a01a1f2da7f (commit)
      from  d0c57b915d225bcf4c924ff57df7abb99b3ebfd1 (commit)


http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.0.112


- Log -----------------------------------------------------------------
commit 16a5cad37fa9093beb3ab5e4c24bbd61056c89f8
Author: Ronnie Sahlberg <ronniesahlb...@gmail.com>
Date:   Fri Sep 3 11:58:27 2010 +1000

    When memory allocations for recovery fail,
    don't dereference a NULL pointer while trying to print the log message for
    the failure.
    
    Also shut down ctdb with ctdb_fatal().

commit 35b719c8e2d97ec7014401a132937a01a1f2da7f
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Thu Sep 2 12:44:21 2010 +0930

    eventscript: make sure we die when we timeout.
    
    Volker noticed that system() can hang on a futex: we do this inside a
    signal handler simply to dump extra diagnostics when we timeout, which is
    very questionable but usually works.
    
    Add a timeout of 90 seconds: after that, commit suicide.
    (This is a workaround for this branch: master does this correctly).
    
    Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

-----------------------------------------------------------------------

Summary of changes:
 server/ctdb_recover.c |    6 ++----
 server/eventscript.c  |   13 +++++++++++++
 2 files changed, 15 insertions(+), 4 deletions(-)


Changeset truncated at 500 lines:

diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index f61b6e7..b48b4e7 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -340,10 +340,8 @@ static int traverse_pulldb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data,
        }
        params->pulldata = talloc_realloc_size(NULL, params->pulldata, rec->length + params->len);
        if (params->pulldata == NULL) {
-               DEBUG(DEBUG_ERR,(__location__ " Failed to expand pulldb_data to %u (%u records)\n", 
-                        rec->length + params->len, params->pulldata->count));
-               params->failed = true;
-               return -1;
+               DEBUG(DEBUG_CRIT,(__location__ " Failed to expand pulldb_data to %u\n", rec->length + params->len));
+               ctdb_fatal(params->ctdb, "failed to allocate memory for recovery. shutting down\n");
        }
        params->pulldata->count++;
        memcpy(params->len+(uint8_t *)params->pulldata, rec, rec->length);
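The hunk above removes a log call that read params->pulldata->count after the
reallocation had already returned NULL. A minimal sketch of the same idea
follows, assuming plain realloc() instead of talloc and a hypothetical helper
named append_record(); it is an illustration, not the ctdb code. The failure
path only logs values that do not require touching the failed pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of ctdb): append rec_len bytes to a
 * growable buffer. Returns 0 on success, -1 on allocation failure.
 * On failure *bufp and *lenp are left unchanged and still valid. */
static int append_record(uint8_t **bufp, size_t *lenp,
                         const void *rec, size_t rec_len)
{
    assert(bufp != NULL && lenp != NULL);

    if (rec_len == 0) {
        return 0;               /* nothing to append */
    }

    size_t new_len = *lenp + rec_len;
    uint8_t *grown = realloc(*bufp, new_len);
    if (grown == NULL) {
        /* Log only the requested size: the new pointer is NULL, so
         * nothing may be read through it here. */
        fprintf(stderr, "failed to expand buffer to %zu bytes\n", new_len);
        return -1;
    }

    memcpy(grown + *lenp, rec, rec_len);
    *bufp = grown;
    *lenp = new_len;
    return 0;
}
```

Unlike this sketch, the patch does not return an error to the caller: it
treats the failure as unrecoverable and calls ctdb_fatal().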
diff --git a/server/eventscript.c b/server/eventscript.c
index c403772..37306db 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -34,6 +34,13 @@ static struct {
 
 static void ctdb_event_script_timeout(struct event_context *ev, struct timed_event *te, struct timeval t, void *p);
 
+static void sigalarm(int sig)
+{
+       /* all the child processes will be running in the same process group */
+       kill(-getpgrp(), SIGKILL);
+       _exit(1);
+}
+
 /*
   ctdbd sends us a SIGTERM when we should time out the current script
  */
@@ -42,6 +49,12 @@ static void sigterm(int sig)
        char tbuf[100], buf[200];
        time_t t;
 
+       /* Calling system() inside a signal handler can do strange things:
+        * it usually works, and that's enough for us: it's only for debugging.
+        * But make sure we terminate. */
+       signal(SIGTERM, sigalarm);
+       alarm(90);
+
        DEBUG(DEBUG_ERR,("Timed out running script '%s' after %.1f seconds pid :%d\n", 
                 child_state.script_running, timeval_elapsed(&child_state.start), getpid()));
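The watchdog idea in this patch (arm a timer before doing risky work, then
kill the whole process group if the work never returns) can be sketched in
isolation as below. This is an illustration under stated assumptions, not the
ctdb code: run_hung_child() and on_watchdog() are hypothetical names, the
handler is installed on SIGALRM rather than SIGTERM, and an endless pause()
loop stands in for a system() call that hangs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Watchdog handler: kill every process in our process group, ourselves
 * included. Only async-signal-safe calls are used here. */
static void on_watchdog(int sig)
{
    (void)sig;
    kill(-getpgrp(), SIGKILL);
    _exit(1);   /* fallback if SIGKILL has not been delivered yet */
}

/* Hypothetical demo: fork a child that arms a watchdog and then hangs.
 * Returns the signal that terminated the child, 0 if it exited
 * normally, or -1 on error. */
static int run_hung_child(unsigned int watchdog_secs)
{
    assert(watchdog_secs > 0);  /* alarm(0) would cancel the timer */

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        setpgid(0, 0);          /* own process group, so the parent survives */
        signal(SIGALRM, on_watchdog);
        alarm(watchdog_secs);
        for (;;) {
            pause();            /* stand-in for work that never returns */
        }
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_hung_child(1) blocks for about one second and then reports how
the child ended. The child moves into its own process group first so that the
group-wide SIGKILL does not also take down the caller.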
 


-- 
CTDB repository
