The branch, v4-21-test has been updated
via 7f1fc08c428 ctdb-daemon: Modernise some DEBUGs
via 3a16697b9b2 ctdb-daemon: Add configuration option shutdown extra timeout
via ffe9e620cc9 ctdb-daemon: Run "startipreallocate" event in SHUTDOWN runstate
via dbb008703b6 ctdb-daemon: Add configuration option shutdown failover timeout
via e7e4b44f372 ctdb-daemon: Add failover on shutdown
via 72b32a4ee76 ctdb-protocol: Add CTDB server SRVID range
via 1e773a73529 ctdb-daemon: Avoid aborting during early shutdown
from 84d23c82272 vfs_ceph_snapshots: Always calculate absolute snapshot path
https://git.samba.org/?p=samba.git;a=shortlog;h=v4-21-test
- Log -----------------------------------------------------------------
commit 7f1fc08c428ba64031cf7afd21478fc1664756b8
Author: Martin Schwenke <[email protected]>
Date: Mon May 19 10:06:21 2025 +1000
ctdb-daemon: Modernise some DEBUGs
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
Autobuild-User(master): Martin Schwenke <[email protected]>
Autobuild-Date(master): Thu May 29 10:57:35 UTC 2025 on atb-devel-224
(cherry picked from commit 5a582bddd834fffe2b27cc8b2e9468fa84dfc6f2)
Autobuild-User(v4-21-test): Jule Anger <[email protected]>
Autobuild-Date(v4-21-test): Mon Jun 2 12:44:29 UTC 2025 on atb-devel-224
commit 3a16697b9b23f962869eacbff128d68833d537d9
Author: Martin Schwenke <[email protected]>
Date: Mon May 19 09:06:38 2025 +1000
ctdb-daemon: Add configuration option shutdown extra timeout
See documentation change for details.
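The documented interaction between the two options can be summarised in a few lines of C. This is a minimal sketch: the helper shutdown_worst_case_delay() is hypothetical and not part of ctdbd. It assumes both options are plain integers in seconds, as in the ctdb_config fields added in this series, and computes only the upper bound on how long failover handling can hold up shutdown. Following the ctdb_daemon.c hunk, "completed" means a takeover run reply arrived before the failover timeout, even if that reply reported failure.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of ctdbd): upper bound, in seconds,
 * on how long failover handling can hold up shutdown, given the
 * "shutdown failover timeout" and "shutdown extra timeout" values and
 * whether a takeover run reply arrived before the failover timeout.
 */
int shutdown_worst_case_delay(int failover_timeout,
			      int extra_timeout,
			      bool failover_completed)
{
	int delay = 0;

	/* A failover timeout <= 0 disables failover on shutdown entirely */
	if (failover_timeout <= 0) {
		return 0;
	}
	delay += failover_timeout;

	/* The extra wait only follows a failover that did not time out */
	if (failover_completed && extra_timeout > 0) {
		delay += extra_timeout;
	}

	return delay;
}
```

With the defaults (10 and 0), the extra timeout never applies. It only matters once "shutdown extra timeout" is raised to give SMB durable handles more time to be reclaimed.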
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit 3a770c8d46934870f42059640b0aaa0c76a3f4fb)
commit ffe9e620cc9cd9b8bb9fb790e4a1f578dd0d309d
Author: Martin Schwenke <[email protected]>
Date: Thu May 15 14:01:16 2025 +1000
ctdb-daemon: Run "startipreallocate" event in SHUTDOWN runstate
Even though all nodes may be shutting down, there is still a very small
window for a race when multiple nodes are shut down. For simplicity,
assume 2 nodes. Assume the node shutdowns are staggered, which is
typical because they are usually initiated by a loop (e.g. onnode -p all
ctdb shutdown): although the commands run in parallel, some start
later than others.
Consider this sequence:
1. Node 0 reaches ctdb_shutdown_takeover() in
ctdb_shutdown_sequence() and a takeover run starts
2. Node 1 has not yet set its runstate to SHUTDOWN in
ctdb_shutdown_sequence()
3. The leader node asks node 1 which IPs it can host
4. Node 1 replies "all of them"
5. Node 1 now sets its runstate to SHUTDOWN in
ctdb_shutdown_sequence()
6. The leader node continues with the takeover run, first asking all
nodes to run "startipreallocate"
7. Node 0 runs "startipreallocate", so its NFS server starts grace
8. Node 1 does not run "startipreallocate" because it is not in
RUNNING runstate, so its NFS server does not start grace
9. The leader node continues with the takeover run, next asking all
nodes to run "releaseip" for IPs they can no longer hold
10. Node 0 releases all IPs, since it is in SHUTDOWN runstate (so can't
host IPs)
11. As part of this, the NFS server on node 0 releases locks held
against IPs it is releasing
12. A client connected to node 1, where the NFS server is not in
grace, takes ("steals") one of those locks
This client is then permitted to reclaim the lock when nodes are
restarted.
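The fix turns on the runstate check in ctdb_control_start_ipreallocate() (see the ctdb_takeover.c hunk), which changes from "!= RUNNING" to "< RUNNING". The sketch below replays the relevant part of the sequence for 2 nodes. It assumes a local enum that mirrors the ordering of enum ctdb_runstate, where SHUTDOWN follows RUNNING; only the relative order matters here. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed to mirror the ordering of enum ctdb_runstate, in which
 * SHUTDOWN follows RUNNING; only the relative order matters here.
 */
enum runstate {
	RUNSTATE_UNKNOWN,
	RUNSTATE_INIT,
	RUNSTATE_SETUP,
	RUNSTATE_FIRST_RECOVERY,
	RUNSTATE_STARTUP,
	RUNSTATE_RUNNING,
	RUNSTATE_SHUTDOWN,
};

/* Old check: only RUNNING nodes run "startipreallocate" */
bool runs_startipreallocate_old(enum runstate rs)
{
	return rs == RUNSTATE_RUNNING;
}

/* New check: RUNNING and SHUTDOWN nodes both run it */
bool runs_startipreallocate_new(enum runstate rs)
{
	return rs >= RUNSTATE_RUNNING;
}

/*
 * Replay the race for 2 nodes: both are in SHUTDOWN by the time the
 * leader asks for "startipreallocate" (step 6), even though node 1
 * was still RUNNING when it offered to host all IPs (steps 3-4).
 * Returns true if every node's NFS server would enter grace.
 */
bool all_nodes_enter_grace(bool (*check)(enum runstate))
{
	enum runstate node[2];

	node[0] = RUNSTATE_SHUTDOWN;	/* step 1 */
	node[1] = RUNSTATE_SHUTDOWN;	/* step 5 */

	/* Steps 6-8: each node decides whether to run the event */
	for (int i = 0; i < 2; i++) {
		if (!check(node[i])) {
			return false;
		}
	}
	return true;
}
```

Under the old check, a node that reaches SHUTDOWN before step 6 never starts grace. Under the new check, only nodes that are still starting up (below RUNNING) skip the event.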
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit 4877541cfd8f782f516f6471edc52629720963fb)
commit dbb008703b6d18f615be220fb87060cb603565fc
Author: Martin Schwenke <[email protected]>
Date: Mon May 12 12:00:28 2025 +1000
ctdb-daemon: Add configuration option shutdown failover timeout
Allows the timeout for failover during shutdown to be modified.
Defaults to 10s.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit dd9b73119afd3a0c60c87c938b5aefc766ca78d2)
commit e7e4b44f3726f7ee0a81cc6ccc655890259906d3
Author: Martin Schwenke <[email protected]>
Date: Mon May 12 11:33:19 2025 +1000
ctdb-daemon: Add failover on shutdown
Without this, NFS servers on other nodes will not go into grace before
this node releases locks. This should also support improved behaviour
for SMB durable file handles.
The timeout is currently a constant 10s. However, it will
subsequently be switched to an option.
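ctdbd is event driven, so "wait for failover with a timeout" means pumping the event loop until either the takeover run reply handler or a timer fires. In ctdb_shutdown_takeover() this is done with tevent_loop_once(). The sketch below shows the same control flow using a hypothetical, deterministic toy loop in place of tevent, so the two outcomes can be checked. A reply_at of 0 models the case where no node is able to perform the takeover run.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, deterministic stand-in for a tevent loop (hypothetical): each
 * call to toy_loop_once() advances one tick and fires at most one
 * event that is due on that tick.
 */
struct toy_loop {
	unsigned int now;        /* current tick */
	unsigned int reply_at;   /* tick of the takeover run reply, 0 = never */
	unsigned int timeout_at; /* tick at which the timeout timer fires */
};

struct wait_state {
	bool takeover_done; /* set by the reply handler */
	bool timed_out;     /* set by the timer handler */
};

static void toy_loop_once(struct toy_loop *loop, struct wait_state *state)
{
	loop->now++;
	if (loop->reply_at != 0 && loop->now == loop->reply_at) {
		state->takeover_done = true;
	} else if (loop->now == loop->timeout_at) {
		state->timed_out = true;
	}
}

/*
 * Block until the reply arrives or the timeout fires by running the
 * loop one iteration at a time. Returns true if the takeover run
 * finished first.
 */
bool wait_for_takeover(unsigned int reply_at, unsigned int timeout)
{
	struct toy_loop loop = {
		.now = 0,
		.reply_at = reply_at,
		.timeout_at = timeout,
	};
	struct wait_state state = {
		.takeover_done = false,
		.timed_out = false,
	};

	/* Callers skip the wait entirely when the timeout is <= 0 */
	assert(timeout > 0);

	while (!state.takeover_done && !state.timed_out) {
		toy_loop_once(&loop, &state);
	}

	return state.takeover_done;
}
```

The timeout is what keeps shutdown from hanging when, for example, this is the only node and there is no other node to do the takeover run.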
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit b84fbd7b3fedc998633400981ce0c5dc963d052e)
commit 72b32a4ee764e17e4db5156e37070cfd65f27f34
Author: Martin Schwenke <[email protected]>
Date: Wed May 14 16:55:51 2025 +1000
ctdb-protocol: Add CTDB server SRVID range
Normally, communication from other components to ctdbd is done via
controls. However, there are contexts where receiving SRVID messages
in ctdbd makes sense, such as replies to outgoing SRVID messages.
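The SRVIDs used within the new range are built by ctdb_srvid_id() in the ctdb_daemon.c hunk. The top 8 bits select the server range, bits 32-55 carry the low 24 bits of the PNN, and the bottom 32 bits carry an extra mask. The sketch below reproduces that layout and adds two hypothetical helpers, srvid_in_server_range() and srvid_pnn(), for decoding it. The mask name SRVID_RANGE_MASK is also an assumption. The range constant is written with a ULL suffix to keep the arithmetic unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range value from the protocol.h hunk; ULL keeps it unsigned */
#define CTDB_SRVID_SERVER_RANGE 0x9E00000000000000ULL
/* Hypothetical name: the "top 8 bits" that identify a SRVID range */
#define SRVID_RANGE_MASK 0xFF00000000000000ULL

/* Same layout as ctdb_srvid_id(): range | (24-bit PNN << 32) | extra_mask */
uint64_t server_srvid(uint32_t pnn, uint32_t extra_mask)
{
	uint64_t pnn_mask = (uint64_t)(pnn & 0xFFFFFF) << 32;

	return CTDB_SRVID_SERVER_RANGE | pnn_mask | extra_mask;
}

/* True if the SRVID falls in the range reserved for ctdbd */
bool srvid_in_server_range(uint64_t srvid)
{
	return (srvid & SRVID_RANGE_MASK) == CTDB_SRVID_SERVER_RANGE;
}

/* Recover the (truncated) PNN from bits 32-55 */
uint32_t srvid_pnn(uint64_t srvid)
{
	return (uint32_t)((srvid >> 32) & 0xFFFFFF);
}
```

Because only 24 bits of the PNN are kept, two PNNs that differ only above bit 23 would map to the same SRVID. This is not a practical concern for realistic cluster sizes.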
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit 631d1d38ad10c73aa559561bea6b5ed45c2226c4)
commit 1e773a73529ab14defa1c9862758e1300e38850e
Author: Martin Schwenke <[email protected]>
Date: Wed May 21 22:17:42 2025 +1000
ctdb-daemon: Avoid aborting during early shutdown
An early shutdown can put ctdbd into SHUTDOWN runstate before ctdbd
has completed all early initialisation. Some of the start-time
transitions then attempt to set the runstate to FIRST_RECOVERY or
RUNNING, which would make the runstate go backwards, so ctdbd aborts.
Upcoming changes cause ctdbd shutdown to take longer, so the problem
will become more likely. With those changes, this can be
unreliably (50% of the time?) triggered by:
ctdb/tests/INTEGRATION/simple/cluster.091.version_check.sh
since it does an early shutdown due to a version mismatch.
Avoid this by noticing when the runstate is SHUTDOWN and refusing to
continue with subsequent early initialisation steps, which aren't
needed when shutting down.
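The abort comes from the rule that runstates only move forward. The sketch below models that rule and the guard added to ctdb_wait_for_first_recovery(). It assumes a local enum that follows the order of enum ctdb_runstate. It also returns -1 where ctdbd would abort, so that the outcome can be checked. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to follow the order of enum ctdb_runstate */
enum runstate {
	RUNSTATE_UNKNOWN,
	RUNSTATE_INIT,
	RUNSTATE_SETUP,
	RUNSTATE_FIRST_RECOVERY,
	RUNSTATE_STARTUP,
	RUNSTATE_RUNNING,
	RUNSTATE_SHUTDOWN,
};

/*
 * Stand-in for ctdb_set_runstate(): runstates may only move forward.
 * ctdbd aborts on a backwards transition; this sketch returns -1
 * instead so the outcome can be checked.
 */
int advance_runstate(enum runstate current, enum runstate next)
{
	if (next < current) {
		return -1;
	}
	return (int)next;
}

/*
 * Start-time step that moves to FIRST_RECOVERY. With the guard
 * enabled, it mirrors the check added to
 * ctdb_wait_for_first_recovery(): if an early shutdown has already
 * moved the daemon to SHUTDOWN, skip the step and keep the runstate.
 */
int enter_first_recovery(enum runstate current, bool guarded)
{
	if (guarded && current == RUNSTATE_SHUTDOWN) {
		return (int)current;
	}
	return advance_runstate(current, RUNSTATE_FIRST_RECOVERY);
}
```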
Earlier runstate transitions do not seem likely to cause an abort
during early shutdown. The following:
./tests/local_daemons.sh foo start 0; ./tests/local_daemons.sh foo stop 0
sees ctdbd already into FIRST_RECOVERY before the shutdown is
processed.
The change to ctdb_run_startup() probably isn't strictly necessary.
There will be no abort in this case. ctdb_shutdown_sequence() will
always run the "shutdown" event and then stop the event daemon, so it
doesn't seem possible that services could be left running. However,
we might as well avoid running the "startup" event when shutting down,
even if only to avoid confusing logs.
Ultimately, it seems like some redesign would be needed to avoid this
in a more predictable manner, rather than responding when an early
initialisation step inconveniently completes during shutdown. For
example, hanging a lot of the start-time event handling off a common
talloc context could allow it to be cancelled with a single
TALLOC_FREE(). However, a change like that would involve a lot of
analysis to ensure that the talloc hierarchy is correct and there is
no chance of freed pointers being dereferenced. So, we're probably
better off just keeping this issue in mind during a broader redesign.
This workaround appears to be sufficient.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
(cherry picked from commit c03e6b9d50cac67fe33dc6b120996d1915331be6)
-----------------------------------------------------------------------
Summary of changes:
ctdb/conf/ctdb_config.c | 8 ++
ctdb/conf/ctdb_config.h | 2 +
ctdb/conf/failover_conf.c | 12 ++
ctdb/conf/failover_conf.h | 3 +
ctdb/doc/ctdb.conf.5.xml | 50 +++++++
ctdb/protocol/protocol.h | 7 +
ctdb/server/ctdb_daemon.c | 229 ++++++++++++++++++++++++++++++-
ctdb/server/ctdb_monitor.c | 18 +++
ctdb/server/ctdb_takeover.c | 5 +-
ctdb/tests/UNIT/cunit/config_test_001.sh | 2 +
10 files changed, 331 insertions(+), 5 deletions(-)
Changeset truncated at 500 lines:
diff --git a/ctdb/conf/ctdb_config.c b/ctdb/conf/ctdb_config.c
index e3e8cce8d6b..27623a8972a 100644
--- a/ctdb/conf/ctdb_config.c
+++ b/ctdb/conf/ctdb_config.c
@@ -106,6 +106,14 @@ static void setup_config_pointers(struct conf_context *conf)
FAILOVER_CONF_SECTION,
FAILOVER_CONF_DISABLED,
&ctdb_config.failover_disabled);
+ conf_assign_integer_pointer(conf,
+ FAILOVER_CONF_SECTION,
+ FAILOVER_CONF_SHUTDOWN_EXTRA_TIMEOUT,
+ &ctdb_config.shutdown_extra_timeout);
+ conf_assign_integer_pointer(conf,
+ FAILOVER_CONF_SECTION,
+ FAILOVER_CONF_SHUTDOWN_FAILOVER_TIMEOUT,
+ &ctdb_config.shutdown_failover_timeout);
/*
* Legacy
diff --git a/ctdb/conf/ctdb_config.h b/ctdb/conf/ctdb_config.h
index 7b588c3cd59..656a99e36bc 100644
--- a/ctdb/conf/ctdb_config.h
+++ b/ctdb/conf/ctdb_config.h
@@ -43,6 +43,8 @@ struct ctdb_config {
/* Failover */
bool failover_disabled;
+ int shutdown_extra_timeout;
+ int shutdown_failover_timeout;
/* Legacy */
bool realtime_scheduling;
diff --git a/ctdb/conf/failover_conf.c b/ctdb/conf/failover_conf.c
index 3f9f749fcae..424021b7a22 100644
--- a/ctdb/conf/failover_conf.c
+++ b/ctdb/conf/failover_conf.c
@@ -50,4 +50,16 @@ void failover_conf_init(struct conf_context *conf)
FAILOVER_CONF_DISABLED,
false,
check_static_boolean_change);
+
+ conf_define_integer(conf,
+ FAILOVER_CONF_SECTION,
+ FAILOVER_CONF_SHUTDOWN_EXTRA_TIMEOUT,
+ 0,
+ NULL);
+
+ conf_define_integer(conf,
+ FAILOVER_CONF_SECTION,
+ FAILOVER_CONF_SHUTDOWN_FAILOVER_TIMEOUT,
+ 10,
+ NULL);
}
diff --git a/ctdb/conf/failover_conf.h b/ctdb/conf/failover_conf.h
index d7ac0ac507d..08f5fb8939c 100644
--- a/ctdb/conf/failover_conf.h
+++ b/ctdb/conf/failover_conf.h
@@ -25,6 +25,9 @@
#define FAILOVER_CONF_SECTION "failover"
#define FAILOVER_CONF_DISABLED "disabled"
+#define FAILOVER_CONF_SHUTDOWN_EXTRA_TIMEOUT "shutdown extra timeout"
+#define FAILOVER_CONF_SHUTDOWN_FAILOVER_TIMEOUT "shutdown failover timeout"
+
void failover_conf_init(struct conf_context *conf);
diff --git a/ctdb/doc/ctdb.conf.5.xml b/ctdb/doc/ctdb.conf.5.xml
index b9bf3a6d08b..5b2de2b7a07 100644
--- a/ctdb/doc/ctdb.conf.5.xml
+++ b/ctdb/doc/ctdb.conf.5.xml
@@ -454,6 +454,56 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>shutdown extra timeout = <parameter>TIMEOUT</parameter></term>
+ <listitem>
+ <para>
+ CTDB will wait for TIMEOUT seconds after failover
+ completes during shutdown. This can provide extra time
+ for SMB durable handles to be reclaimed. If set to 0 then
+ no extra timeout occurs.
+ </para>
+ <para>
+ This timeout only occurs if both of the following
+ conditions are true:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ shutdown failover timeout (below) is not 0
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Failover during shutdown completes and does not time out
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ Default: <literal>0</literal>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>shutdown failover timeout = <parameter>TIMEOUT</parameter></term>
+ <listitem>
+ <para>
+ CTDB will wait for TIMEOUT seconds for failover to
+ complete during shutdown. This allows NFS servers on
+ other nodes to go into grace during graceful shutdown of a
+ node. Failover during shutdown also helps with SMB
+ durable handle reclaim.
+ </para>
+ <para>
+ Set this to 0 to disable explicit failover on shutdown.
+ </para>
+ <para>
+ Default: <literal>10</literal>
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</refsect1>
diff --git a/ctdb/protocol/protocol.h b/ctdb/protocol/protocol.h
index c775c4bcc64..ecec0a45891 100644
--- a/ctdb/protocol/protocol.h
+++ b/ctdb/protocol/protocol.h
@@ -234,6 +234,13 @@ struct ctdb_call {
#define CTDB_SRVID_TEST_RANGE 0xAE00000000000000LL
+/* Range of ports reserved for CTDB server (top 8 bits)
+ * All ports matching the 8 top bits are reserved for exclusive use by
+ * the CTDB server
+ */
+#define CTDB_SRVID_SERVER_RANGE 0x9E00000000000000LL
+
+
enum ctdb_controls {CTDB_CONTROL_PROCESS_EXISTS = 0,
CTDB_CONTROL_STATISTICS = 1,
/* #2 removed */
diff --git a/ctdb/server/ctdb_daemon.c b/ctdb/server/ctdb_daemon.c
index 97dfc80ffd1..25e742961bf 100644
--- a/ctdb/server/ctdb_daemon.c
+++ b/ctdb/server/ctdb_daemon.c
@@ -23,6 +23,7 @@
#include "system/wait.h"
#include "system/time.h"
+#include <errno.h>
#include <talloc.h>
/* Allow use of deprecated function tevent_loop_allow_nesting() */
#define TEVENT_DEPRECATED
@@ -41,6 +42,7 @@
#include "ctdb_client.h"
#include "protocol/protocol.h"
+#include "protocol/protocol_basic.h"
#include "protocol/protocol_api.h"
#include "common/rb_tree.h"
@@ -50,7 +52,9 @@
#include "common/logging.h"
#include "common/pidfile.h"
#include "common/sock_io.h"
+#include "common/srvid.h"
+#include "conf/ctdb_config.h"
#include "conf/node.h"
struct ctdb_client_pid_list {
@@ -2219,15 +2223,234 @@ done:
return ret;
}
+/*
+ * Construct a SRVID for accepting replies to this ctdbd. The bottom
+ * 24 bits of the PNN are used in the top half. extra_mask is used in
+ * the bottom half.
+ */
+
+static uint64_t ctdb_srvid_id(struct ctdb_context *ctdb, uint32_t extra_mask)
+{
+ uint64_t pnn_mask = (uint64_t)(ctdb->pnn & 0xFFFFFF) << 32;
+
+ return CTDB_SRVID_SERVER_RANGE | pnn_mask | extra_mask;
+}
+
+/*
+ * Do a takeover run on shutdown
+ *
+ * This allows for a graceful transition of resources to another node.
+ * This ensures all nodes go into grace for NFS and, with an extra
+ * timeout, allows data transfer for SMB durable handles.
+ *
+ * Nodes need to be in CTDB_RUNSTATE_RUNNING to host public IP
+ * addresses. So, this node will release all IPs. The good news is
+ * that a node can remain leader when in CTDB_RUNSTATE_SHUTDOWN, so
+ * shutting down the cluster will not be adversely delayed by this.
+ * The only issue to guard against is delaying shutdown of this node
+ * if it is the only node and doesn't have CTDB_CAP_RECMASTER, in
+ * which case there is no node to do the takeover run. Hence, the
+ * timeout.
+ */
+
+struct shutdown_takeover_state {
+ bool takeover_done;
+ bool timed_out;
+ struct tevent_timer *te;
+ unsigned int leader_broadcast_count;
+};
+
+static void shutdown_takeover_handler(uint64_t srvid,
+ TDB_DATA data,
+ void *private_data)
+{
+ struct shutdown_takeover_state *state = private_data;
+ int32_t result = 0;
+ size_t count = 0;
+ int ret = 0;
+
+ ret = ctdb_int32_pull(data.dptr, data.dsize, &result, &count);
+ if (ret == EMSGSIZE) {
+ /*
+ * Can't happen unless there's bug somewhere else, so
+ * just ignore - ctdb_shutdown_takeover() will
+ * probably time out...
+ */
+ DBG_WARNING("Wrong size for result\n");
+ return;
+ }
+
+ if (result == -1) {
+ /*
+ * No early return - can't afford endless retries
+ * during shutdown...
+ */
+ DBG_WARNING("Takeover run failed\n");
+ } else {
+ DBG_NOTICE("Takeover run successful by node=%"PRIi32"\n",
+ result);
+ }
+
+ state->takeover_done = true;
+}
+
+static void shutdown_timeout_handler(struct tevent_context *ev,
+ struct tevent_timer *te,
+ struct timeval yt,
+ void *private_data)
+{
+ struct shutdown_takeover_state *state = private_data;
+
+ TALLOC_FREE(state->te);
+ state->timed_out = true;
+}
+
+static void shutdown_leader_handler(uint64_t srvid,
+ TDB_DATA data,
+ void *private_data)
+{
+ struct shutdown_takeover_state *state = private_data;
+ uint32_t pnn = 0;
+ size_t count = 0;
+ int ret = 0;
+
+ ret = ctdb_uint32_pull(data.dptr, data.dsize, &pnn, &count);
+ if (ret == EMSGSIZE) {
+ /*
+ * Can't happen unless there's a bug somewhere else, so
+ * just ignore
+ */
+ DBG_WARNING("Wrong size for result\n");
+ return;
+ }
+
+ DBG_DEBUG("Leader broadcast received from node=%"PRIu32"\n", pnn);
+ state->leader_broadcast_count++;
+}
+
+static void ctdb_shutdown_takeover(struct ctdb_context *ctdb)
+{
+ struct shutdown_takeover_state state = {
+ .takeover_done = false,
+ .timed_out = false,
+ .te = NULL,
+ .leader_broadcast_count = 0,
+ };
+ /*
+ * This one is memcpy()ed onto the wire, so initialise below
+ * after ZERO_STRUCT(), to keep things valgrind clean
+ */
+ struct ctdb_srvid_message rd;
+ struct TDB_DATA rddata = {
+ .dptr = (uint8_t *)&rd,
+ .dsize = sizeof(rd),
+ };
+ int ret = 0;
+
+ if (ctdb_config.shutdown_failover_timeout <= 0) {
+ return;
+ }
+
+ ZERO_STRUCT(rd);
+ rd = (struct ctdb_srvid_message) {
+ .pnn = ctdb->pnn,
+ .srvid = ctdb_srvid_id(ctdb, 0),
+ };
+
+ ret = srvid_register(ctdb->srv,
+ ctdb->srv,
+ rd.srvid,
+ shutdown_takeover_handler,
+ &state);
+ if (ret != 0) {
+ DBG_WARNING("Failed to register takeover run handler\n");
+ return;
+ }
+
+ state.te = tevent_add_timer(
+ ctdb->ev,
+ ctdb->srv,
+ timeval_current_ofs(ctdb_config.shutdown_failover_timeout, 0),
+ shutdown_timeout_handler,
+ &state);
+ if (state.te == NULL) {
+ DBG_WARNING("Failed to set shutdown timeout\n");
+ goto done;
+ }
+
+ ret = srvid_register(ctdb->srv,
+ ctdb->srv,
+ CTDB_SRVID_LEADER,
+ shutdown_leader_handler,
+ &state);
+ if (ret != 0) {
+ /* Leader broadcasts provide extra information, so no
+ * problem if they can't be monitored...
+ */
+ DBG_WARNING("Failed to register leader handler\n");
+ }
+
+ ret = ctdb_daemon_send_message(ctdb,
+ CTDB_BROADCAST_CONNECTED,
+ CTDB_SRVID_TAKEOVER_RUN,
+ rddata);
+ if (ret != 0) {
+ DBG_WARNING("Failed to send IP takeover run request\n");
+ goto done;
+ }
+
+ while (!state.takeover_done && !state.timed_out) {
+ tevent_loop_once(ctdb->ev);
+ }
+
+ if (state.takeover_done) {
+ goto done;
+ }
+
+ if (state.timed_out) {
+ DBG_WARNING("Timed out waiting for takeover run "
+ "(%u leader broadcasts received)\n",
+ state.leader_broadcast_count);
+ }
+done:
+ srvid_deregister(ctdb->srv, rd.srvid, &state);
+ srvid_deregister(ctdb->srv, CTDB_SRVID_LEADER, &state);
+ TALLOC_FREE(state.te);
+
+ if (!state.takeover_done || ctdb_config.shutdown_extra_timeout <= 0) {
+ return;
+ }
+
+ state.timed_out = false;
+ state.te = tevent_add_timer(
+ ctdb->ev,
+ ctdb->srv,
+ timeval_current_ofs(ctdb_config.shutdown_extra_timeout, 0),
+ shutdown_timeout_handler,
+ &state);
+ if (state.te == NULL) {
+ DBG_WARNING("Failed to set extra timeout\n");
+ return;
+ }
+
+ DBG_NOTICE("Waiting %ds for shutdown extra timeout\n",
+ ctdb_config.shutdown_extra_timeout);
+ while (!state.timed_out) {
+ tevent_loop_once(ctdb->ev);
+ }
+ DBG_INFO("shutdown extra timeout complete\n");
+}
+
void ctdb_shutdown_sequence(struct ctdb_context *ctdb, int exit_code)
{
if (ctdb->runstate == CTDB_RUNSTATE_SHUTDOWN) {
- DEBUG(DEBUG_NOTICE,("Already shutting down so will not proceed.\n"));
+ D_NOTICE("Already shutting down so will not proceed.\n");
return;
}
- DEBUG(DEBUG_ERR,("Shutdown sequence commencing.\n"));
+ D_ERR("Shutdown sequence commencing.\n");
ctdb_set_runstate(ctdb, CTDB_RUNSTATE_SHUTDOWN);
+ ctdb_shutdown_takeover(ctdb);
ctdb_stop_recoverd(ctdb);
ctdb_stop_keepalive(ctdb);
ctdb_stop_monitoring(ctdb);
@@ -2237,7 +2460,7 @@ void ctdb_shutdown_sequence(struct ctdb_context *ctdb, int exit_code)
ctdb->methods->shutdown(ctdb);
}
- DEBUG(DEBUG_ERR,("Shutdown sequence complete, exiting.\n"));
+ D_ERR("Shutdown sequence complete, exiting.\n");
exit(exit_code);
}
diff --git a/ctdb/server/ctdb_monitor.c b/ctdb/server/ctdb_monitor.c
index ab58ec485fe..869a589e6e5 100644
--- a/ctdb/server/ctdb_monitor.c
+++ b/ctdb/server/ctdb_monitor.c
@@ -217,6 +217,11 @@ static void ctdb_run_startup(struct tevent_context *ev,
*/
static void ctdb_startup_callback(struct ctdb_context *ctdb, int status, void *p)
{
+ if (ctdb->runstate == CTDB_RUNSTATE_SHUTDOWN) {
+ DBG_WARNING("Detected early shutdown, not starting monitoring\n");
+ return;
+ }
+
if (status != 0) {
DEBUG(DEBUG_ERR,("startup event failed\n"));
tevent_add_timer(ctdb->ev, ctdb->monitor->monitor_context,
@@ -249,6 +254,12 @@ static void ctdb_run_startup(struct tevent_context *ev,
struct ctdb_context);
int ret;
+ if (ctdb->runstate == CTDB_RUNSTATE_SHUTDOWN) {
+ DBG_WARNING(
+ "Detected early shutdown, not running startup event\n");
+ return;
+ }
+
/* This is necessary to avoid the "startup" event colliding
* with the "ipreallocated" event from the takeover run
* following the first recovery. We might as well serialise
@@ -432,6 +443,13 @@ void ctdb_stop_monitoring(struct ctdb_context *ctdb)
*/
void ctdb_wait_for_first_recovery(struct ctdb_context *ctdb)
{
+ if (ctdb->runstate == CTDB_RUNSTATE_SHUTDOWN) {
+ DBG_WARNING(
+ "Detected early shutdown, "
+ "not waiting for first recovery\n");
+ return;
+ }
+
ctdb_set_runstate(ctdb, CTDB_RUNSTATE_FIRST_RECOVERY);
ctdb->monitor = talloc(ctdb, struct ctdb_monitor_state);
diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index ad543452e62..b9196e3ff63 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -2510,8 +2510,9 @@ int32_t ctdb_control_start_ipreallocate(struct ctdb_context *ctdb,
struct start_ipreallocate_callback_state *state;
/* Nodes that are not RUNNING can not host IPs */
- if (ctdb->runstate != CTDB_RUNSTATE_RUNNING) {
- DBG_INFO("Skipping \"startipreallocate\" event, not RUNNING\n");
+ if (ctdb->runstate < CTDB_RUNSTATE_RUNNING) {
+ DBG_INFO("Skipping \"startipreallocate\" event, "
+ "not RUNNING/SHUTDOWN\n");
return 0;
}
diff --git a/ctdb/tests/UNIT/cunit/config_test_001.sh b/ctdb/tests/UNIT/cunit/config_test_001.sh
index 70bf77f7939..b4d784c65ae 100755
--- a/ctdb/tests/UNIT/cunit/config_test_001.sh
+++ b/ctdb/tests/UNIT/cunit/config_test_001.sh
@@ -48,6 +48,8 @@ ok <<EOF
# debug script =
[failover]
# disabled = false
+ # shutdown extra timeout = 0
+ # shutdown failover timeout = 10
[legacy]
# realtime scheduling = true
# lmaster capability = true
--
Samba Shared Repository