The branch, 2.5 has been updated via ccddcec6258f511511abe6861fe8347360d8a066 (commit) via 034cecad7d96d521134c7d795d033d1dc418e351 (commit) via cd801651028f495eae6ac9a3135180dcfdd0ad88 (commit) via 24065effa286c2a92fbf03cea8d03c505cc0ca6f (commit) via 8a431c0513c5e6c22d76a859dc69ffb7be52aee0 (commit) via cf36ae2cf51152c75ac191fb0597827219d08133 (commit) via d31b80e2dd2c9c0ad319f4c3bdba25a66777b287 (commit) via 96271120c02131f97ce9c3033606b8e45962b1c1 (commit) via f97221d8d3c2c3e18a84494f8b7b132e1af3507c (commit) via 4581d2d7fa37669db3e3ce72ae1152db8ee33264 (commit) via 9924218dc6a7f1cb184db9953d839ab5d9124ca2 (commit) via e4a4f2656b86ba271451375d229eeaac6a598f93 (commit) via 22459b3c9d4d56b5a43a9bb1af724b4c52b7487e (commit) via 8f81f4d1b6ac2e7fa3e47937fe7af24cab29bd09 (commit) via 6b28c24d30b85dcb860f7d101249020ae4994419 (commit) via 9f98a42f01cdaa262e186eac093f7aa06766c88c (commit) via f87d186a40d50dd9fe97aadf81414c9327ad8cf4 (commit) from 2c5e86fab6b1d178a717a5f023e3ba346342f94e (commit)
https://git.samba.org/?p=ctdb.git;a=shortlog;h=2.5 - Log ----------------------------------------------------------------- commit ccddcec6258f511511abe6861fe8347360d8a066 Author: Amitay Isaacs <ami...@gmail.com> Date: Mon Feb 23 12:38:11 2015 +1100 io: Do not use sys_write to write to client sockets When sending messages to clients, ctdb checks for EAGAIN error code and schedules next write in the subsequent event loop. Using sys_write in these places causes ctdb to loop hard till a client is able to read from the socket. With real time scheduling, ctdb daemon spins consuming 100% of CPU trying to write to the client sockets. This can be quite harmful when running under VMs or machines with single CPU. This regression was introduced when all read/write calls were replaced to use sys_read/sys_write wrappers (c1558adeaa980fb4bd6177d36250ec8262e9b9fe). The existing code backs off in case of EAGAIN failures and waits for an event loop to process the write again. This should give ctdb clients a chance to get scheduled and to process the ctdb socket. Signed-off-by: Amitay Isaacs <ami...@gmail.com> Reviewed-by: Martin Schwenke <mar...@meltin.net> Autobuild-User(master): Martin Schwenke <mart...@samba.org> Autobuild-Date(master): Tue Feb 24 12:29:30 CET 2015 on sn-devel-104 (Imported from commit 04a061e4d19d5bdbd8179fb0fab8b0875eec243e) commit 034cecad7d96d521134c7d795d033d1dc418e351 Author: Martin Schwenke <mar...@meltin.net> Date: Sat Feb 14 12:53:08 2015 +1100 scripts: Improve messages about invalid tunables during "setup" Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> Autobuild-User(master): Amitay Isaacs <ami...@samba.org> Autobuild-Date(master): Wed Feb 18 08:03:33 CET 2015 on sn-devel-104 (Imported from commit dc32f11b871a7d4e8ea6fd1d01491d89103decf7) commit cd801651028f495eae6ac9a3135180dcfdd0ad88 Author: Martin Schwenke <mar...@meltin.net> Date: Mon Feb 9 10:33:35 2015 +1100 tool: Print a warning when setting an obsolete tunable variable Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit c3706e7fb07bcb35f7d894c4e8e0c12b4a62d0db) commit 24065effa286c2a92fbf03cea8d03c505cc0ca6f Author: Martin Schwenke <mar...@meltin.net> Date: Mon Feb 9 10:32:47 2015 +1100 client: Return a value of 1 when setting obsolete tunable variable Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 54f0c39e5a33871847aa9fe2c070c7f638f54cc4) commit 8a431c0513c5e6c22d76a859dc69ffb7be52aee0 Author: Martin Schwenke <mar...@meltin.net> Date: Sun Feb 15 14:39:51 2015 +1100 tests: New tests for 00.ctdb "setup" event - set tunables from config Unit test infrastructure tweaks to support. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 2c7c35377e5452e37925b970253b70875a8d7470) commit cf36ae2cf51152c75ac191fb0597827219d08133 Author: Martin Schwenke <mar...@meltin.net> Date: Mon Feb 16 14:04:09 2015 +1100 scripts: Fix tunable setup code by making it shell-agnostic All tunables set in configuration are currently set to 0 on system where /bin/sh is dash (and perhaps other non-bash shells). dash puts single quotes around all values in the output of the "set" builtin command, whereas bash only puts them around values when something needs to be quoted. Tunables always have a simple integer value so dash will quote them and bash won't. The setup code currently passes the raw value, including any quotes to "ctdb setvar ...". This command does no error checking on the input, so "'1'" is converted to 0. Change the code so that the value is determined from the shell variable and is independent of the "set" output. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 39686f45056d942de5ebe3263a533a99ca17c79e) commit d31b80e2dd2c9c0ad319f4c3bdba25a66777b287 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Jan 27 12:55:42 2015 +1100 recoverd: Abort when daemon can take recovery lock during recovery Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> Autobuild-User(master): Amitay Isaacs <ami...@samba.org> Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104 (Imported from commit 39d2fd330a60ea590d76213f8cb406a42fa8d680) commit 96271120c02131f97ce9c3033606b8e45962b1c1 Author: Martin Schwenke <mar...@meltin.net> Date: Wed Dec 17 20:33:19 2014 +1100 recoverd: Improve error messages on recovery lock coherence fail When the daemon is able to take the recovery lock during recovery we might as well guess that the cluster filesystem has a lock coherence problem and print a more useful message. This will be more helpful to those trying out cluster filesystems that don't have lock coherence or that are difficult to setup. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 432d6774891eba30a959cd2d8ee8469d189c7872) commit f97221d8d3c2c3e18a84494f8b7b132e1af3507c Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 13:51:27 2014 +1100 recoverd: Don't release and re-take the recovery lock Just continue to hold it, otherwise a broken node might win an election and grab the lock. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 48c91407abd5e34463d3a10cb6fce47ec4a0d5f6) commit 4581d2d7fa37669db3e3ce72ae1152db8ee33264 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 14:50:38 2014 +1100 recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 1d6ed91f5518d462ba368bca03be923428710157) commit 9924218dc6a7f1cb184db9953d839ab5d9124ca2 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 14:45:08 2014 +1100 recoverd: Remove check_recovery_lock() This has not done anything useful since commit b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520. Instead, just check that the lock is held. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit be19a17faf6da97365c425c5b423e9b74f9c9e0c) commit e4a4f2656b86ba271451375d229eeaac6a598f93 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 14:09:40 2014 +1100 recoverd: Improve logging when recovery lock file is changed Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 668ed5366237b61f0ff618f32555ce29cca5e6f3) commit 22459b3c9d4d56b5a43a9bb1af724b4c52b7487e Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 14:07:20 2014 +1100 recoverd: New function ctdb_recovery_unlock() Unlock the recovery lock file. This way knowledge of the file descriptor isn't sprinkled throughout the code. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit db32a2bce54b9618fe247b33d6de81bd5f7a3b62) commit 8f81f4d1b6ac2e7fa3e47937fe7af24cab29bd09 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 13:50:22 2014 +1100 recoverd: New function ctdb_recovery_have_lock() True if this recovery daemon holds the lock. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 72701be663ddb265320a022a22130a3437bbf6bc) commit 6b28c24d30b85dcb860f7d101249020ae4994419 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 13:49:06 2014 +1100 daemon: Log a warning when setting obsolete tunables Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit 4d3b52f1cec46f66f8d0827bc8f458cd8c86b5a5) commit 9f98a42f01cdaa262e186eac093f7aa06766c88c Author: Martin Schwenke <mar...@meltin.net> Date: Tue Dec 9 13:47:42 2014 +1100 daemon: Mark tunable VerifyRecoveryLock as obsolete It is pointless having a recovery lock but not sanity checking that it is working. Also, the logic that uses this tunable is confusing. In some places the recovery lock is released unnecessarily because the tunable isn't set. Simplify the logic by assuming that if a recovery lock is specified then it should be verified. Update documentation that references this tunable. Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit d110fe231849d76ecb83378c934627dc64b74c72) commit f87d186a40d50dd9fe97aadf81414c9327ad8cf4 Author: Martin Schwenke <mar...@meltin.net> Date: Tue Feb 3 14:27:11 2015 +1100 doc: Improve documentation of the recovery lock Signed-off-by: Martin Schwenke <mar...@meltin.net> Reviewed-by: Amitay Isaacs <ami...@gmail.com> (Imported from commit a01744c08ff5b8aca4af99842acfc78a87af9297) ----------------------------------------------------------------------- Summary of changes: client/ctdb_client.c | 4 +- common/ctdb_io.c | 6 +- config/events.d/00.ctdb | 20 +- doc/ctdb-tunables.7.xml | 9 - doc/ctdb.1.xml | 34 ++- doc/ctdb.7.xml | 54 ++++ doc/ctdbd.1.xml | 16 +- doc/ctdbd.conf.5.xml | 6 + include/ctdb_private.h | 4 +- server/ctdb_recover.c | 90 +++---- server/ctdb_recoverd.c | 272 ++++----------------- server/ctdb_server.c | 1 - server/ctdb_tunables.c | 20 +- ...terface.startup.002.sh => 00.ctdb.setup.001.sh} | 2 +- tests/eventscripts/00.ctdb.setup.002.sh | 19 ++ tests/eventscripts/00.ctdb.setup.003.sh | 21 ++ tests/eventscripts/00.ctdb.setup.004.sh | 20 ++ tests/eventscripts/etc/sysconfig/ctdb | 4 + tests/eventscripts/scripts/local.sh | 11 + tests/eventscripts/stubs/ctdb | 26 ++ tools/ctdb.c | 5 + 21 files changed, 327 insertions(+), 317 deletions(-) copy tests/eventscripts/{10.interface.startup.002.sh => 00.ctdb.setup.001.sh} (64%) create mode 100755 tests/eventscripts/00.ctdb.setup.002.sh create mode 100755 tests/eventscripts/00.ctdb.setup.003.sh create mode 100755 tests/eventscripts/00.ctdb.setup.004.sh Changeset truncated at 500 lines: diff --git a/client/ctdb_client.c b/client/ctdb_client.c index 4553113..cf6835a 100644 --- a/client/ctdb_client.c +++ b/client/ctdb_client.c @@ -2735,12 +2735,12 @@ int ctdb_ctrl_set_tunable(struct ctdb_context *ctdb, ret = ctdb_control(ctdb, destnode, 0, CTDB_CONTROL_SET_TUNABLE, 0, data, NULL, NULL, &res, &timeout, NULL); talloc_free(data.dptr); - if (ret != 0 || res != 0) { + if ((ret != 0) || (res == -1)) { DEBUG(DEBUG_ERR,(__location__ " ctdb_control for set_tunable failed\n")); return -1; } - return 0; + return res; } /* diff --git a/common/ctdb_io.c b/common/ctdb_io.c index 467ec9a..53486f4 100644 --- a/common/ctdb_io.c +++ b/common/ctdb_io.c @@ -232,9 +232,9 @@ static void queue_io_write(struct ctdb_queue *queue) struct ctdb_queue_pkt *pkt = queue->out_queue; ssize_t n; if (queue->ctdb->flags & CTDB_FLAG_TORTURE) { - n = sys_write(queue->fd, pkt->data, 1); + n = write(queue->fd, pkt->data, 1); } else { - n = sys_write(queue->fd, pkt->data, pkt->length); + n = write(queue->fd, pkt->data, pkt->length); } if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) { @@ -310,7 +310,7 @@ int ctdb_queue_send(struct ctdb_queue *queue, uint8_t *data, uint32_t length) queue overhead. This relies on non-blocking sockets */ if (queue->out_queue == NULL && queue->fd != -1 && !(queue->ctdb->flags & CTDB_FLAG_TORTURE)) { - ssize_t n = sys_write(queue->fd, data, length2); + ssize_t n = write(queue->fd, data, length2); if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) { talloc_free(queue->fde); queue->fde = NULL; diff --git a/config/events.d/00.ctdb b/config/events.d/00.ctdb index a0f4102..ec18175 100755 --- a/config/events.d/00.ctdb +++ b/config/events.d/00.ctdb @@ -119,14 +119,19 @@ update_config_from_tdb() { fi } -set_ctdb_variables () { +set_ctdb_variables () +{ # set any tunables from the config file - set | grep ^CTDB_SET_ | cut -d_ -f3- | + set | sed -n '/^CTDB_SET_/s/=.*//p' | while read v; do - varname=`echo $v | cut -d= -f1` - value=`echo $v | cut -d= -f2` - ctdb setvar $varname $value || return 1 - echo "Set $varname to $value" + varname="${v#CTDB_SET_}" + value=$(eval echo "\$$v") + if ctdb setvar $varname $value ; then + echo "Set $varname to $value" + else + echo "Invalid configuration: CTDB_SET_${varname}=${value}" + return 1 + fi done } @@ -198,7 +203,8 @@ case "$1" in setup) # Set any tunables from the config file - set_ctdb_variables || die "Failed to set CTDB tunables" + set_ctdb_variables || \ + die "Aborting setup due to invalid configuration - fix typos, remove unknown tunables" ;; startup) diff --git a/doc/ctdb-tunables.7.xml b/doc/ctdb-tunables.7.xml index 456e856..b029fdb 100644 --- a/doc/ctdb-tunables.7.xml +++ b/doc/ctdb-tunables.7.xml @@ -448,15 +448,6 @@ </refsect2> <refsect2> - <title>VerifyRecoveryLock</title> - <para>Default: 1</para> - <para> - Should we take a fcntl() lock on the reclock file to verify that we are the - sole recovery master node on the cluster or not. - </para> - </refsect2> - - <refsect2> <title>VacuumInterval</title> <para>Default: 10</para> <para> diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml index f680563..ccd46d6 100644 --- a/doc/ctdb.1.xml +++ b/doc/ctdb.1.xml @@ -682,7 +682,6 @@ RecdFailCount = 10 LogLatencyMs = 0 RecLockLatencyMs = 1000 RecoveryDropAllIPs = 120 -VerifyRecoveryLock = 1 VacuumInterval = 10 VacuumMaxRunTime = 30 RepackLimit = 10000 @@ -883,30 +882,51 @@ DB Statistics: locking.tdb <refsect2> <title>getreclock</title> <para> - This command is used to show the filename of the reclock file that is used. + Show the name of the recovery lock file, if any. </para> <para> Example output: </para> <screen> - Reclock file:/gpfs/.ctdb/shared + Reclock file:/clusterfs/.ctdb/recovery.lock </screen> </refsect2> <refsect2> - <title>setreclock [filename]</title> + <title> + setreclock <optional><parameter>FILE</parameter></optional> + </title> + <para> - This command is used to modify, or clear, the file that is used as the reclock file at runtime. When this command is used, the reclock file checks are disabled. To re-enable the checks the administrator needs to activate the "VerifyRecoveryLock" tunable using "ctdb setvar". + FILE specifies the name of the recovery lock file. If the + recovery lock file is changed at run-time then this will cause + a recovery, which in turn causes the recovery lock to be + retaken. </para> <para> - If run with no parameter this will remove the reclock file completely. If run with a parameter the parameter specifies the new filename to use for the recovery lock. + If no FILE is specified then a recovery lock file will no + longer be used. </para> <para> - This command only affects the runtime settings of a ctdb node and will be lost when ctdb is restarted. For persistent changes to the reclock file setting you must edit /etc/sysconfig/ctdb. + This command only affects the run-time setting of a single + CTDB node. This setting <emphasis>must</emphasis> be changed + on all nodes simultaneously by specifying <option>-n + all</option> (or similar). For information about configuring + the recovery lock file please see the + <citetitle>CTDB_RECOVERY_LOCK</citetitle> entry in + <citerefentry><refentrytitle>ctdbd.conf</refentrytitle> + <manvolnum>5</manvolnum></citerefentry> and the + <citetitle>--reclock</citetitle> entry in + <citerefentry><refentrytitle>ctdbd</refentrytitle> + <manvolnum>1</manvolnum></citerefentry>. For information + about the recovery lock please see the <citetitle>RECOVERY + LOCK</citetitle> section in + <citerefentry><refentrytitle>ctdb</refentrytitle> + <manvolnum>7</manvolnum></citerefentry>. </para> </refsect2> diff --git a/doc/ctdb.7.xml b/doc/ctdb.7.xml index b54fa42..ad17df7 100644 --- a/doc/ctdb.7.xml +++ b/doc/ctdb.7.xml @@ -76,6 +76,60 @@ </refsect1> <refsect1> + <title>Recovery Lock</title> + + <para> + CTDB uses a <emphasis>recovery lock</emphasis> to avoid a + <emphasis>split brain</emphasis>, where a cluster becomes + partitioned and each partition attempts to operate + independently. Issues that can result from a split brain + include file data corruption, because file locking metadata may + not be tracked correctly. + </para> + + <para> + CTDB uses a <emphasis>cluster leader and follower</emphasis> + model of cluster management. All nodes in a cluster elect one + node to be the leader. The leader node coordinates privileged + operations such as database recovery and IP address failover. + CTDB refers to the leader node as the <emphasis>recovery + master</emphasis>. This node takes and holds the recovery lock + to assert its privileged role in the cluster. + </para> + + <para> + The recovery lock is implemented using a file residing in shared + storage (usually) on a cluster filesystem. To support a + recovery lock the cluster filesystem must support lock + coherence. See + <citerefentry><refentrytitle>ping_pong</refentrytitle> + <manvolnum>1</manvolnum></citerefentry> for more details. + </para> + + <para> + If a cluster becomes partitioned (for example, due to a + communication failure) and a different recovery master is + elected by the nodes in each partition, then only one of these + recovery masters will be able to take the recovery lock. The + recovery master in the "losing" partition will not be able to + take the recovery lock and will be excluded from the cluster. + The nodes in the "losing" partition will elect each node in turn + as their recovery master so eventually all the nodes in that + partition will be excluded. + </para> + + <para> + CTDB does sanity checks to ensure that the recovery lock is held + as expected. + </para> + + <para> + CTDB can run without a recovery lock but this is not recommended + as there will be no protection from split brains. + </para> + </refsect1> + + <refsect1> <title>Private vs Public addresses</title> <para> diff --git a/doc/ctdbd.1.xml b/doc/ctdbd.1.xml index ab222bc..080f506 100644 --- a/doc/ctdbd.1.xml +++ b/doc/ctdbd.1.xml @@ -306,18 +306,18 @@ </varlistentry> <varlistentry> - <term>--reclock=<parameter>FILENAME</parameter></term> + <term>--reclock=<parameter>FILE</parameter></term> <listitem> <para> - FILENAME is the name of the recovery lock file stored in - <emphasis>shared storage</emphasis> that ctdbd uses to - prevent split brains from occuring. + FILE is the name of the recovery lock file, stored in + <emphasis>shared storage</emphasis>, that CTDB uses to + prevent split brains. </para> <para> - It is possible to run CTDB without a recovery lock file, but - then there will be no protection against split brain if the - cluster/network becomes partitioned. Using CTDB without a - reclock file is strongly discouraged. + For information about the recovery lock please see the + <citetitle>RECOVERY LOCK</citetitle> section in + <citerefentry><refentrytitle>ctdb</refentrytitle> + <manvolnum>7</manvolnum></citerefentry>. </para> </listitem> </varlistentry> diff --git a/doc/ctdbd.conf.5.xml b/doc/ctdbd.conf.5.xml index 803c232..a99ae9b 100644 --- a/doc/ctdbd.conf.5.xml +++ b/doc/ctdbd.conf.5.xml @@ -312,6 +312,12 @@ should be change to a useful value. Corresponds to <option>--reclock</option>. </para> + <para> + For information about the recovery lock please see the + <citetitle>RECOVERY LOCK</citetitle> section in + <citerefentry><refentrytitle>ctdb</refentrytitle> + <manvolnum>7</manvolnum></citerefentry>. + </para> </listitem> </varlistentry> diff --git a/include/ctdb_private.h b/include/ctdb_private.h index fe9250b..341bad6 100644 --- a/include/ctdb_private.h +++ b/include/ctdb_private.h @@ -1260,7 +1260,9 @@ void ctdb_release_all_ips(struct ctdb_context *ctdb); void set_nonblocking(int fd); void set_close_on_exec(int fd); -bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep); +bool ctdb_recovery_have_lock(struct ctdb_context *ctdb); +bool ctdb_recovery_lock(struct ctdb_context *ctdb); +void ctdb_recovery_unlock(struct ctdb_context *ctdb); int ctdb_set_recovery_lock_file(struct ctdb_context *ctdb, const char *file); diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c index 393b72f..50bcf25 100644 --- a/server/ctdb_recover.c +++ b/server/ctdb_recover.c @@ -527,18 +527,21 @@ static void set_recmode_handler(struct event_context *ev, struct fd_event *fde, state->te = NULL; - /* read the childs status when trying to lock the reclock file. - child wrote 0 if everything is fine and 1 if it did manage - to lock the file, which would be a problem since that means - we got a request to exit from recovery but we could still lock - the file which at this time SHOULD be locked by the recovery - daemon on the recmaster - */ + /* If, as expected, the child was unable to take the recovery + * lock then it will have written 0 into the pipe, so + * continue. However, any other value (e.g. 1) indicates that + * it was able to take the recovery lock when it should have + * been held by the recovery daemon on the recovery master. + */ ret = sys_read(state->fd[0], &c, 1); if (ret != 1 || c != 0) { - ctdb_request_control_reply(state->ctdb, state->c, NULL, -1, "managed to lock reclock file from inside daemon"); + const char *msg = \ + "Took recovery lock from daemon - probably a cluster filesystem lock coherence problem"; + ctdb_request_control_reply( + state->ctdb, state->c, NULL, -1, + msg); talloc_free(state); - return; + ctdb_die(state->ctdb, msg); } state->ctdb->recovery_mode = state->recmode; @@ -639,8 +642,8 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb, ctdb_process_deferred_attach(ctdb); } - if (ctdb->tunable.verify_recovery_lock == 0) { - /* dont need to verify the reclock file */ + if (ctdb->recovery_lock_file == NULL) { + /* Not using recovery lock file */ ctdb->recovery_mode = recmode; return 0; } @@ -671,11 +674,13 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb, ctdb_set_process_name("ctdb_recmode"); debug_extra = talloc_asprintf(NULL, "set_recmode:"); - /* we should not be able to get the lock on the reclock file, - as it should be held by the recovery master - */ - if (ctdb_recovery_lock(ctdb, false)) { - DEBUG(DEBUG_CRIT,("ERROR: recovery lock file %s not locked when recovering!\n", ctdb->recovery_lock_file)); + /* Daemon should not be able to get the recover lock, + * as it should be held by the recovery master */ + if (ctdb_recovery_lock(ctdb)) { + DEBUG(DEBUG_ERR, + ("ERROR: Daemon able to take recovery lock on \"%s\" during recovery\n", + ctdb->recovery_lock_file)); + ctdb_recovery_unlock(ctdb); cc = 1; } @@ -720,26 +725,25 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb, } +bool ctdb_recovery_have_lock(struct ctdb_context *ctdb) +{ + return ctdb->recovery_lock_fd != -1; +} + /* try and get the recovery lock in shared storage - should only work on the recovery master recovery daemon. Anywhere else is a bug */ -bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep) +bool ctdb_recovery_lock(struct ctdb_context *ctdb) { struct flock lock; - if (keep) { - DEBUG(DEBUG_ERR, ("Take the recovery lock\n")); - } - if (ctdb->recovery_lock_fd != -1) { - close(ctdb->recovery_lock_fd); - ctdb->recovery_lock_fd = -1; - } - - ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file, O_RDWR|O_CREAT, 0600); + ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file, + O_RDWR|O_CREAT, 0600); if (ctdb->recovery_lock_fd == -1) { - DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Unable to open %s - (%s)\n", - ctdb->recovery_lock_file, strerror(errno))); + DEBUG(DEBUG_ERR, + ("ctdb_recovery_lock: Unable to open %s - (%s)\n", + ctdb->recovery_lock_file, strerror(errno))); return false; } @@ -755,27 +759,29 @@ bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep) int saved_errno = errno; close(ctdb->recovery_lock_fd); ctdb->recovery_lock_fd = -1; - if (keep) { - DEBUG(DEBUG_CRIT,("ctdb_recovery_lock: Failed to get " - "recovery lock on '%s' - (%s)\n", - ctdb->recovery_lock_file, - strerror(saved_errno))); + /* Fail silently on these errors, since they indicate + * lock contention, but log an error for any other + * failure. */ + if (saved_errno != EACCES && + saved_errno != EAGAIN) { + DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Failed to get " + "recovery lock on '%s' - (%s)\n", + ctdb->recovery_lock_file, + strerror(saved_errno))); } return false; } - if (!keep) { + return true; +} + +void ctdb_recovery_unlock(struct ctdb_context *ctdb) +{ + if (ctdb->recovery_lock_fd != -1) { + DEBUG(DEBUG_NOTICE, ("Releasing recovery lock\n")); close(ctdb->recovery_lock_fd); ctdb->recovery_lock_fd = -1; } - - if (keep) { - DEBUG(DEBUG_NOTICE, ("Recovery lock taken successfully\n")); - } - - DEBUG(DEBUG_NOTICE,("ctdb_recovery_lock: Got recovery lock on '%s'\n", ctdb->recovery_lock_file)); - - return true; } /* diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c index 39e833c..a4daf09 100644 --- a/server/ctdb_recoverd.c +++ b/server/ctdb_recoverd.c @@ -1808,28 +1808,36 @@ static int do_recovery(struct ctdb_recoverd *rec, return -1; } - if (ctdb->tunable.verify_recovery_lock != 0) { - DEBUG(DEBUG_ERR,("Taking out recovery lock from recovery daemon\n")); - start_time = timeval_current(); - if (!ctdb_recovery_lock(ctdb, true)) { - if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) { - /* If ctdb is trying first recovery, it's - * possible that current node does not know yet - * who the recmaster is. - */ - DEBUG(DEBUG_ERR, ("Unable to get recovery lock" - " - retrying recovery\n")); + if (ctdb->recovery_lock_file != NULL) { + if (ctdb_recovery_have_lock(ctdb)) { + DEBUG(DEBUG_NOTICE, ("Already holding recovery lock\n")); + } else { + start_time = timeval_current(); + DEBUG(DEBUG_NOTICE, ("Attempting to take recovery lock (%s)\n", + ctdb->recovery_lock_file)); + if (!ctdb_recovery_lock(ctdb)) { + if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) { + /* If ctdb is trying first recovery, it's + * possible that current node does not know + * yet who the recmaster is. + */ + DEBUG(DEBUG_ERR, ("Unable to get recovery lock" + " - retrying recovery\n")); + return -1; + } + + DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery " + "and ban ourself for %u seconds\n", + ctdb->tunable.recovery_ban_period)); + ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period); return -1; } - - DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery " - "and ban ourself for %u seconds\n", - ctdb->tunable.recovery_ban_period)); - ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period); - return -1; + ctdb_ctrl_report_recd_lock_latency(ctdb, + CONTROL_TIMEOUT(), -- CTDB repository