The branch, master has been updated
       via  bd4ff176387372b1c233373c0bc8ced523fc9670 (commit)
       via  7d4b8cce96f33fff647a0c9d259c121dfc8403e9 (commit)
       via  c185ffd2822fcee26d07398464c59b66c61f53fa (commit)
       via  9550c497e6d6ef5ee44826c4bd9ed5ad65174263 (commit)
       via  56fcee3c7730cb12fa666072d5400949af6e5f7c (commit)
       via  bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2 (commit)
       via  a555940fb5c914b7581667a05153256ad7d17774 (commit)
       via  be4ad110ede9981b181ac28f31ffd855a879d5df (commit)
       via  7054e4ded59c6b8f254dcfefaef64da05f25aecd (commit)
      from  c4f5a58471b206e2287c7958c7f29c1f1c0626ac (commit)
http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit bd4ff176387372b1c233373c0bc8ced523fc9670
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Oct 10 15:03:06 2012 +1100

    tests/eventscripts: add unit tests for policy routing reconfigure

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 7d4b8cce96f33fff647a0c9d259c121dfc8403e9
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Oct 10 14:48:59 2012 +1100

    tests/eventscripts: add extra infrastructure for policy routing tests

    Less copying and pasting is a good thing...

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit c185ffd2822fcee26d07398464c59b66c61f53fa
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Aug 3 10:54:30 2012 +1000

    Eventscripts: Add support for "reconfigure" pseudo-event for policy routing

    This rebuilds all policy routes and can be used if the configuration
    changes.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Sep 24 14:32:04 2012 +1000

    recoverd: Track failure of "recovered" event, banning culprits

    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 56fcee3c7730cb12fa666072d5400949af6e5f7c
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Aug 31 09:34:17 2012 +1000

    recoverd: When starting a takeover run, disable IP verification

    Disable for TakeoverTimeout seconds.  Otherwise the recovery daemon
    can get overzealous and start trying to add/delete addresses that it
    thinks are missing but where the eventscript just hasn't finished.
    This didn't use to matter so much, but it is more important now that
    concurrent takeip/releaseip/updateip generate errors - we want to
    avoid spamming the log.
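The "reconfigure" pseudo-event works by forcing the route-building helper to rebuild every per-IP routing table rather than only tables that are empty. Below is a minimal, self-contained sketch of that "force" flag; `table_has_routes`, `add_routing_for_ip` and the `TABLE_POPULATED` variable are mock stand-ins for the real `ip route show table ...` check and route-creation helper in config/events.d/13.per_ip_routing, not the actual eventscript code:

```shell
# Mock sketch of the "force" behaviour in add_missing_routes.
# TABLE_POPULATED simulates whether the per-IP policy routing table
# already contains routes.
table_has_routes () { [ "$TABLE_POPULATED" = "yes" ]; }
add_routing_for_ip () { echo "added routes for $1"; }

add_missing_routes () {
    # (Re-)add routes when the table is empty, or unconditionally
    # when called with "force" (the reconfigure path).
    if ! table_has_routes || [ "$1" = "force" ]; then
        add_routing_for_ip "10.0.2.133"
    else
        echo "routes already present"
    fi
}

TABLE_POPULATED="yes"
add_missing_routes          # monitor path: table populated, nothing to do
add_missing_routes "force"  # reconfigure pseudo-event: rebuild anyway
```

The design point is that the normal monitor path stays cheap (skip populated tables), while a configuration change can still force a full rebuild through the same function.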
    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Jul 11 14:46:07 2012 +1000

    ctdbd: Stop takeovers and releases from colliding in mid-air

    There's a race here where release and takeover events for an IP can
    run at the same time.  For example, a "ctdb deleteip" and a takeover
    initiated by the recovery daemon.  The timeline is as follows:

    1. The release code registers a callback to update the VNN.  The
       callback is executed *after* the eventscripts run the releaseip
       event.

    2. The release code calls the eventscripts for the releaseip event,
       removing IP from its interface.  The takeover code "updates" the
       VNN, saying that IP is on some iface... even if/though the
       address is already there.

    3. The release callback runs, removing the iface associated with IP
       in the VNN.  The takeover code calls the eventscripts for the
       takeip event, adding IP to an interface.

    As a result, CTDB doesn't think it should be hosting IP, but IP is
    on an interface.  The recovery daemon fixes this later... but it
    shouldn't happen.

    This patch can cause some additional noise in the logs:

      Release of IP 10.0.2.133/24 on interface eth2  node:2
      recoverd: We are still serving a public address '10.0.2.133'
        that we should not be serving. Removing it.
      Release of IP 10.0.2.133/24 rejected update for this IP already
        in flight
      recoverd: client/ctdb_client.c:2455 ctdb_control for release_ip
        failed
      recoverd: Failed to release local ip address

    In this case the node has started releasing an IP when the recovery
    daemon notices the address is still hosted and initiates another
    release.  This noise is harmless but annoying.
    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit a555940fb5c914b7581667a05153256ad7d17774
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Aug 28 15:17:29 2012 +1000

    ctdbd: New tunable NoIPTakeoverOnDisabled

    Stops the behaviour where unhealthy nodes can host IPs when there
    are no healthy nodes.  Set this to 1 when an immediate complete
    outage is preferred when all nodes are unhealthy.  The alternative
    (i.e. the default) can lead to undefined behaviour when the shared
    filesystem is unavailable.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit be4ad110ede9981b181ac28f31ffd855a879d5df
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Aug 21 15:52:03 2012 +1000

    Eventscripts: Add service-start and service-stop pseudo-events

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Aug 15 15:28:14 2012 +1000

    ctdbd: Avoid unnecessary updateip event

    The existing code makes one fatally bad assumption:
    vnn->iface->references can never be -1 (or max-uint32_t in this
    case).  Right now the reference counting is broken, so a reference
    count of -1 is possible and causes a spurious updateip event when
    vnn->iface is the same as best_iface.  This can occur frequently
    because we get a lot of redundant takeovers, especially when each
    IP can only be hosted on one interface.

    This makes the code much more defensive by noting that when
    best_iface is the same as vnn->iface there is never a need for an
    updateip event.  This effectively neuters the updateip code path
    when IPs can only be hosted by a single interface.

    This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7.
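The NoIPTakeoverOnDisabled decision can be summarised as: healthy nodes may always take over IPs; an unhealthy node may take over an IP only as a last resort (no healthy node anywhere) and only while the tunable is 0. The following is an illustrative shell sketch of that rule — the real logic lives in C in server/ctdb_takeover.c, and the function and argument names here are invented for the example:

```shell
# Hypothetical sketch of the takeover decision under the new tunable.
# Arguments: node healthy? (yes/no), any node healthy? (yes/no),
# NoIPTakeoverOnDisabled value (0/1).
can_host_ip () {
    _node_healthy="$1"
    _any_healthy="$2"
    _no_takeover_on_disabled="$3"

    if [ "$_node_healthy" = "yes" ]; then
        echo allowed    # healthy nodes always qualify
    elif [ "$_any_healthy" = "no" ] && [ "$_no_takeover_on_disabled" = "0" ]; then
        echo allowed    # default: last resort, host IPs on an unhealthy node
    else
        echo denied     # tunable set to 1: prefer a complete outage
    fi
}

can_host_ip no no 0    # default behaviour when all nodes are unhealthy
can_host_ip no no 1    # with NoIPTakeoverOnDisabled=1
```

At runtime the tunable would be enabled with `ctdb setvar NoIPTakeoverOnDisabled 1`; setting `CTDB_SET_NoIPTakeoverOnDisabled=1` in the sysconfig file is the usual way to make such a setting persistent (assumed here, per CTDB's tunable conventions).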
    Signed-off-by: Martin Schwenke <mar...@meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 config/events.d/13.per_ip_routing           |   19 +++-
 config/functions                            |   30 +++++-
 doc/ctdbd.1                                 |   11 ++-
 doc/ctdbd.1.html                            |  165 ++++++++++++++-------------
 doc/ctdbd.1.xml                             |   12 ++
 include/ctdb_private.h                      |    5 +
 server/ctdb_recoverd.c                      |   77 ++++++++-----
 server/ctdb_takeover.c                      |  113 +++++++++++++++---
 server/ctdb_tunables.c                      |    3 +-
 tests/eventscripts/13.per_ip_routing.001.sh |   13 +-
 tests/eventscripts/13.per_ip_routing.002.sh |    2 +
 tests/eventscripts/13.per_ip_routing.003.sh |   15 +--
 tests/eventscripts/13.per_ip_routing.004.sh |   13 +--
 tests/eventscripts/13.per_ip_routing.005.sh |   33 ++----
 tests/eventscripts/13.per_ip_routing.006.sh |   31 ++----
 tests/eventscripts/13.per_ip_routing.007.sh |   36 +-----
 tests/eventscripts/13.per_ip_routing.008.sh |   34 ++----
 tests/eventscripts/13.per_ip_routing.009.sh |   39 ++-----
 tests/eventscripts/13.per_ip_routing.010.sh |   50 ++-------
 tests/eventscripts/13.per_ip_routing.011.sh |   39 ++-----
 tests/eventscripts/13.per_ip_routing.012.sh |   30 +----
 tests/eventscripts/13.per_ip_routing.013.sh |   26 +----
 tests/eventscripts/13.per_ip_routing.014.sh |   33 ++----
 tests/eventscripts/13.per_ip_routing.015.sh |   21 +---
 tests/eventscripts/13.per_ip_routing.016.sh |   15 +++
 tests/eventscripts/13.per_ip_routing.017.sh |   16 +++
 tests/eventscripts/13.per_ip_routing.018.sh |   22 ++++
 tests/eventscripts/13.per_ip_routing.019.sh |   24 ++++
 tests/eventscripts/scripts/local.sh         |   72 ++++++++++++
 29 files changed, 548 insertions(+), 451 deletions(-)
 create mode 100755 tests/eventscripts/13.per_ip_routing.016.sh
 create mode 100755 tests/eventscripts/13.per_ip_routing.017.sh
 create mode 100755 tests/eventscripts/13.per_ip_routing.018.sh
 create mode 100755 tests/eventscripts/13.per_ip_routing.019.sh

Changeset truncated at 500 lines:

diff --git a/config/events.d/13.per_ip_routing b/config/events.d/13.per_ip_routing
index 06b21b9..d51d309 100755
--- a/config/events.d/13.per_ip_routing
+++ b/config/events.d/13.per_ip_routing
@@ -276,7 +276,8 @@ flush_rules_and_routes ()
 
 # Add any missing routes.  Some might have gone missing if, for
 # example, all IPs on the network were removed (possibly if the
-# primary was removed).
+# primary was removed).  If $1 is "force" then (re-)add all the
+# routes.
 add_missing_routes ()
 {
     ctdb ip -v -Y | {
@@ -292,7 +293,8 @@ add_missing_routes ()
 	[ -n "$_iface" ] || continue
 
 	_table_id="${table_id_prefix}${_ip}"
-	if [ -z "$(ip route show table $_table_id 2>/dev/null)" ] ; then
+	if [ -z "$(ip route show table $_table_id 2>/dev/null)" -o \
+	     "$1" = "force" ] ; then
 	    add_routing_for_ip "$_iface" "$_ip"
 	fi
     done
@@ -326,8 +328,21 @@ remove_bogus_routes ()
 
 ######################################################################
 
+service_reconfigure ()
+{
+    add_missing_routes "force"
+    remove_bogus_routes
+
+    # flush our route cache
+    set_proc sys/net/ipv4/route/flush 1
+}
+
+######################################################################
+
 ctdb_check_args "$@"
 
+ctdb_service_check_reconfigure
+
 case "$1" in
     startup)
 	flush_rules_and_routes
diff --git a/config/functions b/config/functions
index e2a9b03..32c6f4a 100755
--- a/config/functions
+++ b/config/functions
@@ -1286,11 +1286,37 @@ is_ctdb_managed_service ()
 
 ctdb_start_stop_service ()
 {
+    _service_name="${1:-${service_name}}"
+
+    # Allow service-start/service-stop pseudo-events to start/stop
+    # services when we're not auto-starting/stopping and we're not
+    # monitoring.
+    case "$event_name" in
+	service-start)
+	    if is_ctdb_managed_service "$_service_name" ; then
+		die 'service-start event not permitted when service is managed'
+	    fi
+	    if [ "$CTDB_SERVICE_AUTOSTARTSTOP" = "yes" ] ; then
+		die 'service-start event not permitted with $CTDB_SERVICE_AUTOSTARTSTOP = yes'
+	    fi
+	    ctdb_service_start "$_service_name"
+	    exit $?
+	    ;;
+	service-stop)
+	    if is_ctdb_managed_service "$_service_name" ; then
+		die 'service-stop event not permitted when service is managed'
+	    fi
+	    if [ "$CTDB_SERVICE_AUTOSTARTSTOP" = "yes" ] ; then
+		die 'service-stop event not permitted with $CTDB_SERVICE_AUTOSTARTSTOP = yes'
+	    fi
+	    ctdb_service_stop "$_service_name"
+	    exit $?
+	    ;;
+    esac
+
     # Do nothing unless configured to...
     [ "$CTDB_SERVICE_AUTOSTARTSTOP" = "yes" ] || return 0
 
-    _service_name="${1:-${service_name}}"
-
     [ "$event_name" = "monitor" ] || return 0
 
     if is_ctdb_managed_service "$_service_name" ; then
diff --git a/doc/ctdbd.1 b/doc/ctdbd.1
index e4ea114..52c3393 100644
--- a/doc/ctdbd.1
+++ b/doc/ctdbd.1
@@ -2,12 +2,12 @@
 .\" Title: ctdbd
 .\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
 .\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
-.\" Date: 07/26/2012
+.\" Date: 10/11/2012
 .\" Manual: CTDB - clustered TDB database
 .\" Source: ctdb
 .\" Language: English
 .\"
-.TH "CTDBD" "1" "07/26/2012" "ctdb" "CTDB \- clustered TDB database"
+.TH "CTDBD" "1" "10/11/2012" "ctdb" "CTDB \- clustered TDB database"
 .\" -----------------------------------------------------------------
 .\" * Define some portability stuff
 .\" -----------------------------------------------------------------
@@ -483,6 +483,11 @@ When you enable this tunable, CTDB will no longer attempt to recover the cluster
 Default: 0
 .PP
 When set to 1, ctdb will allow ip addresses to be failed over onto this node\&. Any ip addresses that the node currently hosts will remain on the node but no new ip addresses can be failed over onto the node\&.
+.SS "NoIPTakeoverOnDisabled"
+.PP
+Default: 0
+.PP
+If no nodes are healthy then by default ctdb will happily host public IPs on disabled (unhealthy or administratively disabled) nodes\&. This can cause problems, for example if the underlying cluster filesystem is not mounted\&.
When set to 1 this behaviour is switched off and disabled nodes will not be able to takeover IPs\&. .SS "DBRecordCountWarn" .PP Default: 100000 @@ -681,7 +686,7 @@ There can be multiple NATGW groups in one cluster but each node can only be memb In each NATGW group, one of the nodes is designated the NAT Gateway through which all traffic that is originated by nodes in this group will be routed through if a public addresses are not available\&. .SS "Configuration" .PP -NAT\-GW is configured in /etc/sysconfigctdb by setting the following variables: +NAT\-GW is configured in /etc/sysconfig/ctdb by setting the following variables: .sp .if n \{\ .RS 4 diff --git a/doc/ctdbd.1.html b/doc/ctdbd.1.html index a2e6bc8..b6a7f54 100644 --- a/doc/ctdbd.1.html +++ b/doc/ctdbd.1.html @@ -1,4 +1,4 @@ -<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>ctdbd</title><meta name="generator" content="DocBook XSL Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry" title="ctdbd"><a name="ctdbd.1"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>ctdbd — The CTDB cluster daemon</p></div><div class="refsynopsisdiv" title="Synopsis"><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">ctdbd</code> </p></div><div class="cmdsynopsis"><p><code class="command">ctdbd</code> [-? 
--help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-add resses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]</p></div></div><div class="refsect1" title="DESCRIPTION"><a name="idp199104"></a><h2>DESCRIPTION</h2><p> +<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>ctdbd</title><meta name="generator" content="DocBook XSL Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry" title="ctdbd"><a name="ctdbd.1"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>ctdbd — The CTDB cluster daemon</p></div><div class="refsynopsisdiv" title="Synopsis"><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">ctdbd</code> </p></div><div class="cmdsynopsis"><p><code class="command">ctdbd</code> [-? 
--help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-add resses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]</p></div></div><div class="refsect1" title="DESCRIPTION"><a name="idp228184"></a><h2>DESCRIPTION</h2><p> ctdbd is the main ctdb daemon. </p><p> ctdbd provides a clustered version of the TDB database with automatic rebuild/recovery of the databases upon nodefailures. @@ -8,7 +8,7 @@ ctdbd provides monitoring of all nodes in the cluster and automatically reconfigures the cluster and recovers upon node failures. </p><p> ctdbd is the main component in clustered Samba that provides a high-availability load-sharing CIFS server cluster. - </p></div><div class="refsect1" title="OPTIONS"><a name="idp201064"></a><h2>OPTIONS</h2><div class="variablelist"><dl><dt><span class="term">-? --help</span></dt><dd><p> + </p></div><div class="refsect1" title="OPTIONS"><a name="idp230192"></a><h2>OPTIONS</h2><div class="variablelist"><dl><dt><span class="term">-? --help</span></dt><dd><p> Print some help text to the screen. </p></dd><dt><span class="term">-d --debug=<DEBUGLEVEL></span></dt><dd><p> This option sets the debuglevel on the ctdbd daemon which controls what will be written to the logfile. The default is 0 which will only log important events and errors. A larger number will provide additional logging. @@ -154,10 +154,10 @@ implemented in the future. </p></dd><dt><span class="term">--usage</span></dt><dd><p> Print useage information to the screen. 
- </p></dd></dl></div></div><div class="refsect1" title="Private vs Public addresses"><a name="idp90512"></a><h2>Private vs Public addresses</h2><p> + </p></dd></dl></div></div><div class="refsect1" title="Private vs Public addresses"><a name="idp120024"></a><h2>Private vs Public addresses</h2><p> When used for ip takeover in a HA environment, each node in a ctdb cluster has multiple ip addresses assigned to it. One private and one or more public. - </p><div class="refsect2" title="Private address"><a name="idp91136"></a><h3>Private address</h3><p> + </p><div class="refsect2" title="Private address"><a name="idp120648"></a><h3>Private address</h3><p> This is the physical ip address of the node which is configured in linux and attached to a physical interface. This address uniquely identifies a physical node in the cluster and is the ip addresses @@ -187,7 +187,7 @@ 10.1.1.2 10.1.1.3 10.1.1.4 - </pre></div><div class="refsect2" title="Public address"><a name="idp94040"></a><h3>Public address</h3><p> + </pre></div><div class="refsect2" title="Public address"><a name="idp123552"></a><h3>Public address</h3><p> A public address on the other hand is not attached to an interface. This address is managed by ctdbd itself and is attached/detached to a physical node at runtime. @@ -248,7 +248,7 @@ unavailable. 10.1.1.1 can not be failed over to node 2 or node 3 since these nodes do not have this ip address listed in their public addresses file. - </p></div></div><div class="refsect1" title="Node status"><a name="idp98936"></a><h2>Node status</h2><p> + </p></div></div><div class="refsect1" title="Node status"><a name="idp128472"></a><h2>Node status</h2><p> The current status of each node in the cluster can be viewed by the 'ctdb status' command. </p><p> @@ -285,9 +285,9 @@ RECMASTER or NATGW. This node does not perticipate in the CTDB cluster but can still be communicated with. I.e. ctdb commands can be sent to it. 
- </p></div><div class="refsect1" title="PUBLIC TUNABLES"><a name="idp102960"></a><h2>PUBLIC TUNABLES</h2><p> + </p></div><div class="refsect1" title="PUBLIC TUNABLES"><a name="idp132496"></a><h2>PUBLIC TUNABLES</h2><p> These are the public tuneables that can be used to control how ctdb behaves. - </p><div class="refsect2" title="MaxRedirectCount"><a name="idp103592"></a><h3>MaxRedirectCount</h3><p>Default: 3</p><p> + </p><div class="refsect2" title="MaxRedirectCount"><a name="idp133128"></a><h3>MaxRedirectCount</h3><p>Default: 3</p><p> If we are not the DMASTER and need to fetch a record across the network we first send the request to the LMASTER after which the record is passed onto the current DMASTER. If the DMASTER changes before @@ -301,7 +301,7 @@ </p><p> When chasing a record, this is how many hops we will chase the record for before going back to the LMASTER to ask for new guidance. - </p></div><div class="refsect2" title="SeqnumInterval"><a name="idp105312"></a><h3>SeqnumInterval</h3><p>Default: 1000</p><p> + </p></div><div class="refsect2" title="SeqnumInterval"><a name="idp134848"></a><h3>SeqnumInterval</h3><p>Default: 1000</p><p> Some databases have seqnum tracking enabled, so that samba will be able to detect asynchronously when there has been updates to the database. Everytime a database is updated its sequence number is increased. @@ -309,17 +309,17 @@ This tunable is used to specify in 'ms' how frequently ctdb will send out updates to remote nodes to inform them that the sequence number is increased. - </p></div><div class="refsect2" title="ControlTimeout"><a name="idp106664"></a><h3>ControlTimeout</h3><p>Default: 60</p><p> + </p></div><div class="refsect2" title="ControlTimeout"><a name="idp136200"></a><h3>ControlTimeout</h3><p>Default: 60</p><p> This is the default setting for timeout for when sending a control message to either the local or a remote ctdb daemon. 
- </p></div><div class="refsect2" title="TraverseTimeout"><a name="idp107552"></a><h3>TraverseTimeout</h3><p>Default: 20</p><p> + </p></div><div class="refsect2" title="TraverseTimeout"><a name="idp137088"></a><h3>TraverseTimeout</h3><p>Default: 20</p><p> This setting controls how long we allow a traverse process to run. After this timeout triggers, the main ctdb daemon will abort the traverse if it has not yet finished. - </p></div><div class="refsect2" title="KeepaliveInterval"><a name="idp108488"></a><h3>KeepaliveInterval</h3><p>Default: 5</p><p> + </p></div><div class="refsect2" title="KeepaliveInterval"><a name="idp138024"></a><h3>KeepaliveInterval</h3><p>Default: 5</p><p> How often in seconds should the nodes send keepalives to eachother. - </p></div><div class="refsect2" title="KeepaliveLimit"><a name="idp109320"></a><h3>KeepaliveLimit</h3><p>Default: 5</p><p> + </p></div><div class="refsect2" title="KeepaliveLimit"><a name="idp138856"></a><h3>KeepaliveLimit</h3><p>Default: 5</p><p> After how many keepalive intervals without any traffic should a node wait until marking the peer as DISCONNECTED. </p><p> @@ -328,60 +328,60 @@ require a recovery. This limitshould not be set too high since we want a hung node to be detectec, and expunged from the cluster well before common CIFS timeouts (45-90 seconds) kick in. - </p></div><div class="refsect2" title="RecoverTimeout"><a name="idp110760"></a><h3>RecoverTimeout</h3><p>Default: 20</p><p> + </p></div><div class="refsect2" title="RecoverTimeout"><a name="idp140296"></a><h3>RecoverTimeout</h3><p>Default: 20</p><p> This is the default setting for timeouts for controls when sent from the recovery daemon. We allow longer control timeouts from the recovery daemon than from normal use since the recovery dameon often use controls that can take a lot longer than normal controls. 
- </p></div><div class="refsect2" title="RecoverInterval"><a name="idp111800"></a><h3>RecoverInterval</h3><p>Default: 1</p><p> + </p></div><div class="refsect2" title="RecoverInterval"><a name="idp141336"></a><h3>RecoverInterval</h3><p>Default: 1</p><p> How frequently in seconds should the recovery daemon perform the consistency checks that determine if we need to perform a recovery or not. - </p></div><div class="refsect2" title="ElectionTimeout"><a name="idp112704"></a><h3>ElectionTimeout</h3><p>Default: 3</p><p> + </p></div><div class="refsect2" title="ElectionTimeout"><a name="idp142240"></a><h3>ElectionTimeout</h3><p>Default: 3</p><p> When electing a new recovery master, this is how many seconds we allow the election to take before we either deem the election finished or we fail the election and start a new one. - </p></div><div class="refsect2" title="TakeoverTimeout"><a name="idp113656"></a><h3>TakeoverTimeout</h3><p>Default: 9</p><p> + </p></div><div class="refsect2" title="TakeoverTimeout"><a name="idp143192"></a><h3>TakeoverTimeout</h3><p>Default: 9</p><p> This is how many seconds we allow controls to take for IP failover events. - </p></div><div class="refsect2" title="MonitorInterval"><a name="idp114496"></a><h3>MonitorInterval</h3><p>Default: 15</p><p> + </p></div><div class="refsect2" title="MonitorInterval"><a name="idp144032"></a><h3>MonitorInterval</h3><p>Default: 15</p><p> How often should ctdb run the event scripts to check for a nodes health. - </p></div><div class="refsect2" title="TickleUpdateInterval"><a name="idp115328"></a><h3>TickleUpdateInterval</h3><p>Default: 20</p><p> + </p></div><div class="refsect2" title="TickleUpdateInterval"><a name="idp144864"></a><h3>TickleUpdateInterval</h3><p>Default: 20</p><p> How often will ctdb record and store the "tickle" information used to kickstart stalled tcp connections after a recovery. 
- </p></div><div class="refsect2" title="EventScriptTimeout"><a name="idp116192"></a><h3>EventScriptTimeout</h3><p>Default: 20</p><p> + </p></div><div class="refsect2" title="EventScriptTimeout"><a name="idp145728"></a><h3>EventScriptTimeout</h3><p>Default: 20</p><p> How long should ctdb let an event script run before aborting it and marking the node unhealthy. - </p></div><div class="refsect2" title="EventScriptTimeoutCount"><a name="idp117056"></a><h3>EventScriptTimeoutCount</h3><p>Default: 1</p><p> + </p></div><div class="refsect2" title="EventScriptTimeoutCount"><a name="idp146592"></a><h3>EventScriptTimeoutCount</h3><p>Default: 1</p><p> How many events in a row needs to timeout before we flag the node UNHEALTHY. This setting is useful if your scripts can not be written so that they do not hang for benign reasons. - </p></div><div class="refsect2" title="EventScriptUnhealthyOnTimeout"><a name="idp117984"></a><h3>EventScriptUnhealthyOnTimeout</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="EventScriptUnhealthyOnTimeout"><a name="idp147520"></a><h3>EventScriptUnhealthyOnTimeout</h3><p>Default: 0</p><p> This setting can be be used to make ctdb never become UNHEALTHY if your eventscripts keep hanging/timing out. - </p></div><div class="refsect2" title="RecoveryGracePeriod"><a name="idp118832"></a><h3>RecoveryGracePeriod</h3><p>Default: 120</p><p> + </p></div><div class="refsect2" title="RecoveryGracePeriod"><a name="idp148368"></a><h3>RecoveryGracePeriod</h3><p>Default: 120</p><p> During recoveries, if a node has not caused recovery failures during the last grace period, any records of transgressions that the node has caused recovery failures will be forgiven. This resets the ban-counter back to zero for that node. 
- </p></div><div class="refsect2" title="RecoveryBanPeriod"><a name="idp119856"></a><h3>RecoveryBanPeriod</h3><p>Default: 300</p><p> + </p></div><div class="refsect2" title="RecoveryBanPeriod"><a name="idp149392"></a><h3>RecoveryBanPeriod</h3><p>Default: 300</p><p> If a node becomes banned causing repetitive recovery failures. The node will eventually become banned from the cluster. This controls how long the culprit node will be banned from the cluster before it is allowed to try to join the cluster again. Don't set to small. A node gets banned for a reason and it is usually due to real problems with the node. - </p></div><div class="refsect2" title="DatabaseHashSize"><a name="idp121384"></a><h3>DatabaseHashSize</h3><p>Default: 100001</p><p> + </p></div><div class="refsect2" title="DatabaseHashSize"><a name="idp150920"></a><h3>DatabaseHashSize</h3><p>Default: 100001</p><p> Size of the hash chains for the local store of the tdbs that ctdb manages. - </p></div><div class="refsect2" title="DatabaseMaxDead"><a name="idp122232"></a><h3>DatabaseMaxDead</h3><p>Default: 5</p><p> + </p></div><div class="refsect2" title="DatabaseMaxDead"><a name="idp151768"></a><h3>DatabaseMaxDead</h3><p>Default: 5</p><p> How many dead records per hashchain in the TDB database do we allow before the freelist needs to be processed. - </p></div><div class="refsect2" title="RerecoveryTimeout"><a name="idp123112"></a><h3>RerecoveryTimeout</h3><p>Default: 10</p><p> + </p></div><div class="refsect2" title="RerecoveryTimeout"><a name="idp152648"></a><h3>RerecoveryTimeout</h3><p>Default: 10</p><p> Once a recovery has completed, no additional recoveries are permitted until this timeout has expired. 
- </p></div><div class="refsect2" title="EnableBans"><a name="idp123976"></a><h3>EnableBans</h3><p>Default: 1</p><p> + </p></div><div class="refsect2" title="EnableBans"><a name="idp153512"></a><h3>EnableBans</h3><p>Default: 1</p><p> When set to 0, this disables BANNING completely in the cluster and thus nodes can not get banned, even it they break. Don't set to 0 unless you know what you are doing. - </p></div><div class="refsect2" title="DeterministicIPs"><a name="idp124904"></a><h3>DeterministicIPs</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="DeterministicIPs"><a name="idp154440"></a><h3>DeterministicIPs</h3><p>Default: 0</p><p> When enabled, this tunable makes ctdb try to keep public IP addresses locked to specific nodes as far as possible. This makes it easier for debugging since you can know that as long as all nodes are healthy @@ -392,12 +392,12 @@ public IP assignment changes in the cluster. This tunable may increase the number of IP failover/failbacks that are performed on the cluster by a small margin. - </p></div><div class="refsect2" title="LCP2PublicIPs"><a name="idp126448"></a><h3>LCP2PublicIPs</h3><p>Default: 1</p><p> + </p></div><div class="refsect2" title="LCP2PublicIPs"><a name="idp155984"></a><h3>LCP2PublicIPs</h3><p>Default: 1</p><p> When enabled this switches ctdb to use the LCP2 ip allocation algorithm. - </p></div><div class="refsect2" title="ReclockPingPeriod"><a name="idp127288"></a><h3>ReclockPingPeriod</h3><p>Default: x</p><p> + </p></div><div class="refsect2" title="ReclockPingPeriod"><a name="idp156824"></a><h3>ReclockPingPeriod</h3><p>Default: x</p><p> Obsolete - </p></div><div class="refsect2" title="NoIPFailback"><a name="idp128056"></a><h3>NoIPFailback</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="NoIPFailback"><a name="idp157592"></a><h3>NoIPFailback</h3><p>Default: 0</p><p> When set to 1, ctdb will not perform failback of IP addresses when a node becomes healthy. 
Ctdb WILL perform failover of public IP addresses when a node becomes UNHEALTHY, but when the node becomes HEALTHY again, ctdb @@ -415,7 +415,7 @@ intervention from the administrator. When this parameter is set, you can manually fail public IP addresses over to the new node(s) using the 'ctdb moveip' command. - </p></div><div class="refsect2" title="DisableIPFailover"><a name="idp130224"></a><h3>DisableIPFailover</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="DisableIPFailover"><a name="idp159760"></a><h3>DisableIPFailover</h3><p>Default: 0</p><p> When enabled, ctdb will not perform failover or failback. Even if a node fails while holding public IPs, ctdb will not recover the IPs or assign them to another node. @@ -424,52 +424,59 @@ the cluster by failing IP addresses over to other nodes. This leads to a service outage until the administrator has manually performed failover to replacement nodes using the 'ctdb moveip' command. - </p></div><div class="refsect2" title="NoIPTakeover"><a name="idp131648"></a><h3>NoIPTakeover</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="NoIPTakeover"><a name="idp161184"></a><h3>NoIPTakeover</h3><p>Default: 0</p><p> When set to 1, ctdb will allow ip addresses to be failed over onto this node. Any ip addresses that the node currently hosts will remain on the node but no new ip addresses can be failed over onto the node. - </p></div><div class="refsect2" title="DBRecordCountWarn"><a name="idp132624"></a><h3>DBRecordCountWarn</h3><p>Default: 100000</p><p> + </p></div><div class="refsect2" title="NoIPTakeoverOnDisabled"><a name="idp162160"></a><h3>NoIPTakeoverOnDisabled</h3><p>Default: 0</p><p> + If no nodes are healthy then by default ctdb will happily host + public IPs on disabled (unhealthy or administratively disabled) + nodes. This can cause problems, for example if the underlying + cluster filesystem is not mounted. 
When set to 1 this behaviour + is switched off and disabled nodes will not be able to takeover + IPs. + </p></div><div class="refsect2" title="DBRecordCountWarn"><a name="idp163240"></a><h3>DBRecordCountWarn</h3><p>Default: 100000</p><p> When set to non-zero, ctdb will log a warning when we try to recover a database with more than this many records. This will produce a warning if a database grows uncontrollably with orphaned records. - </p></div><div class="refsect2" title="DBRecordSizeWarn"><a name="idp133600"></a><h3>DBRecordSizeWarn</h3><p>Default: 10000000</p><p> + </p></div><div class="refsect2" title="DBRecordSizeWarn"><a name="idp164216"></a><h3>DBRecordSizeWarn</h3><p>Default: 10000000</p><p> When set to non-zero, ctdb will log a warning when we try to recover a database where a single record is bigger than this. This will produce a warning if a database record grows uncontrollably with orphaned sub-records. - </p></div><div class="refsect2" title="DBSizeWarn"><a name="idp134600"></a><h3>DBSizeWarn</h3><p>Default: 1000000000</p><p> + </p></div><div class="refsect2" title="DBSizeWarn"><a name="idp165216"></a><h3>DBSizeWarn</h3><p>Default: 1000000000</p><p> When set to non-zero, ctdb will log a warning when we try to recover a database bigger than this. This will produce a warning if a database grows uncontrollably. - </p></div><div class="refsect2" title="VerboseMemoryNames"><a name="idp135528"></a><h3>VerboseMemoryNames</h3><p>Default: 0</p><p> + </p></div><div class="refsect2" title="VerboseMemoryNames"><a name="idp3317272"></a><h3>VerboseMemoryNames</h3><p>Default: 0</p><p> This feature consumes additional memory. when used the talloc library will create more verbose names for all talloc allocated objects. 
-  </p></div><div class="refsect2" title="RecdPingTimeout"><a name="idp136432"></a><h3>RecdPingTimeout</h3><p>Default: 60</p><p>
+  </p></div><div class="refsect2" title="RecdPingTimeout"><a name="idp3318136"></a><h3>RecdPingTimeout</h3><p>Default: 60</p><p>
   If the main daemon has not heard a "ping" from the recovery
   daemon for this many seconds, the main daemon will log a message
   that the recovery daemon is potentially hung.
-  </p></div><div class="refsect2" title="RecdFailCount"><a name="idp137376"></a><h3>RecdFailCount</h3><p>Default: 10</p><p>
+  </p></div><div class="refsect2" title="RecdFailCount"><a name="idp3319040"></a><h3>RecdFailCount</h3><p>Default: 10</p><p>
   If the recovery daemon has failed to ping the main daemon for
   this many consecutive intervals, the main daemon will consider
   the recovery daemon as hung and will try to restart it to recover.
-  </p></div><div class="refsect2" title="LogLatencyMs"><a name="idp138336"></a><h3>LogLatencyMs</h3><p>Default: 0</p><p>
+  </p></div><div class="refsect2" title="LogLatencyMs"><a name="idp3319960"></a><h3>LogLatencyMs</h3><p>Default: 0</p><p>
   When set to non-zero, this will make the main daemon log any
   operation that took longer than this value, in ms, to complete.
   These include how long a lockwait child process needed, how long
   it took to write to a persistent database, and also how long it
   took to get a response to a CALL from a remote node.
-  </p></div><div class="refsect2" title="RecLockLatencyMs"><a name="idp139432"></a><h3>RecLockLatencyMs</h3><p>Default: 1000</p><p>
+  </p></div><div class="refsect2" title="RecLockLatencyMs"><a name="idp3321016"></a><h3>RecLockLatencyMs</h3><p>Default: 1000</p><p>
   When using a reclock file for split brain prevention, if set to
   non-zero this tunable will make the recovery daemon log a message
   if the fcntl() call to lock/testlock the recovery file takes
   longer than this number of ms.
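The latency-logging tunables above are enabled the same way; a minimal sketch, with illustrative thresholds:

```shell
# Log any operation in the main daemon that takes longer than 20 ms:
ctdb setvar LogLatencyMs 20

# Log reclock fcntl() calls that take longer than 1000 ms (the default):
ctdb setvar RecLockLatencyMs 1000
```

Both default to logging nothing extra (LogLatencyMs=0) or only pathological reclock delays, so they are mainly useful while diagnosing slowness.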
-  </p></div><div class="refsect2" title="RecoveryDropAllIPs"><a name="idp140440"></a><h3>RecoveryDropAllIPs</h3><p>Default: 120</p><p>
+  </p></div><div class="refsect2" title="RecoveryDropAllIPs"><a name="idp3321976"></a><h3>RecoveryDropAllIPs</h3><p>Default: 120</p><p>
   If we have been stuck in recovery, or stopped, or banned, mode
   for this many seconds we will force drop all held public
   addresses.
-  </p></div><div class="refsect2" title="verifyRecoveryLock"><a name="idp141344"></a><h3>verifyRecoveryLock</h3><p>Default: 1</p><p>
+  </p></div><div class="refsect2" title="verifyRecoveryLock"><a name="idp3322832"></a><h3>verifyRecoveryLock</h3><p>Default: 1</p><p>
   Whether we should take an fcntl() lock on the reclock file to
   verify that we are the sole recovery master node on the cluster.
-  </p></div><div class="refsect2" title="DeferredAttachTO"><a name="idp142232"></a><h3>DeferredAttachTO</h3><p>Default: 120</p><p>
+  </p></div><div class="refsect2" title="DeferredAttachTO"><a name="idp3323680"></a><h3>DeferredAttachTO</h3><p>Default: 120</p><p>
   When databases are frozen we do not allow clients to attach to
   the databases. Instead of returning an error immediately to the
   application the attach request from the client is deferred until
   the database
@@ -477,7 +484,7 @@
   </p><p>
   This timeout controls how long we will defer the request from the
   client before timing it out and returning an error to the client.
-  </p></div><div class="refsect2" title="HopcountMakeSticky"><a name="idp3179992"></a><h3>HopcountMakeSticky</h3><p>Default: 50</p><p>
+  </p></div><div class="refsect2" title="HopcountMakeSticky"><a name="idp3325024"></a><h3>HopcountMakeSticky</h3><p>Default: 50</p><p>
   If the database is set to 'STICKY' mode, using the 'ctdb
   setdbsticky' command, any record that is seen as very hot and
   migrating so fast that hopcount surpasses 50 is set to become a
   STICKY record for StickyDuration
@@ -488,15 +495,15 @@
   migrating across the cluster so fast.
   This will improve performance for certain workloads, such as
   locking.tdb if many clients are opening/closing the same file
   concurrently.
-  </p></div><div class="refsect2" title="StickyDuration"><a name="idp3181552"></a><h3>StickyDuration</h3><p>Default: 600</p><p>
+  </p></div><div class="refsect2" title="StickyDuration"><a name="idp3326584"></a><h3>StickyDuration</h3><p>Default: 600</p><p>
   Once a record has been found to be fetch-lock hot and has been
   flagged to become STICKY, this is for how long, in seconds, the
   record will be flagged as a STICKY record.
-  </p></div><div class="refsect2" title="StickyPindown"><a name="idp3182456"></a><h3>StickyPindown</h3><p>Default: 200</p><p>
+  </p></div><div class="refsect2" title="StickyPindown"><a name="idp3327488"></a><h3>StickyPindown</h3><p>Default: 200</p><p>
   Once a STICKY record has been migrated onto a node, it will be
   pinned down on that node for this number of ms. Any request from
   other nodes to migrate the record off the node will be deferred
   until the pindown timer expires.
-  </p></div><div class="refsect2" title="MaxLACount"><a name="idp3183408"></a><h3>MaxLACount</h3><p>Default: 20</p><p>
+  </p></div><div class="refsect2" title="MaxLACount"><a name="idp3328440"></a><h3>MaxLACount</h3><p>Default: 20</p><p>
   When record content is fetched from a remote node, if it is only
   for reading the record, pass back the content of the record but
   do not yet migrate the record. Once MaxLACount identical requests
   from the
@@ -504,13 +511,13 @@
   onto the requesting node. This reduces the amount of migration
   for a database read-mostly workload at the expense of more
   frequent network roundtrips.
-  </p></div><div class="refsect2" title="StatHistoryInterval"><a name="idp3184584"></a><h3>StatHistoryInterval</h3><p>Default: 1</p><p>
+  </p></div><div class="refsect2" title="StatHistoryInterval"><a name="idp3329616"></a><h3>StatHistoryInterval</h3><p>Default: 1</p><p>
   Granularity of the statistics collected in the statistics history.
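Putting the sticky-record tunables above together, a sketch of how an administrator might apply them to a contended database (locking.tdb is the example the text itself gives; the values shown are the documented defaults):

```shell
# Mark locking.tdb as a STICKY-mode database:
ctdb setdbsticky locking.tdb

# How long (seconds) a fetch-lock-hot record stays STICKY:
ctdb setvar StickyDuration 600

# How long (ms) a migrated STICKY record is pinned on its node:
ctdb setvar StickyPindown 200
```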
-  </p></div><div class="refsect2" title="AllowClientDBAttach"><a name="idp3185376"></a><h3>AllowClientDBAttach</h3><p>Default: 1</p><p>
+  </p></div><div class="refsect2" title="AllowClientDBAttach"><a name="idp3330408"></a><h3>AllowClientDBAttach</h3><p>Default: 1</p><p>
   When set to 0, clients are not allowed to attach to any
   databases. This can be used to temporarily block any new
   processes from attaching to and accessing the databases.
-  </p></div><div class="refsect2" title="RecoverPDBBySeqNum"><a name="idp3186272"></a><h3>RecoverPDBBySeqNum</h3><p>Default: 0</p><p>
+  </p></div><div class="refsect2" title="RecoverPDBBySeqNum"><a name="idp3331304"></a><h3>RecoverPDBBySeqNum</h3><p>Default: 0</p><p>
   When set to non-zero, this will change how the recovery process
   for persistent databases is performed. By default, when
   performing a database recovery, for normal as well as for
   persistent databases, recovery is
@@ -521,7 +528,7 @@
   a whole db and not by individual records. The node that contains
   the highest value stored in the record "__db_sequence_number__"
   is selected and the copy of that node's database is used as the
   recovered database.
-  </p></div><div class="refsect2" title="FetchCollapse"><a name="idp3187824"></a><h3>FetchCollapse</h3><p>Default: 1</p><p>
+  </p></div><div class="refsect2" title="FetchCollapse"><a name="idp3332856"></a><h3>FetchCollapse</h3><p>Default: 1</p><p>
   When many clients across many nodes try to access the same record
   at the same time this can lead to a fetch storm where the record
   becomes very active and bounces between nodes very fast. This
   leads to high CPU
@@ -537,7 +544,7 @@
   </p><p>
   This tunable controls whether we should collapse multiple fetch
   operations of the same record into a single request and defer all
   duplicates.
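The AllowClientDBAttach tunable described above lends itself to a simple maintenance pattern; a sketch (the maintenance step is a placeholder):

```shell
# Block new client processes from attaching to the databases:
ctdb setvar AllowClientDBAttach 0

# ... perform maintenance on the node ...

# Re-enable client attachment afterwards:
ctdb setvar AllowClientDBAttach 1
```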
-  </p></div></div><div class="refsect1" title="LVS"><a name="idp3190272"></a><h2>LVS</h2><p>
+  </p></div></div><div class="refsect1" title="LVS"><a name="idp3335256"></a><h2>LVS</h2><p>
   LVS is a mode where CTDB presents one single IP address for the
   entire cluster. This is an alternative to using public IP
   addresses and round-robin DNS to loadbalance clients across the
   cluster.
@@ -578,7 +585,7 @@
   the processing node back to the clients. For read-intensive i/o
   patterns you can achieve very high throughput rates in this mode.
   </p><p>
   Note: you can use LVS and public addresses at the same time.
-  </p><div class="refsect2" title="Configuration"><a name="idp3194584"></a><h3>Configuration</h3><p>
+  </p><div class="refsect2" title="Configuration"><a name="idp3339568"></a><h3>Configuration</h3><p>
   To activate LVS on a CTDB node you must specify
   CTDB_PUBLIC_INTERFACE and CTDB_LVS_PUBLIC_ADDRESS in
   /etc/sysconfig/ctdb.
   </p><p>
@@ -601,7 +608,7 @@
   You must also specify the "--lvs" command line argument to ctdbd
   to activate LVS all of the clients from the node BEFORE you
   enable LVS. Also make sure that when you ping these hosts that
   the traffic is routed out through the eth0 interface.
-  </p></div><div class="refsect1" title="REMOTE CLUSTER NODES"><a name="idp3197376"></a><h2>REMOTE CLUSTER NODES</h2><p>
+  </p></div><div class="refsect1" title="REMOTE CLUSTER NODES"><a name="idp3342360"></a><h2>REMOTE CLUSTER NODES</h2><p>
   It is possible to have a CTDB cluster that spans across a WAN
   link. For example where you have a CTDB cluster in your
   datacentre but you also want to have one additional CTDB node
   located at a remote branch site.
@@ -630,7 +637,7 @@ CTDB_CAPABILITY_RECMASTER=no
   </p><p>
   Verify with the command "ctdb getcapabilities" that the node no
   longer has the recmaster or the lmaster capabilities.
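For a remote WAN node as described above, the capability settings the text refers to would go in the node's sysconfig file; a sketch (CTDB_CAPABILITY_RECMASTER=no is quoted from the text, the lmaster line is the analogous setting):

```shell
# /etc/sysconfig/ctdb on the remote branch node:
CTDB_CAPABILITY_LMASTER=no
CTDB_CAPABILITY_RECMASTER=no
```

After restarting ctdbd, `ctdb getcapabilities` on that node should show that the recmaster and lmaster capabilities are gone.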
-  </p></div><div class="refsect1" title="NAT-GW"><a name="idp3200392"></a><h2>NAT-GW</h2><p>
+  </p></div><div class="refsect1" title="NAT-GW"><a name="idp3345376"></a><h2>NAT-GW</h2><p>
   Sometimes it is desirable to run services on the CTDB node which
   will need to originate outgoing traffic to external servers. This
   might be contacting NIS servers, LDAP servers, etc.
@@ -653,7 +660,7 @@ CTDB_CAPABILITY_RECMASTER=no
   if there are no public addresses assigned to the node. This is
   the simplest way but it uses up a lot of IP addresses since you
   have to assign both static and also public addresses to each
   node.
-  </p><div class="refsect2" title="NAT-GW"><a name="idp3202792"></a><h3>NAT-GW</h3><p>
+  </p><div class="refsect2" title="NAT-GW"><a name="idp3347776"></a><h3>NAT-GW</h3><p>
   A second way is to use the built-in NAT-GW feature in CTDB. With
   NAT-GW you assign one public NATGW address for each natgw group.
   Each NATGW group is a set of nodes in the cluster that shares the
   same
@@ -668,8 +675,8 @@
   In each NATGW group, one of the nodes is designated the NAT
   Gateway through which all traffic that is originated by nodes in
   this group will be routed if public addresses are not available.

-- CTDB repository