The branch, master has been updated
       via  eb8ec5681bfccb26c8ffae72952d54bb0ba46249 (commit)
       via  d1674aad224f8f0c9a03c3cd38a647318ba0f03e (commit)
       via  81b94fbb7495ac3204f1a84c673c8babf04663bc (commit)
       via  8c6f511254ecb0381a609b37e3a0ee6e5ec5d562 (commit)
       via  c072eb1f6488f94f83a6d3a81d88bf29ad866943 (commit)
       via  3e41170c78fc7a2bf526129c9b7db3739b61c6bf (commit)
       via  01a46205c3a3d6609dc0b0324319b89667dffa32 (commit)
       via  56486d1c01cc8ad0e4b8cee7a22429e72e50f03d (commit)
      from  c7450f9e22133333bf82c88a17ac25990ebc77ab (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit eb8ec5681bfccb26c8ffae72952d54bb0ba46249
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Oct 29 14:05:41 2013 +1100

    ctdbd: When a node is connected, log at DEBUG NOTICE not DEBUG_INFO

    This is important enough that we should see it when the log level
    is DEBUG_NOTICE.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Oct 28 16:20:44 2013 +1100

    tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test

    This is a needlessly complex way of testing the same thing as the
    eventscripts unit tests 60.nfs.monitor.161.sh and
    60.nfs.monitor.162.sh.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 81b94fbb7495ac3204f1a84c673c8babf04663bc
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Oct 28 16:14:40 2013 +1100

    tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test

    This is adequately covered by eventscripts unit tests
    50.samba.monitor.105.sh and 50.samba.monitor.106.sh.

    This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in
    the CTDB configuration. Fixing it is hard and involves adding a
    more complex stub for testparm. We already have that in the
    eventscript unit tests above.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Oct 28 16:00:54 2013 +1100

    eventscripts: Rewrite the smb.conf cache file handling

    The background update is never guaranteed to complete before the
    cache is used, so don't bother trying it at the beginning.

    Instead, put a timeout on a foreground update. If the foreground
    update fails:

    * If there's no available cache file then die.

    * If there is a previous cache file then use it and log a warning.

    * Do a background update at the end of the monitor event.
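
[Editor's sketch] The fallback rules listed above can be illustrated with a small stand-alone script. This is only a sketch under stated assumptions, not the code in 50.samba: CONFIG_CMD, the cache path and the function names are hypothetical stand-ins for testparm and the smb.conf cache, and it assumes the GNU coreutils "timeout" command is available.

```shell
#!/bin/sh
# Hypothetical stand-ins: CONFIG_CMD plays the role of the slow command
# (testparm in the commit) and $cache the role of the smb.conf cache.
cache="${TMPDIR:-/tmp}/example.conf.cache.$$"
CONFIG_CMD="${CONFIG_CMD:-echo setting = value}"

# Refresh the cache in the foreground, giving up after $1 seconds.
# Returns 0 on success and 1 if the update failed but an old cache
# exists. Exits if the update failed and there is no cache at all.
foreground_update ()
{
    _timeout="$1"

    # $CONFIG_CMD is deliberately unquoted so it splits into words.
    if ! _out=$(timeout "$_timeout" $CONFIG_CMD 2>/dev/null) ; then
        if [ -f "$cache" ] ; then
            echo "WARNING: cache update failed - using old cache file"
            return 1
        fi
        echo "ERROR: cache create failed" >&2
        exit 1
    fi

    # Write to a temporary file and rename it, so a reader never sees
    # a partially written cache.
    _tmpfile="${cache}.tmp"
    echo "$_out" >"$_tmpfile"
    mv "$_tmpfile" "$cache"
    return 0
}

# Retry in the background so the caller is not delayed any further.
background_update ()
{
    foreground_update "$1" >/dev/null 2>&1 </dev/null &
}

ret=0
foreground_update 10 || ret=$?

# ... checks that read "$cache" would go here ...

# Only retry at the very end, and only if the foreground update failed.
if [ "$ret" -ne 0 ] ; then
    background_update 10
fi
```

The point of this ordering is that the checks always run against some cache file, fresh or stale, and a slow update costs at most one timeout per run.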

    Also remove commas in the "smb ports" list before use, since
    (newer?) testparm seem to insert commas into the default value.

    Update the associated test to add a comma.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit c072eb1f6488f94f83a6d3a81d88bf29ad866943
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Oct 25 16:25:25 2013 +1100

    tools/ctdb: Fix documentation string for ban command

    Ban time of 0 is not supported.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 3e41170c78fc7a2bf526129c9b7db3739b61c6bf
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Oct 24 11:13:16 2013 +1100

    Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"

    5 minutes is too long to leave the cluster in limbo if the recovery
    daemon dies during a takeover run, even though this is quite
    unlikely. We need a new recover master to be able to do takeover
    runs fairly quickly.

    This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f.

commit 01a46205c3a3d6609dc0b0324319b89667dffa32
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Oct 24 14:15:53 2013 +1100

    tools/onnode: Fix healthy/ok node handling

    This bit-rotted a long time ago when the "ThisNode" column was added
    to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
    test was never updated to reflect this change.

    Instead of making sure that all columns are "0", just check that
    they're not "1". This implicitly ignores "Y" and "N" in this
    "ThisNode" column without having to do anything else clever.

    Also update associated tests. The main "ctdb ok" test had a
    duplicate opening line for a here document, which was tickled by
    this change.

    This fixes samba bz#8122.
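
[Editor's sketch] The "not 1" check described above can be illustrated as follows. This is a sketch under stated assumptions, not the code in tools/onnode: the function name healthy_nodes is hypothetical, and the sample rows are made-up input that follows a colon-separated layout with a trailing "ThisNode" column (IPv4 addresses only, since IPv6 addresses contain colons).

```shell
#!/bin/sh
# Print the node number of every row whose flag columns contain no "1".
# Reads colon-separated status rows on stdin (hypothetical sample format).
healthy_nodes ()
{
    while IFS=":" read -r _empty _node _ip _flags ; do
        # Skip the header row.
        if [ "$_node" = "Node" ] ; then
            continue
        fi
        # Any field equal to "1" means some flag is set. A trailing
        # "Y" or "N" (the ThisNode column) never matches, so it needs
        # no special handling.
        case ":${_flags}:" in
            *:1:*) ;;
            *) echo "$_node" ;;
        esac
    done
}

# Made-up sample input: node 1 has one flag set.
healthy_nodes <<EOF
:Node:IP:Disconnected:Banned:Disabled:Unhealthy:Stopped:Inactive:PartiallyOnline:ThisNode:
:0:10.0.0.1:0:0:0:0:0:0:0:Y:
:1:10.0.0.2:0:0:0:1:0:0:0:N:
:2:10.0.0.3:0:0:0:0:0:0:0:N:
EOF
```

Testing for the absence of "1" rather than the presence of "0" means that extra non-flag columns added later do not break the check.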

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

    onnode test fixup

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Oct 28 18:49:51 2013 +1100

    daemon: Change the default recovery method for persistent databases

    Use sequence numbers to do recovery for persistent databases instead
    of RSNs. This fixes the problem of registry corruption during
    recovery.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 config/events.d/50.samba                        |  125 ++++++++-------------
 doc/ctdbd.1.xml                                 |   11 +-
 server/ctdb_recoverd.c                          |    4 +-
 server/ctdb_server.c                            |    5 +-
 server/ctdb_tunables.c                          |    2 +-
 tests/complex/01_ctdb_nfs_skip_share_check.sh   |  129 ----------------------
 tests/complex/02_ctdb_samba_skip_share_check.sh |  134 -----------------------
 tests/eventscripts/stubs/testparm               |    2 +-
 tests/onnode/0070.sh                            |   10 +-
 tests/onnode/0071.sh                            |   13 +-
 tests/onnode/0075.sh                            |   10 +-
 tools/ctdb.c                                    |    2 +-
 tools/onnode                                    |    4 +-
 13 files changed, 81 insertions(+), 370 deletions(-)
 delete mode 100755 tests/complex/01_ctdb_nfs_skip_share_check.sh
 delete mode 100755 tests/complex/02_ctdb_samba_skip_share_check.sh

Changeset truncated at 500 lines:

diff --git a/config/events.d/50.samba b/config/events.d/50.samba
index 117b459..4b53cba 100755
--- a/config/events.d/50.samba
+++ b/config/events.d/50.samba
@@ -68,74 +68,44 @@ service_stop ()
     fi
 }
 
-# we keep a cached copy of smb.conf here
+######################################################################
+# Show the testparm output using a cached smb.conf to avoid delays due
+# to registry access.
+
 smbconf_cache="$service_state_dir/smb.conf.cache"
 
+testparm_foreground_update ()
+{
+    _timeout="$1"
+
+    if ! 
_out=$(timeout $_timeout testparm -v -s 2>/dev/null) ; then + if [ -f "$smbconf_cache" ] ; then + echo "WARNING: smb.conf cache update failed - using old cache file" + return 1 + else + die "ERROR: smb.conf cache create failed" + fi + fi + + _tmpfile="${smbconf_cache}.$$" + # Patterns to exclude... + pat='^[[:space:]]+(registry[[:space:]]+shares|include|copy|winbind[[:space:]]+separator)[[:space:]]+=' + echo "$_out" | grep -Ev "$pat" >"$_tmpfile" + mv "$_tmpfile" "$smbconf_cache" # atomic -############################################# -# update the smb.conf cache in the foreground -testparm_foreground_update() { - testparm -s 2> /dev/null | egrep -v 'registry.shares.=|include.=' > "$smbconf_cache" + return 0 } -############################################# -# update the smb.conf cache in the background -testparm_background_update() { - # if the cache doesn't exist, then update in the foreground - [ -f $smbconf_cache ] || { - testparm_foreground_update - } - # otherwise do a background update - ( - tmpfile="${smbconf_cache}.$$" - testparm -s > $tmpfile 2> /dev/null & - # remember the pid of the teamparm process - pid="$!" - # give it 10 seconds to run - timeleft=10 - while [ $timeleft -gt 0 ]; do - timeleft=$(($timeleft - 1)) - # see if the process still exists - kill -0 $pid > /dev/null 2>&1 || { - # it doesn't exist, grab its exit status - wait $pid - [ $? = 0 ] || { - echo "50.samba: smb.conf background update exited with status $?" 
- rm -f "${tmpfile}" - exit 1 - } - # put the new smb.conf contents in the cache (atomic rename) - # make sure we remove references to the registry while doing - # this to ensure that running testparm on the cache does - # not use the registry - egrep -v 'registry.shares.=|include.=' < "$tmpfile" > "${tmpfile}.2" - rm -f "$tmpfile" - mv -f "${tmpfile}.2" "$smbconf_cache" || { - echo "50.samba: failed to update background cache" - rm -f "${tmpfile}.2" - exit 1 - } - exit 0 - } - # keep waiting for testparm to finish - sleep 1 - done - # it took more than 10 seconds - kill it off - rm -f "${tmpfile}" - kill -9 "$pid" > /dev/null 2>&1 - echo "50.samba: timed out updating smbconf cache in background" - exit 1 - ) & +testparm_background_update () +{ + _timeout="$1" + + testparm_foreground_update $_timeout >/dev/null 2>&1 </dev/null & } -################################################## -# show the testparm output using a cached smb.conf -# to avoid registry access -testparm_cat() { - [ -f $smbconf_cache ] || { - testparm_foreground_update - } - testparm -v -s "$smbconf_cache" "$@" 2>/dev/null +testparm_cat () +{ + testparm -s "$smbconf_cache" "$@" 2>/dev/null } list_samba_shares () @@ -145,6 +115,11 @@ list_samba_shares () sed -e 's/"//g' } +list_samba_ports () +{ + testparm_cat --parameter-name="smb ports" | + sed -e 's@,@ @g' +} ########################### @@ -164,27 +139,23 @@ case "$1" in ;; monitor) - if [ "$CTDB_SAMBA_SKIP_SHARE_CHECK" != "yes" ] ; then - testparm_background_update - - testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && { - testparm_foreground_update - testparm_cat | egrep '^WARNING|^ERROR|^Unknown' && \ - die "ERROR: testparm shows smb.conf is not clean" - } - - list_samba_shares | ctdb_check_directories_probe || { - testparm_foreground_update - list_samba_shares | - ctdb_check_directories - } || exit $? - fi + testparm_foreground_update 10 + ret=$? 
smb_ports="$CTDB_SAMBA_CHECK_PORTS" if [ -z "$smb_ports" ] ; then - smb_ports=`testparm_cat --parameter-name="smb ports"` + smb_ports=$(list_samba_ports) + [ -n "$smb_ports" ] || die "Failed to set smb ports" fi ctdb_check_tcp_ports $smb_ports || exit $? + + if [ "$CTDB_SAMBA_SKIP_SHARE_CHECK" != "yes" ] ; then + list_samba_shares | ctdb_check_directories || exit $? + fi + + if [ $ret -ne 0 ] ; then + testparm_background_update 10 + fi ;; *) diff --git a/doc/ctdbd.1.xml b/doc/ctdbd.1.xml index 111a8f4..75974cf 100644 --- a/doc/ctdbd.1.xml +++ b/doc/ctdbd.1.xml @@ -1049,11 +1049,9 @@ </refsect2> <refsect2><title>RecoverPDBBySeqNum</title> - <para>Default: 0</para> + <para>Default: 1</para> <para> - When set to non-zero, this will change how the recovery process for - persistent databases ar performed. By default, when performing a database - recovery, for normal as for persistent databases, recovery is + When set to zero, database recovery for persistent databases is record-by-record and recovery process simply collects the most recent version of every individual record. </para> @@ -1063,6 +1061,11 @@ highest value stored in the record "__db_sequence_number__" is selected and the copy of that nodes database is used as the recovered database. </para> + <para> + By default, recovery of persistent databses is done using + __db_sequence_number__ record. + </para> + </refsect2> <refsect2><title>FetchCollapse</title> diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c index d41932b..e5c2887 100644 --- a/server/ctdb_recoverd.c +++ b/server/ctdb_recoverd.c @@ -1691,10 +1691,10 @@ static bool do_takeover_run(struct ctdb_recoverd *rec, nodes = list_of_connected_nodes(rec->ctdb, nodemap, rec, false); - /* Disable for 5 minutes. This can be a tunable later if + /* Disable for 60 seconds. This can be a tunable later if * necessary. 
*/ - dtr.data = 300; + dtr.data = 60; for (i = 0; i < talloc_array_length(nodes); i++) { if (ctdb_client_send_message(rec->ctdb, nodes[i], CTDB_SRVID_DISABLE_TAKEOVER_RUNS, diff --git a/server/ctdb_server.c b/server/ctdb_server.c index 41cc881..c45f4cb 100644 --- a/server/ctdb_server.c +++ b/server/ctdb_server.c @@ -425,8 +425,9 @@ void ctdb_node_connected(struct ctdb_node *node) node->dead_count = 0; node->flags &= ~NODE_FLAGS_DISCONNECTED; node->flags |= NODE_FLAGS_UNHEALTHY; - DEBUG(DEBUG_INFO,("%s: connected to %s - %u connected\n", - node->ctdb->name, node->name, node->ctdb->num_connected)); + DEBUG(DEBUG_NOTICE, + ("%s: connected to %s - %u connected\n", + node->ctdb->name, node->name, node->ctdb->num_connected)); } struct queue_next { diff --git a/server/ctdb_tunables.c b/server/ctdb_tunables.c index 5fb4344..4c139ea 100644 --- a/server/ctdb_tunables.c +++ b/server/ctdb_tunables.c @@ -72,7 +72,7 @@ static const struct { { "StatHistoryInterval", 1, offsetof(struct ctdb_tunable, stat_history_interval), false }, { "DeferredAttachTO", 120, offsetof(struct ctdb_tunable, deferred_attach_timeout), false }, { "AllowClientDBAttach", 1, offsetof(struct ctdb_tunable, allow_client_db_attach), false }, - { "RecoverPDBBySeqNum", 0, offsetof(struct ctdb_tunable, recover_pdb_by_seqnum), false }, + { "RecoverPDBBySeqNum", 1, offsetof(struct ctdb_tunable, recover_pdb_by_seqnum), false }, { "DeferredRebalanceOnNodeAdd", 300, offsetof(struct ctdb_tunable, deferred_rebalance_on_node_add) }, { "FetchCollapse", 1, offsetof(struct ctdb_tunable, fetch_collapse) }, { "HopcountMakeSticky", 50, offsetof(struct ctdb_tunable, hopcount_make_sticky) }, diff --git a/tests/complex/01_ctdb_nfs_skip_share_check.sh b/tests/complex/01_ctdb_nfs_skip_share_check.sh deleted file mode 100755 index a7ad938..0000000 --- a/tests/complex/01_ctdb_nfs_skip_share_check.sh +++ /dev/null @@ -1,129 +0,0 @@ -#!/bin/bash - -test_info() -{ - cat <<EOF -Verify that the CTDB_NFS_SKIP_SHARE_CHECK configuration 
option is respected. - -We create a file in /etc/ctdb/rc.local.d/ that creates a function -called exportfs. This effectively hooks the exportfs command, -allowing us to provide a fake list of shares to check or not check. - -We create another file in the same directory to set and unset the -CTDB_NFS_SKIP_SHARE_CHECK option, utilising the shell's "readonly" -built-in to ensure that our value for the option is used. - -Prerequisites: - -* An active CTDB cluster with at least 2 nodes with public addresses. - -* Test must be run on a real or virtual cluster rather than against - local daemons. There is nothing intrinsic to this test that forces - this - it is because tests run against local daemons don't use the - regular eventscripts. - -Steps: - -1. Verify that the cluster is healthy. -2. Determine a timeout for state changes by adding MonitorInterval - and EventScriptTimeout. -3. Create a temporary directory on the test node using mktemp, - remember the name in $mydir. -4. On the test node create an executable file - /etc/ctdb/rc.local.d/fake-exportfs that contains a definiton for - the function exportfs, which prints a share definition for a - directory $mydir/foo (which does not currently exist). -5. On the test node create an executable file - /etc/ctdb/rc.local.d/nfs-skip-share-check that replaces the - loadconfig() function by one with equivalent functionality, but - which also sets CTDB_NFS_SKIP_SHARE_CHECK="no" if loading - "ctdb" configuration. -6. Wait for the test node to become unhealthy. -7. Create the directory $mydir/foo. -8. Wait for the test node to become healthy. -9. Modify /etc/ctdb/rc.local.d/nfs-skip-share-check so that it sets - CTDB_NFS_SKIP_SHARE_CHECK to "yes". -10. Remove the directory $mydir/foo. -11. Wait for a monitor event and confirm that the the node is still - healthy. - -Expected results: - -* When an NFS share directory is missing CTDB should only mark a node - as unhealthy if CTDB_NFS_SKIP_SHARE_CHECK is set to "no". -EOF -} - -. 
"${TEST_SCRIPTS_DIR}/integration.bash" - -set -e - -ctdb_test_init "$@" - -ctdb_test_check_real_cluster - -cluster_is_healthy - -select_test_node_and_ips - -# We need this for later, so we know how long to sleep. -try_command_on_node $test_node $CTDB getvar MonitorInterval -monitor_interval=${out#*= } -try_command_on_node $test_node $CTDB getvar EventScriptTimeout -event_script_timeout=${out#*= } - -monitor_timeout=$(($monitor_interval + $event_script_timeout)) - -echo "Using timeout of ${monitor_timeout}s (MonitorInterval + EventScriptTimeout)..." - - -mydir=$(onnode -q $test_node mktemp -d) -rc_local_d="${CTDB_BASE:-/etc/ctdb}/rc.local.d" - -my_exit_hook () -{ - ctdb_test_eventscript_uninstall - onnode -q $test_node "rm -f $mydir/*" - onnode -q $test_node "rmdir --ignore-fail-on-non-empty $mydir" - onnode -q $test_node "rm -f \"$rc_local_d/\"*" - onnode -q $test_node "rmdir --ignore-fail-on-non-empty \"$rc_local_d\"" -} - -ctdb_test_exit_hook_add my_exit_hook - -ctdb_test_eventscript_install - -foo_dir=$mydir/foo - -try_command_on_node -v $test_node "mkdir -p \"$rc_local_d\"" - -f="$rc_local_d/fake-exportfs" -echo "Installing \"$f\"..." -try_command_on_node $test_node "echo \"function exportfs () { echo \\\"$foo_dir 127.0.0.1/32(rw)\\\" ; }\" >\"$f\" ; chmod +x \"$f\"" - -n="$rc_local_d/nfs-skip-share-check" -n_contents='loadconfig() { - _loadconfig "$@" - - if [ "$1" = "ctdb" -o "$1" = "nfs" ] ; then - CTDB_NFS_SKIP_SHARE_CHECK=no - fi -} -' -echo "Installing \"$n\" with CTDB_NSF_SKIP_SHARE_CHECK=no..." -try_command_on_node $test_node "echo '$n_contents' >\"$n\" ; chmod +x \"$n\"" - -wait_until_node_has_status $test_node unhealthy $monitor_timeout - -try_command_on_node -v $test_node "mkdir $foo_dir" - -wait_until_node_has_status $test_node healthy $monitor_timeout - -echo "Re-installing \"$n\" with CTDB_NFS_SKIP_SHARE_CHECK=yes..." 
-try_command_on_node $test_node "echo '${n_contents/=no/=yes}' >\"$n\" ; chmod +x \"$n\"" - -try_command_on_node -v $test_node "rmdir $foo_dir" - -wait_for_monitor_event $test_node - -wait_until_node_has_status $test_node healthy 1 diff --git a/tests/complex/02_ctdb_samba_skip_share_check.sh b/tests/complex/02_ctdb_samba_skip_share_check.sh deleted file mode 100755 index 9097a78..0000000 --- a/tests/complex/02_ctdb_samba_skip_share_check.sh +++ /dev/null @@ -1,134 +0,0 @@ -#!/bin/bash - -test_info() -{ - cat <<EOF -Verify that the CTDB_SAMBA_SKIP_SHARE_CHECK configuration option is respected. - -We create a file in /etc/ctdb/rc.local.d/ that creates a function -called testparm. This effectively hooks the testparm command, -allowing us to provide a fake list of shares to check or not check. - -We create another file in the same directory to set and unset the -CTDB_SAMBA_SKIP_SHARE_CHECK option, utilising the shell's "readonly" -built-in to ensure that our value for the option is used. - -Prerequisites: - -* An active CTDB cluster with at least 2 nodes with public addresses. - -* Test must be run on a real or virtual cluster rather than against - local daemons. There is nothing intrinsic to this test that forces - this - it is because tests run against local daemons don't use the - regular eventscripts. - -Steps: - -1. Verify that the cluster is healthy. -2. Determine a timeout for state changes by adding MonitorInterval - and EventScriptTimeout. -3. Create a temporary directory using mktemp, remember the name in - $mydir. -4. Create an executable file /etc/ctdb/rc.local.d/fake-testparm that - contains a definiton for the function testparm, which prints a - share definition for a directory $mydir/foo (which does not - currently exist). -5. 
Create an executable file - /etc/ctdb/rc.local.d/samba-skip-share-check that replaces the - loadconfig() function by one with equivalent functionality, but - which also sets CTDB_SAMBA_SKIP_SHARE_CHECK="no" if loading - "ctdb" configuration. -6. Wait for a maximum of MonitorInterval seconds for the node to - become unhealthy. -7. Create the directory $mydir/foo. -8. Wait for a maximum of MonitorInterval seconds for the node to - become healthy. -9. Modify /etc/ctdb/rc.local.d/samba-skip-share-check so that it sets - CTDB_SAMBA_SKIP_SHARE_CHECK="yes". -10. Remove the directory $mydir/foo. -11. Wait for a monitor event and confirm that the the node is still - healthy. - -Expected results: - -* When an SAMBA share directory is missing CTDB should only mark a node - as unhealthy if CTDB_SAMBA_SKIP_SHARE_CHECK is set to "no". -EOF -} - -. "${TEST_SCRIPTS_DIR}/integration.bash" - -set -e - -ctdb_test_init "$@" - -ctdb_test_check_real_cluster - -cluster_is_healthy - -select_test_node_and_ips - -# We need this for later, so we know how long to sleep. -# We need this for later, so we know how long to sleep. -try_command_on_node $test_node $CTDB getvar MonitorInterval -monitor_interval=${out#*= } -try_command_on_node $test_node $CTDB getvar EventScriptTimeout -event_script_timeout=${out#*= } - -monitor_timeout=$(($monitor_interval + $event_script_timeout)) - -echo "Using timeout of ${monitor_timeout}s (MonitorInterval + EventScriptTimeout)..." 
- -mydir=$(onnode -q $test_node mktemp -d) -rc_local_d="${CTDB_BASE:-/etc/ctdb}/rc.local.d" - -my_exit_hook () -{ - ctdb_test_eventscript_uninstall - onnode -q $test_node "rm -f $mydir/*" - onnode -q $test_node "rmdir --ignore-fail-on-non-empty $mydir" - onnode -q $test_node "rm -f \"$rc_local_d/\"*" - onnode -q $test_node "rmdir --ignore-fail-on-non-empty \"$rc_local_d\"" -} - -ctdb_test_exit_hook_add my_exit_hook - -ctdb_test_eventscript_install - -foo_dir=$mydir/foo - -try_command_on_node -v $test_node "mkdir -p \"$rc_local_d\"" - -f="$rc_local_d/fake-testparm" -echo "Installing \"$f\"..." -# Yes, the quoting is very tricky. We want $foo_dir and $f expanded when -# we echo the function definition but we don't want any of the other -# items expanded until the function is run. -try_command_on_node $test_node "echo 'function testparm () { tp=\$(which testparm 2>/dev/null) ; if [ -n \"\$2\" ] ; then echo path = '\"$foo_dir\"' ; else \$tp \"\$@\" ; fi ; }' >\"$f\" ; chmod +x \"$f\"" - -n="$rc_local_d/samba-skip-share-check" -n_contents='loadconfig() { - _loadconfig "$@" - - if [ "$1" = "ctdb" ] ; then - CTDB_SAMBA_SKIP_SHARE_CHECK=no - fi -} -' -echo "Installing \"$n\" with CTDB_SAMBA_SKIP_SHARE_CHECK=no..." -try_command_on_node $test_node "echo '$n_contents' >\"$n\" ; chmod +x \"$n\"" - -wait_until_node_has_status $test_node unhealthy $monitor_timeout -- CTDB repository