The branch, 1.2 has been updated
       via  1d48b3f6cb27d84425863f576c7bbd3e1a8f9863 (commit)
       via  e1cd38eee86ec3d826ba587aa29e587ec7384e56 (commit)
      from  7b0ddb7b3b4b4ce42ee40872b66269920d9f472a (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=1.2


- Log -----------------------------------------------------------------
commit 1d48b3f6cb27d84425863f576c7bbd3e1a8f9863
Author: Ronnie Sahlberg <ronniesahlb...@gmail.com>
Date:   Thu Oct 13 17:16:46 2011 +1100

    new version 1.2.37

commit e1cd38eee86ec3d826ba587aa29e587ec7384e56
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Oct 7 15:00:42 2011 +1100

    Make ctdb_diagnostics more resilient to uncontactable nodes.

    Current behaviour is for onnode to time out (for about 20s) for each
    attempted ssh to a down node.  With 40 or 50 invocations of onnode
    this takes a long time.

    2 changes to work around this:

    * If EXTRA_SSH_OPTS (which is passed to ssh by onnode) does not
      contain a ConnectTimeout= setting then add a setting for a 5
      second timeout.

    * Filter the nodes before starting any diagnosis, taking out any
      "bad nodes" that are uncontactable via onnode.  In the nodes
      summary at the beginning of the output, print information about
      any "bad nodes".

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 packaging/RPM/ctdb.spec.in |    4 +++-
 tools/ctdb_diagnostics     |   34 +++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 2 deletions(-)


Changeset truncated at 500 lines:

diff --git a/packaging/RPM/ctdb.spec.in b/packaging/RPM/ctdb.spec.in
index d5b081d..06e805e 100644
--- a/packaging/RPM/ctdb.spec.in
+++ b/packaging/RPM/ctdb.spec.in
@@ -3,7 +3,7 @@ Name: ctdb
 Summary: Clustered TDB
 Vendor: Samba Team
 Packager: Samba Team <sa...@samba.org>
-Version: 1.2.36
+Version: 1.2.37
 Release: 1GITHASH
 Epoch: 0
 License: GNU GPL version 3
@@ -144,6 +144,8 @@ development libraries for ctdb
 %{_libdir}/libctdb.a
 
 %changelog
+* Thu Oct 13 2011 : Version 1.2.37
+ - updates to ctdb-diagnostics
 * Thu Sep 22 2011 : Version 1.2.36
  - Fix for delip failing to delete the ip drom the interface
    S1028798
diff --git a/tools/ctdb_diagnostics b/tools/ctdb_diagnostics
index cf166ec..117def8 100755
--- a/tools/ctdb_diagnostics
+++ b/tools/ctdb_diagnostics
@@ -18,6 +18,7 @@ EOF
 }
 
 nodes=$(ctdb listnodes -Y | cut -d: -f2)
+bad_nodes=""
 diff_opts=
 no_ads=false
 
@@ -45,6 +46,25 @@ parse_options ()
 
 parse_options "$@"
 
+# Use 5s ssh timeout if EXTRA_SSH_OPTS doesn't set a timeout.
+case "$EXTRA_SSH_OPTS" in
+    *ConnectTimeout=*) : ;;
+    *)
+        export EXTRA_SSH_OPTS="${EXTRA_SSH_OPTS} -o ConnectTimeout=5"
+esac
+
+# Filter nodes.  Remove any nodes we can't contact from $node and add
+# them to $bad_nodes.
+_nodes=""
+for _i in $nodes ; do
+    if onnode $_i true >/dev/null 2>&1 ; then
+        _nodes="${_nodes}${_nodes:+ }${_i}"
+    else
+        bad_nodes="${bad_nodes}${bad_nodes:+,}${_i}"
+    fi
+done
+nodes="$_nodes"
+
 nodes_comma=$(echo $nodes | sed -e 's@[[:space:]]@,@g')
 
 PATH="$PATH:/sbin:/usr/sbin:/usr/lpp/mmfs/bin"
@@ -138,11 +158,23 @@ NUM_ERRORS=0
 cat <<EOF
 Diagnosis started on these nodes:
 $nodes_comma
+EOF
+
+if [ -n "$bad_nodes" ] ; then
+    cat <<EOF
+
+NOT RUNNING DIAGNOSTICS on these uncontactable nodes:
+$bad_nodes
+EOF
+
+fi
+
+cat <<EOF
 
 For reference, here is the nodes file on the current node...
 EOF
 
-show_file /etc/ctdb/nodes
+show_file /etc/ctdb/nodes
 
 cat <<EOF
 --------------------------------------------------------------------


-- 
CTDB repository
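
For anyone who wants to try the two idioms from the commit message without a
cluster, here is a minimal standalone sketch. It is not part of the commit.
probe() is a hypothetical stand-in for "onnode $n true" and simply pretends
node 1 is down; add_default_timeout and filter_nodes are illustrative names
that do not exist in ctdb_diagnostics. Unlike the patch, the timeout helper
avoids a leading space when the option string is empty.

```shell
#!/bin/sh
# Sketch only: probe() is a hypothetical replacement for "onnode $n true",
# so the idioms can be exercised without a CTDB cluster.

# Print the given ssh options, appending a 5 second ConnectTimeout
# unless the caller already supplied one.
add_default_timeout ()
{
    case "$1" in
        *ConnectTimeout=*) echo "$1" ;;
        *) echo "${1}${1:+ }-o ConnectTimeout=5" ;;
    esac
}

# Hypothetical reachability check: pretend node 1 is uncontactable.
probe ()
{
    [ "$1" != "1" ]
}

# Split the arguments into $good (space separated) and $bad (comma
# separated).  ${var:+sep} expands to the separator only when var is
# already non-empty, so neither list gets a leading separator.
filter_nodes ()
{
    good=""
    bad=""
    for _i in "$@" ; do
        if probe "$_i" ; then
            good="${good}${good:+ }${_i}"
        else
            bad="${bad}${bad:+,}${_i}"
        fi
    done
}

filter_nodes 0 1 2
echo "contactable: $good"
echo "uncontactable: $bad"
echo "ssh opts: $(add_default_timeout "")"
```

Replacing the body of probe() with a real check (for example the onnode call
used in the patch) should give the same split on a live cluster, assuming
onnode returns non-zero for nodes it cannot reach.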