The branch, master has been updated
       via  4164d7b ctdb-scripts: Add default filesystem usage warnings
       via  0f28ccf ctdb-scripts: Add default system memory usage warnings
       via  2c601f1 ctdb-scripts: Enable system monitoring eventscript by default
       via  b18e4ae ctdb-scripts: Throttle system resource monitoring warnings
       via  e6b5163 ctdb-scripts: Don't shutdown CTDB when memory monitoring fails
       via  b6a0e4b ctdb-scripts: New consistent system memory and swap monitoring
       via  02fa6c3 ctdb-scripts: Factor out new function check_thresholds()
       via  b7b6e25 ctdb-scripts: Memory monitoring uses thresholds expressed as percentages
       via  bd2845d ctdb-scripts: Use MemAvailable if it is in /proc/meminfo
       via  99b8ef5 ctdb-scripts: Only use /proc/meminfo for memory checks, not "free"
       via  ab58c7a ctdb-scripts: Move system memory checking to 05.system
       via  b27ff25 ctdb-tests: Remove unwanted trailing whitespace
       via  23acbd2 ctdb-tests: Add tests for filesystem usage monitoring
       via  fa10506 ctdb-scripts: New configuration variable CTDB_MONITOR_FILESYSTEM_USAGE
       via  8f713c8 ctdb-scripts: Don't fail monitoring if sanity checks fail
       via  6b4a46e ctdb-scripts: Move filesystem monitoring into a function, clean it up
       via  47f7d1b ctdb-scripts: Rename 40.fs_use to 05.system
      from  e139f19 s3: add suport for SMB3_10 and SMB3_11 protocols in smbstatus

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 4164d7bf3153a2fd9081b4d073bfa88fec1507ad
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Aug 18 15:22:23 2015 +1000

    ctdb-scripts: Add default filesystem usage warnings

    Always check filesystem usage for the database directories.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Sat Aug 29 20:08:48 CEST 2015 on sn-devel-104

commit 0f28ccf87af4e90867eaab213a640f6d0cdaa12d
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Aug 14 17:08:45 2015 +1000

    ctdb-scripts: Add default system memory usage warnings

    CTDB should warn by default if too much system memory or swap is used.

    The tests have also been tweaked. In particular, the filesystem-only
    tests need to initialise the memory information to avoid errors where
    meminfo isn't set.

    Document the defaults, warning against disabling them.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 2c601f189521ae65ec5ab867c6d8c88cb5d1ae8c
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Aug 6 15:59:06 2015 +1000

    ctdb-scripts: Enable system monitoring eventscript by default

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit b18e4ae0c9536a549722aeef8bc6c095b12db962
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Aug 5 20:42:16 2015 +1000

    ctdb-scripts: Throttle system resource monitoring warnings

    They are only printed when the percentage usage changes. This should
    stop the logs from being filled with warnings.

    Add a test for the throttling.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit e6b5163bc1c3551a808d3741b4cbac80e15d10d9
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Aug 3 19:55:27 2015 +1000

    ctdb-scripts: Don't shutdown CTDB when memory monitoring fails

    Marking the node unhealthy should cause Samba processes to close,
    possibly freeing a stack of memory. If not, then it is somebody
    else's problem.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit b6a0e4b85699241ba90f25f4c605cbb7a6fc2146
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Aug 3 17:22:08 2015 +1000

    ctdb-scripts: New consistent system memory and swap monitoring

    New variables CTDB_MONITOR_MEMORY_USAGE and CTDB_MONITOR_SWAP_USAGE.
    Both take a pair of <warn_threshold>:<unhealthy_threshold> where each
    threshold is specified as a percentage.

    This adds a callout to check_thresholds() that is run when the
    unhealthy threshold is reached.

    Add some combination tests.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 02fa6c3d106e8fbf0e685afafa5e6a9bc0c3d22d
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Aug 3 16:20:40 2015 +1000

    ctdb-scripts: Factor out new function check_thresholds()

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit b7b6e25b3e26210ed196be7fc5848e3320b5c35b
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Aug 3 15:59:50 2015 +1000

    ctdb-scripts: Memory monitoring uses thresholds expressed as percentages

    CTDB_MONITOR_FREE_MEMORY and CTDB_MONITOR_FREE_MEMORY_WARN are now
    percentages that specify thresholds of acceptable memory usage.

    Memory/swap usage in tests also specified as percentages.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit bd2845d7ebe9e2970d4d5546e51c79c9b40ce9cb
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 24 19:57:42 2015 +1000

    ctdb-scripts: Use MemAvailable if it is in /proc/meminfo

    Otherwise calculate, as before.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 99b8ef512162570504689b53adb14a52233f49b7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 20 20:50:56 2015 +1000

    ctdb-scripts: Only use /proc/meminfo for memory checks, not "free"

    No need to use 2 different sources of information for similar checks.
    Also, output of free has been changed, whereas /proc/meminfo is a
    kernel API, which will not change.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit ab58c7abd9c49325c3cee1e7178d04a3034e57d8
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 20 16:08:13 2015 +1000

    ctdb-scripts: Move system memory checking to 05.system

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit b27ff251aff6d7c5c59dbe9b1748b30587402aa3
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Aug 20 11:47:19 2015 +1000

    ctdb-tests: Remove unwanted trailing whitespace

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 23acbd2f4b0079d1fab01a7dad135e3451efd6d7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 17 21:32:01 2015 +1000

    ctdb-tests: Add tests for filesystem usage monitoring

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit fa1050690bd28cac8bc99047a900caf2e5fca22f
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Aug 3 14:56:40 2015 +1000

    ctdb-scripts: New configuration variable CTDB_MONITOR_FILESYSTEM_USAGE

    This allows both errors (i.e. unhealthy) and warnings for different
    thresholds.

    It replaces CTDB_CHECK_FS_USE.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 8f713c87c1359ef8780018718f6fa47bb0fa82a7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 24 19:56:06 2015 +1000

    ctdb-scripts: Don't fail monitoring if sanity checks fail

    Just log some warnings.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 6b4a46e5742732d7cbdf911b74ab0bb1fc8e3b97
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 17 20:04:44 2015 +1000

    ctdb-scripts: Move filesystem monitoring into a function, clean it up

    Drop obvious comments. Use die() for less lines of code. Use a case
    statement to avoid forking unnecessary processes for each filesystem
    being checked. Drop parentheses around percentages in messages.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 47f7d1b1c8432ffdfb71176cf64cdd31e188e59c
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 17 11:59:56 2015 +1000

    ctdb-scripts: Rename 40.fs_use to 05.system

    Will put all the system monitoring in here, simplifying 00.ctdb.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/config/events.d/00.ctdb                     |   43 ------
 ctdb/config/events.d/05.system                   |  176 +++++++++++++++++++++++
 ctdb/config/events.d/40.fs_use                   |   55 -------
 ctdb/doc/ctdbd.conf.5.xml                        |   92 ++++++------
 ctdb/packaging/RPM/ctdb.spec.in                  |    2 +-
 ctdb/tests/eventscripts/00.ctdb.monitor.001.sh   |   15 --
 ctdb/tests/eventscripts/00.ctdb.monitor.002.sh   |   15 --
 ctdb/tests/eventscripts/00.ctdb.monitor.003.sh   |   19 ---
 ctdb/tests/eventscripts/00.ctdb.monitor.004.sh   |   17 ---
 ctdb/tests/eventscripts/00.ctdb.monitor.005.sh   |   21 ---
 ctdb/tests/eventscripts/05.system.monitor.001.sh |   14 ++
 ctdb/tests/eventscripts/05.system.monitor.002.sh |   12 ++
 ctdb/tests/eventscripts/05.system.monitor.003.sh |   14 ++
 ctdb/tests/eventscripts/05.system.monitor.004.sh |   12 ++
 ctdb/tests/eventscripts/05.system.monitor.005.sh |   14 ++
 ctdb/tests/eventscripts/05.system.monitor.006.sh |   14 ++
 ctdb/tests/eventscripts/05.system.monitor.007.sh |   12 ++
 ctdb/tests/eventscripts/05.system.monitor.011.sh |   16 +++
 ctdb/tests/eventscripts/05.system.monitor.012.sh |   14 ++
 ctdb/tests/eventscripts/05.system.monitor.013.sh |   19 +++
 ctdb/tests/eventscripts/05.system.monitor.014.sh |   16 +++
 ctdb/tests/eventscripts/05.system.monitor.015.sh |   18 +++
 ctdb/tests/eventscripts/05.system.monitor.016.sh |   16 +++
 ctdb/tests/eventscripts/05.system.monitor.017.sh |   40 ++++++
 ctdb/tests/eventscripts/05.system.monitor.018.sh |  123 ++++++++++++++++
 ctdb/tests/eventscripts/scripts/local.sh         |   60 +++++---
 ctdb/tests/eventscripts/stubs/df                 |   38 +++++
 ctdb/tests/eventscripts/stubs/free               |    9 --
 ctdb/tests/eventscripts/stubs/ps                 |    2 +-
 29 files changed, 653 insertions(+), 265 deletions(-)
 create mode 100755 ctdb/config/events.d/05.system
 delete mode 100644 ctdb/config/events.d/40.fs_use
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.003.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.004.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.005.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.001.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.002.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.003.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.004.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.005.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.006.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.007.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.011.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.012.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.013.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.014.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.015.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.016.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.017.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.018.sh
 create mode 100755 ctdb/tests/eventscripts/stubs/df
 delete mode 100755 ctdb/tests/eventscripts/stubs/free


Changeset truncated at 500 lines:

diff --git a/ctdb/config/events.d/00.ctdb b/ctdb/config/events.d/00.ctdb
index 0e25e50..da7186f 100755
--- a/ctdb/config/events.d/00.ctdb
+++ b/ctdb/config/events.d/00.ctdb
@@ -116,46 +116,6 @@ set_ctdb_variables ()
     done
 }

-monitor_system_memory ()
-{
-    # If monitoring free memory then calculate how much there is
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY_WARN" -o \
-        -n "$CTDB_MONITOR_FREE_MEMORY" ] ; then
-        free_mem=$(free -m | awk '$2 == "buffers/cache:" { print $4 }')
-    fi
-
-    # Shutdown CTDB when memory is below the configured limit
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY" ] ; then
-        if [ $free_mem -le $CTDB_MONITOR_FREE_MEMORY ] ; then
-            echo "CRITICAL: OOM - ${free_mem}MB free <= ${CTDB_MONITOR_FREE_MEMORY}MB (CTDB threshold)"
-            echo "CRITICAL: Shutting down CTDB!!!"
-            get_proc "meminfo"
-            ps auxfww
-            set_proc "sysrq-trigger" "m"
-            ctdb disable
-            sleep 3
-            ctdb shutdown
-        fi
-    fi
-
-    # Warn when low on memory
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY_WARN" ] ; then
-        if [ $free_mem -le $CTDB_MONITOR_FREE_MEMORY_WARN ] ; then
-            echo "WARNING: free memory is low - ${free_mem}MB free <= ${CTDB_MONITOR_FREE_MEMORY_WARN}MB (CTDB threshold)"
-        fi
-    fi
-
-    # We should never enter swap, so SwapTotal == SwapFree.
-    if [ "$CTDB_CHECK_SWAP_IS_NOT_USED" = "yes" ] ; then
-        set -- $(get_proc "meminfo" | awk '$1 ~ /Swap(Total|Free):/ { print $2 }')
-        if [ "$1" != "$2" ] ; then
-            echo We are swapping:
-            get_proc "meminfo"
-            ps auxfww
-        fi
-    fi
-}
-
 ############################################################

 ctdb_check_args "$@"
@@ -187,9 +147,6 @@ case "$1" in
     startup)
         ctdb attach ctdb.tdb persistent
         ;;
-    monitor)
-        monitor_system_memory
-        ;;

     *)
         ctdb_standard_event_handler "$@"
diff --git a/ctdb/config/events.d/05.system b/ctdb/config/events.d/05.system
new file mode 100755
index 0000000..69fcec2
--- /dev/null
+++ b/ctdb/config/events.d/05.system
@@ -0,0 +1,176 @@
+#!/bin/sh
+# ctdb event script for checking local file system utilization
+
+[ -n "$CTDB_BASE" ] || \
+    export CTDB_BASE=$(cd -P $(dirname "$0") ; dirname "$PWD")
+
+. $CTDB_BASE/functions
+loadconfig
+
+ctdb_setup_service_state_dir "system-monitoring"
+
+validate_percentage ()
+{
+    case "$1" in
+        "") return 1 ;;  # A failure that doesn't need a warning
+        [0-9]|[0-9][0-9]|100) return 0 ;;
+        *) echo "WARNING: ${1} is an invalid percentage${2:+ in \"}${2}${2:+\"} check"
+           return 1
+    esac
+}
+
+check_thresholds ()
+{
+    _thing="$1"
+    _thresholds="$2"
+    _usage="$3"
+    _unhealthy_callout="$4"
+
+    case "$_thresholds" in
+        *:*)
+            _warn_threshold="${_thresholds%:*}"
+            _unhealthy_threshold="${_thresholds#*:}"
+            ;;
+        *)
+            _warn_threshold="$_thresholds"
+            _unhealthy_threshold=""
+    esac
+
+    _t=$(echo "$_thing" | sed -e 's@/@SLASH_@g' -e 's@ @_@g')
+    _cache="${service_state_dir}/cache_${_t}"
+    if validate_percentage "$_unhealthy_threshold" "$_thing" ; then
+        if [ "$_usage" -ge "$_unhealthy_threshold" ] ; then
+            echo "ERROR: ${_thing} utilization ${_usage}% >= threshold ${_unhealthy_threshold}%"
+            eval "$_unhealthy_callout"
+            echo "$_usage" >"$_cache"
+            exit 1
+        fi
+    fi
+
+    if validate_percentage "$_warn_threshold" "$_what" ; then
+        if [ "$_usage" -ge "$_warn_threshold" ] ; then
+            if [ -r "$_cache" ] ; then
+                read _prev <"$_cache"
+            else
+                _prev=""
+            fi
+            if [ "$_usage" != "$_prev" ] ; then
+                echo "WARNING: ${_thing} utilization ${_usage}% >= threshold ${_warn_threshold}%"
+                echo "$_usage" >"$_cache"
+            fi
+        else
+            if [ -r "$_cache" ] ; then
+                echo "NOTICE: ${_thing} utilization ${_usage}% < threshold ${_warn_threshold}%"
+            fi
+            rm -f "$_cache"
+        fi
+    fi
+}
+
+set_monitor_filsystem_usage_defaults ()
+{
+    _fs_defaults_cache="${service_state_dir}/cache_monitor_filsystem_usage_defaults"
+
+    if [ ! -r "$_fs_defaults_cache" ] ; then
+        # Determine filesystem for each database directory, generate
+        # an entry to warn at 90%, de-duplicate entries, put all items
+        # on 1 line (so the read below gets everything)
+        for _t in "${CTDB_DBDIR:-${CTDB_VARDIR}}" \
+                      "${CTDB_DBDIR_PERSISTENT:-${CTDB_VARDIR}/persistent}" \
+                      "${CTDB_DBDIR_STATE:-${CTDB_VARDIR}/state}" ; do
+            df -kP "$_t" | awk 'NR == 2 { printf "%s:90\n", $6 }'
+        done | sort -u | xargs >"$_fs_defaults_cache"
+    fi
+
+    read CTDB_MONITOR_FILESYSTEM_USAGE <"$_fs_defaults_cache"
+}
+
+monitor_filesystem_usage ()
+{
+    if [ -z "$CTDB_MONITOR_FILESYSTEM_USAGE" ] ; then
+        set_monitor_filsystem_usage_defaults
+    fi
+
+    # Check each specified filesystem, specified in format
+    # <fs_mount>:<fs_warn_threshold>[:fs_unhealthy_threshold]
+    for _fs in $CTDB_MONITOR_FILESYSTEM_USAGE ; do
+        _fs_mount="${_fs%%:*}"
+        _fs_thresholds="${_fs#*:}"
+
+        if [ ! -d "$_fs_mount" ]; then
+            echo "WARNING: Directory ${_fs_mount} does not exist"
+            continue
+        fi
+
+        # Get current utilization
+        _fs_usage=$(df -kP "$_fs_mount" | \
+                           sed -n -e 's@.*[[:space:]]\([[:digit:]]*\)%.*@\1@p')
+        if [ -z "$_fs_usage" ] ; then
+            echo "WARNING: Unable to get FS utilization for ${_fs_mount}"
+            continue
+        fi
+
+        check_thresholds "Filesystem ${_fs_mount}" \
+                         "$_fs_thresholds" \
+                         "$_fs_usage"
+    done
+}
+
+dump_memory_info ()
+{
+    get_proc "meminfo"
+    ps auxfww
+    set_proc "sysrq-trigger" "m"
+}
+
+monitor_memory_usage ()
+{
+    # Defaults
+    if [ -z "$CTDB_MONITOR_MEMORY_USAGE" ] ; then
+        CTDB_MONITOR_MEMORY_USAGE=80
+    fi
+    if [ -z "$CTDB_MONITOR_SWAP_USAGE" ] ; then
+        CTDB_MONITOR_SWAP_USAGE=25
+    fi
+
+    _meminfo=$(get_proc "meminfo")
+    set -- $(echo "$_meminfo" | awk '
+$1 == "MemAvailable:" { memavail += $2 }
+$1 == "MemFree:"      { memfree  += $2 }
+$1 == "Cached:"       { memfree  += $2 }
+$1 == "Buffers:"      { memfree  += $2 }
+$1 == "MemTotal:"     { memtotal  = $2 }
+$1 == "SwapFree:"     { swapfree  = $2 }
+$1 == "SwapTotal:"    { swaptotal = $2 }
+END {
+    if (memavail != 0) { memfree = memavail ; }
+    print int((memtotal - memfree) / memtotal * 100),
+          int((swaptotal - swapfree) / swaptotal * 100)
+}')
+    _mem_usage="$1"
+    _swap_usage="$2"
+
+    check_thresholds "System memory" \
+                     "$CTDB_MONITOR_MEMORY_USAGE" \
+                     "$_mem_usage" \
+                     dump_memory_info
+
+    check_thresholds "System swap" \
+                     "$CTDB_MONITOR_SWAP_USAGE" \
+                     "$_swap_usage" \
+                     dump_memory_info
+}
+
+
+case "$1" in
+    monitor)
+        monitor_filesystem_usage
+        monitor_memory_usage
+        ;;
+
+    *)
+        ctdb_standard_event_handler "$@"
+        ;;
+esac
+
+exit 0
diff --git a/ctdb/config/events.d/40.fs_use b/ctdb/config/events.d/40.fs_use
deleted file mode 100644
index 603b463..0000000
--- a/ctdb/config/events.d/40.fs_use
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/bin/sh
-# ctdb event script for checking local file system utilization
-
-[ -n "$CTDB_BASE" ] || \
-    export CTDB_BASE=$(cd -P $(dirname "$0") ; dirname "$PWD")
-
-. $CTDB_BASE/functions
-loadconfig
-
-case "$1" in
-    monitor)
-        # check each specified fs to be checked
-        # config format is <fs_mount>:<fs_threshold>
-        for fs in $CTDB_CHECK_FS_USE
-        do
-            # parse fs_mount and fs_threshold
-            fs_mount="${fs%:*}"
-            fs_threshold="${fs#*:}"
-
-            # check if given fs_mount is existing directory
-            if [ ! -d "$fs_mount" ]; then
-                echo "Directory $fs_mount does not exist"
-                exit 1
-            fi
-
-            # check if given fs_threshold is number
-            if ! (echo "$fs_threshold" | egrep -q '^[0-9]+$') ; then
-                echo "Threshold $fs_threshold is invalid number"
-                exit 1
-            fi
-
-            # get utilization of given fs from df
-            fs_usage=$(df -kP $fs_mount | sed -n -e 's@.*[[:space:]]\([[:digit:]]*\)%.*@\1@p')
-
-            # check if fs_usage is number
-            if [ -z "$fs_usage" ] ; then
-                echo "Unable to get FS utilization for $fs_mount"
-                exit 1
-            fi
-
-            # check if fs_usage is higher than or equal to fs_threshold
-            if [ "$fs_usage" -ge "$fs_threshold" ] ; then
-                echo "ERROR: Utilization of $fs_mount ($fs_usage%) is higher than threshold ($fs_threshold%)"
-                exit 1
-            fi
-        done
-
-        ;;
-
-    *)
-        ctdb_standard_event_handler "$@"
-        ;;
-esac
-
-exit 0
diff --git a/ctdb/doc/ctdbd.conf.5.xml b/ctdb/doc/ctdbd.conf.5.xml
index da53e51..f45c724 100644
--- a/ctdb/doc/ctdbd.conf.5.xml
+++ b/ctdb/doc/ctdbd.conf.5.xml
@@ -1279,91 +1279,91 @@ CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000

     <para>
       CTDB can experience seemingly random (performance and other)
-      issues if system resources become too contrained.  Options in
-      this section can be enabled to allow certain system resources to
-      be checked.
+      issues if system resources become too constrained.  Options in
+      this section can be enabled to allow certain system resources
+      to be checked.  They allows warnings to be logged and nodes to
+      be marked unhealthy when system resource usage reaches the
+      configured thresholds.
+    </para>
+
+    <para>
+      Some checks are enabled by default.  It is recommended that
+      these checks remain enabled or are augmented by extra checks.
+      There is no supported way of completely disabling the checks.
     </para>

     <refsect3>
       <title>Eventscripts</title>

       <simplelist>
-        <member><filename>00.ctdb</filename></member>
-        <member><filename>40.fs_use</filename></member>
+        <member><filename>05.system</filename></member>
       </simplelist>

       <para>
-        Filesystem usage monitoring is in
-        <filename>40.fs_use</filename>.  This eventscript is not
-        enabled by default.  Use <command>ctdb
-        enablescript</command> to enable it.
+        Filesystem and memory usage monitoring is in
+        <filename>05.system</filename>.
       </para>
     </refsect3>

     <variablelist>

       <varlistentry>
-        <term>CTDB_CHECK_FS_USE=<parameter>FS-LIMIT-LIST</parameter></term>
+        <term>CTDB_MONITOR_FILESYSTEM_USAGE=<parameter>FS-LIMIT-LIST</parameter></term>
         <listitem>
           <para>
             FS-LIMIT-LIST is a space-separated list of
-            <parameter>FILESYSTEM</parameter>:<parameter>LIMIT</parameter>
-            pairs indicating that a node should be flagged unhealthy
-            if the space used on FILESYSTEM reaches LIMIT%.
-          </para>
-
-          <para>
-            No default.
+            <parameter>FILESYSTEM</parameter>:<parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+            triples indicating that warnings should be logged if the
+            space used on FILESYSTEM reaches WARN_LIMIT%.  If usage
+            reaches UNHEALTHY_LIMIT then the node should be flagged
+            unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+            left blank, meaning that check will be omitted.
           </para>

           <para>
-            Note that this feature uses the
-            <filename>40.fs_use</filename> eventscript, which is not
-            enabled by default.  Use <command>ctdb
-            enablescript</command> to enable it.
+            Default is to warn for each filesystem containing a
+            database directory (<envar>CTDB_DBDIR</envar>,
+            <envar>CTDB_DBDIR_PERSISTENT</envar>,
+            <envar>CTDB_DBDIR_STATE</envar>) with a threshold of
+            90%.
           </para>
         </listitem>
       </varlistentry>

       <varlistentry>
-        <term>CTDB_CHECK_SWAP_IS_NOT_USED=yes|no</term>
+        <term>CTDB_MONITOR_MEMORY_USAGE=<parameter>MEM-LIMITS</parameter></term>
         <listitem>
           <para>
-            Should a warning be logged if swap space is in use.
+            MEM-LIMITS takes the form
+            <parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+            indicating that warnings should be logged if memory
+            usage reaches WARN_LIMIT%.  If usage reaches
+            UNHEALTHY_LIMIT then the node should be flagged
+            unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+            left blank, meaning that check will be omitted.
           </para>
           <para>
-            Default is no.
+            Default is 80, so warnings will be logged when memory
+            usage reaches 80%.
           </para>
         </listitem>
       </varlistentry>

       <varlistentry>
-        <term>CTDB_MONITOR_FREE_MEMORY=<parameter>NUM</parameter></term>
+        <term>CTDB_MONITOR_SWAP_USAGE=<parameter>SWAP-LIMITS</parameter></term>
         <listitem>
           <para>
-            NUM is a lower limit on available system memory, expressed
-            in megabytes.  If this is set and the amount of available
-            memory falls below this limit then some debug information
-            will be logged, the node will be disabled and then CTDB
-            will be shut down.
+            SWAP-LIMITS takes the form
+            <parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+            indicating that warnings should be logged if
+            swap usage reaches WARN_LIMIT%.  If usage reaches
+            UNHEALTHY_LIMIT then the node should be flagged
+            unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+            left blank, meaning that check will be omitted.
           </para>
           <para>
-            No default.
-          </para>
-        </listitem>
-      </varlistentry>
-
-      <varlistentry>
-        <term>CTDB_MONITOR_FREE_MEMORY_WARN=<parameter>NUM</parameter></term>
-        <listitem>
-          <para>
-            NUM is a lower limit on available system memory, expressed
-            in megabytes.  If this is set and the amount of available
-            memory falls below this limit then a warning will be
-            logged.
-          </para>
-          <para>
-            No default.
+            Default is 25, so warnings will be logged when swap
+            usage reaches 25%.
           </para>
         </listitem>
       </varlistentry>
diff --git a/ctdb/packaging/RPM/ctdb.spec.in b/ctdb/packaging/RPM/ctdb.spec.in
index 00f0be5..318dacf 100644
--- a/ctdb/packaging/RPM/ctdb.spec.in
+++ b/ctdb/packaging/RPM/ctdb.spec.in
@@ -167,6 +167,7 @@ rm -rf $RPM_BUILD_ROOT
 %{_sysconfdir}/ctdb/functions
 %{_sysconfdir}/ctdb/events.d/00.ctdb
 %{_sysconfdir}/ctdb/events.d/01.reclock
+%{_sysconfdir}/ctdb/events.d/05.system
 %{_sysconfdir}/ctdb/events.d/10.interface
 %{_sysconfdir}/ctdb/events.d/10.external
 %{_sysconfdir}/ctdb/events.d/13.per_ip_routing
@@ -174,7 +175,6 @@ rm -rf $RPM_BUILD_ROOT
 %{_sysconfdir}/ctdb/events.d/11.routing
 %{_sysconfdir}/ctdb/events.d/20.multipathd
 %{_sysconfdir}/ctdb/events.d/31.clamd
-%{_sysconfdir}/ctdb/events.d/40.fs_use
 %{_sysconfdir}/ctdb/events.d/40.vsftpd
 %{_sysconfdir}/ctdb/events.d/41.httpd
 %{_sysconfdir}/ctdb/events.d/49.winbind
diff --git a/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh b/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
deleted file mode 100755
index 4290d13..0000000
--- a/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/bin/sh
-
-. "${TEST_SCRIPTS_DIR}/unit.sh"
-
-define_test "Memory check, bad situation, no checks enabled"
-
-setup_memcheck "bad"
-
-CTDB_MONITOR_FREE_MEMORY=""
-CTDB_MONITOR_FREE_MEMORY_WARN=""
-CTDB_CHECK_SWAP_IS_NOT_USED="no"
-
-ok_null
-
-simple_test
diff --git a/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh b/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
deleted file mode 100755
index 6e94012..0000000
--- a/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/bin/sh
-
-. "${TEST_SCRIPTS_DIR}/unit.sh"
-
-define_test "Memory check, good situation, all enabled"
-
-setup_memcheck


-- 
Samba Shared Repository
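
As a reading aid for the WARN_LIMIT[:UNHEALTHY_LIMIT] format documented in the ctdbd.conf changes above, here is a minimal standalone sketch of how such a setting can be split and compared against a usage percentage. It is not the 05.system code: the function name classify_usage and the sample values are hypothetical, and it leaves out the validation, warning throttling and unhealthy callout that check_thresholds() performs.

```shell
#!/bin/sh
# Hypothetical sketch, not part of the patch: classify a usage
# percentage against a WARN_LIMIT[:UNHEALTHY_LIMIT] setting.

# classify_usage THRESHOLDS USAGE
# Prints "unhealthy", "warn" or "ok".
classify_usage ()
{
    _thresholds="$1"
    _usage="$2"

    # Split on the colon if there is one; otherwise only a
    # warning limit was given.
    case "$_thresholds" in
    *:*)
        _warn="${_thresholds%:*}"
        _unhealthy="${_thresholds#*:}"
        ;;
    *)
        _warn="$_thresholds"
        _unhealthy=""
        ;;
    esac

    # A blank limit means that check is skipped.
    if [ -n "$_unhealthy" ] && [ "$_usage" -ge "$_unhealthy" ] ; then
        echo "unhealthy"
    elif [ -n "$_warn" ] && [ "$_usage" -ge "$_warn" ] ; then
        echo "warn"
    else
        echo "ok"
    fi
}

classify_usage "80:90" 85    # between the two limits
classify_usage "80:90" 95    # at or above the unhealthy limit
classify_usage "80" 50       # warning limit only, usage below it
classify_usage ":90" 85      # blank warning limit, below unhealthy
```

With these illustrative values, a setting such as CTDB_MONITOR_MEMORY_USAGE=80:90 would be expected to log warnings from 80% usage and mark the node unhealthy from 90%, while the default of 80 only ever warns.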