The branch, 2.5 has been updated
       via  dfed4f36662df4b0fccb45ef390eea36382a3738 (commit)
       via  e35ded87829ec6858fb5e045bc428c3f7454ee45 (commit)
       via  567359a8d8c6b355054eb1132c765516f5cf7249 (commit)
       via  8c168b37d2fec274bace439e504e1d32b4a3357a (commit)
       via  952040ace9cc34dcaab96f238341cd42eb6dc4f0 (commit)
       via  36f3f2c2f2e40ce9df69907a71c19940df9a5864 (commit)
       via  3f46d376f3019ed579951be474f11ac5e1744ea1 (commit)
       via  64ccd71ba19a2cb7d0bc5a7259d80d0520ab69d0 (commit)
       via  bd03bb7370edea2d4d74ce3f91eb109acf776d8f (commit)
       via  1eb332804e66b0a9d57045e1e6f15a22eb89425e (commit)
       via  08763a59fc56eba28dcb652f1fc5ba97bef42647 (commit)
       via  d59ebfb00a44b23400a3ecc602ab4542af06018f (commit)
       via  9c125995fec927b49ae228d2e94ffb69f32f2b69 (commit)
       via  a5d07817c00efcbd434f2e10696a8fdbbab641c9 (commit)
       via  d532f63178c17bc5faea3e688b9d2e026a617b9d (commit)
       via  6796f0d6c95755c3270ef3deea6da10c8d8473f7 (commit)
       via  08832d8b2398f4f3af73e781c805feeaffdc0469 (commit)
       via  eb37e2108d10257dfafe2bc7a719690dd2d466d5 (commit)
       via  645f15c98d572b703cecabcc2af2abb05e9b6e67 (commit)
      from  70c7ef023730d8344ca4afde2c94634dd541101f (commit)
https://git.samba.org/?p=ctdb.git;a=shortlog;h=2.5

- Log -----------------------------------------------------------------
commit dfed4f36662df4b0fccb45ef390eea36382a3738
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Aug 8 11:42:51 2014 +1000

    logging: Rename ctdb_log_handler() to ctdb_child_log_handler()

    Now it is obvious that it has something to do with child processes.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 7d391b746695d7a262e4f939f057ee1d1685e12b)

commit e35ded87829ec6858fb5e045bc428c3f7454ee45
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Oct 8 14:22:53 2014 +1100

    logging: Remove debug levels DEBUG_ALERT and DEBUG_CRIT

    Internally map them to DEBUG_ERR to limit code churn.

    This reduces the unwieldy number of debug levels used by CTDB.
    ALERT and CRIT aren't of much use as separate errors, since
    everything from ERR up should always be logged.  In future just
    ERR can be used.

    This also improves compatibility with Samba's debug.c system
    priority mapping.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit f4fc9a153c533968905b8c7945c6615dcd9253d1)

commit 567359a8d8c6b355054eb1132c765516f5cf7249
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Oct 8 14:19:22 2014 +1100

    logging: Remove DEBUG_EMERG

    It isn't used and shouldn't be.  CTDB can't make the system
    unusable.

    Update associated test to ensure that EMERG isn't attempted.
    Actually test all remaining debug levels and modernise the test
    a bit.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 0eabbb8c2b91b61a23f20e04605fdbd653c5cbcb)

commit 8c168b37d2fec274bace439e504e1d32b4a3357a
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Oct 14 17:52:55 2014 +1100

    tools: Fix heap-use-after-free problem

    Found by address sanitizer.
    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    Autobuild-User(master): Martin Schwenke <mart...@samba.org>
    Autobuild-Date(master): Fri Oct 17 12:56:02 CEST 2014 on sn-devel-104

    (Imported from commit 470af881479d1a1588dc23ef40622b4d8f006b61)

commit 952040ace9cc34dcaab96f238341cd42eb6dc4f0
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Wed Apr 23 18:02:39 2014 +1000

    recoverd: Process all the records for vacuum fetch in a loop

    Processing one migration request at a time is very slow and
    processing a batch of records can take longer than VacuumInterval.
    This causes subsequent vacuum fetch requests to be dropped.  The
    dropped records can accumulate quickly and will cause the vacuum
    database traverse to be quite expensive.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Fri Dec 5 17:06:58 CET 2014 on sn-devel-104

    (Imported from commit 959b9ea0ef85c57ffc84d66a6e5e855868943391)

commit 36f3f2c2f2e40ce9df69907a71c19940df9a5864
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Apr 14 14:53:25 2014 +1000

    vacuum: Do not delete VACUUM MIGRATED records immediately

    Such records should be processed by the local vacuuming daemon to
    ensure that all the remote copies have been deleted first.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    (Imported from commit 257311e337065f089df688cbf261d2577949203d)

commit 3f46d376f3019ed579951be474f11ac5e1744ea1
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Nov 6 09:33:50 2014 +1100

    vacuum: Use non-blocking lock when traversing delete tree

    This avoids vacuuming getting in the way of ctdb daemon to process
    record requests.
    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    (Imported from commit dbb1958284657f26a868705e5f9612bc377fd5e0)

commit 64ccd71ba19a2cb7d0bc5a7259d80d0520ab69d0
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Apr 14 13:18:41 2014 +1000

    vacuum: Use non-blocking lock when traversing delete queue

    This avoids vacuuming getting in the way of ctdb daemon to process
    record requests.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    (Imported from commit d35f512cd972ac1f732fe998b2179242d042082d)

commit bd03bb7370edea2d4d74ce3f91eb109acf776d8f
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Fri Feb 21 14:58:00 2014 +1100

    vacuum: Stagger vacuuming child processes

    This prevents multiple child processes being forked at the same
    time for vacuuming TDBs.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    (Imported from commit e4597f8771f42cf315bd163c18b2f27147d3de5f)

commit 1eb332804e66b0a9d57045e1e6f15a22eb89425e
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Feb 11 14:23:28 2014 +1100

    vacuum: Track time for vacuuming in database statistics

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    (Imported from commit a0628e317df76c7c38a7cca9c3090077fa352899)

commit 08763a59fc56eba28dcb652f1fc5ba97bef42647
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Nov 17 14:15:14 2014 +1100

    scripts: Fix stack dumping when debugging hung scripts

    There are parentheses missing that stop the default pattern from
    matching commands with trailing garbage (e.g. "exportfs.orig").

    A careful check of POSIX (and running GNU sed with --posix)
    suggests that "\|" isn't a supported way of specifying alternation
    in a regular expression.
    Therefore, it is clearer to switch to extended regular expressions
    so that this has a chance of being portable (even though the point
    is to print /proc/<pid>/stack, which only works on Linux).

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Tue Nov 18 06:37:45 CET 2014 on sn-devel-104

    (Imported from commit 7f377cf26ecec10cd77f28c1993c48337279892d)

commit d59ebfb00a44b23400a3ecc602ab4542af06018f
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 14 16:42:01 2014 +1100

    scripts: Try to restart statd after every 10 failures

    Also add and update tests for statd stack dumps.

    Update the existing 60.ganesha statd test to do more iterations.
    Duplicate the result as a new test for 60.nfs.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 4cd5be87daf531cb8a67f31b91cceeaf2c488127)

commit 9c125995fec927b49ae228d2e94ffb69f32f2b69
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 14 16:39:07 2014 +1100

    scripts: Add rpc.statd stack dumping to Ganesha restart

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit f51672f5149110025088ef6d1fc59fe7208d2aae)

commit a5d07817c00efcbd434f2e10696a8fdbbab641c9
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 14 13:59:16 2014 +1100

    scripts: Dump stack traces for hung mountd, rquotad, statd processes

    Add a corresponding new unit test for statd.
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 968401ccdc217d0addb6235739b84dbb9d23e651)

commit d532f63178c17bc5faea3e688b9d2e026a617b9d
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 14 13:48:16 2014 +1100

    scripts: Add optional program name argument to nfs_dump_some_threads()

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 1f49e1ab5b317812c0ad482404fb224368726846)

commit 6796f0d6c95755c3270ef3deea6da10c8d8473f7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 14 13:31:03 2014 +1100

    scripts: Factor out new function program_stack_traces()

    In the process, fix a bug where an extra trace would be printed.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit 2ebc305be64cd59ad8cb4ccb6beb6ec6e66bf07a)

commit 08832d8b2398f4f3af73e781c805feeaffdc0469
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Nov 13 11:02:26 2014 +1100

    daemon: Improve error handling for running event scripts

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    Autobuild-User(master): Martin Schwenke <mart...@samba.org>
    Autobuild-Date(master): Fri Nov 14 03:06:12 CET 2014 on sn-devel-104

    (Imported from commit d04bfc6ec6ad7a4749ebfee2284253c4a91a81aa)

commit eb37e2108d10257dfafe2bc7a719690dd2d466d5
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue May 13 23:13:13 2014 +1000

    build: Move internal include files in a separate directory

    This will allow to build clustered samba with built-in ctdb tree
    rather than needing to install CTDB first.
    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit a0db87ed1edcd199af352e457e35ac018157d646)

commit 645f15c98d572b703cecabcc2af2abb05e9b6e67
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue May 13 22:33:03 2014 +1000

    build: Fix dependencies on ctdb_version.h

    This makes sure that parallel compile builds everything correctly.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

    (Imported from commit a065e693ee5801f12f356b7baa823e6a34271dbc)

-----------------------------------------------------------------------

Summary of changes:
 Makefile.in                                        |    6 ++-
 common/ctdb_logging.c                              |    3 --
 config/debug-hung-script.sh                        |    9 ++--
 config/events.d/60.ganesha                         |    1 +
 config/functions                                   |   47 +++++++++++++--------
 config/nfs-rpc-checks.d/10.statd.check             |    1 +
 doc/ctdb.1.xml                                     |    4 +-
 doc/ctdb.7.xml                                     |    3 --
 doc/ctdbd.conf.5.xml                               |    8 ++--
 include/ctdb_protocol.h                            |    3 ++
 include/{ => internal}/cmdline.h                   |    0
 include/{ => internal}/idtree.h                    |    0
 include/{ => internal}/includes.h                  |    0
 server/ctdb_event_helper.c                         |   48 ++++++++++++++--------
 server/ctdb_logging.c                              |   18 +++-----
 server/ctdb_ltdb_server.c                          |    5 +++
 server/ctdb_recoverd.c                             |    5 +--
 server/ctdb_vacuum.c                               |   24 ++++++-----
 server/eventscript.c                               |   10 ++++-
 tests/complex/90_debug_hung_script.sh              |    2 +-
 tests/eventscripts/60.ganesha.monitor.141.sh       |   18 +++++++-
 tests/eventscripts/60.nfs.monitor.143.sh           |   15 +++++++
 ...anesha.monitor.141.sh => 60.nfs.monitor.144.sh} |   20 ++++++++-
 tests/eventscripts/scripts/local.sh                |   12 +++++-
 tests/eventscripts/stubs/pidof                     |    3 ++
 tests/simple/13_ctdb_setdebug.sh                   |   42 +++++++------------
 tools/ctdb.c                                       |   16 +++++++-
 27 files changed, 210 insertions(+), 113 deletions(-)
 rename include/{ => internal}/cmdline.h (100%)
 rename include/{ => internal}/idtree.h (100%)
 rename include/{ => internal}/includes.h (100%)
 create mode 100755 tests/eventscripts/60.nfs.monitor.143.sh
 copy tests/eventscripts/{60.ganesha.monitor.141.sh => 60.nfs.monitor.144.sh} (57%)

Changeset truncated at 500 lines:

diff --git a/Makefile.in
b/Makefile.in
index 118d80a..925ea25 100755
--- a/Makefile.in
+++ b/Makefile.in
@@ -62,7 +62,8 @@ ifeq ($(CC),gcc)
 EXTRA_CFLAGS=-Wno-format-zero-length -Wno-deprecated-declarations -fPIC
 endif
 
-CFLAGS=@CPPFLAGS@ -g -I$(srcdir)/include -Iinclude -Ilib -Ilib/util -I$(srcdir) \
+CFLAGS=@CPPFLAGS@ -g -I$(srcdir)/include -I$(srcdir)/include/internal \
+	-Iinclude -Ilib -Ilib/util -I$(srcdir) \
 	$(TALLOC_CFLAGS) $(TEVENT_CFLAGS) $(TDB_CFLAGS) -I@libreplacedir@ \
 	-DVARDIR=\"$(localstatedir)\" -DETCDIR=\"$(etcdir)\" \
 	-DCTDB_VARDIR=\"$(localstatedir)/lib/ctdb\" \
@@ -160,6 +161,9 @@ $(CTDB_VERSION_H):
 	@echo Generating $@
 	$(WRAPPER) ./packaging/mkversion.sh
 
+server/ctdb_daemon.o: $(CTDB_VERSION_H)
+tools/ctdb.o: $(CTDB_VERSION_H)
+
 bin/ctdbd: $(CTDB_SERVER_OBJ)
 	@echo Linking $@
 	$(WRAPPER) $(CC) $(CFLAGS) -o $@ $(CTDB_SERVER_OBJ) $(LIB_FLAGS)
diff --git a/common/ctdb_logging.c b/common/ctdb_logging.c
index 6dd1a38..bb80fcd 100644
--- a/common/ctdb_logging.c
+++ b/common/ctdb_logging.c
@@ -176,9 +176,6 @@ int32_t ctdb_control_clear_log(struct ctdb_context *ctdb)
 }
 
 struct debug_levels debug_levels[] = {
-	{DEBUG_EMERG,	"EMERG"},
-	{DEBUG_ALERT,	"ALERT"},
-	{DEBUG_CRIT,	"CRIT"},
 	{DEBUG_ERR,	"ERR"},
 	{DEBUG_WARNING,	"WARNING"},
 	{DEBUG_NOTICE,	"NOTICE"},
diff --git a/config/debug-hung-script.sh b/config/debug-hung-script.sh
index 34e957c..3f800fc 100755
--- a/config/debug-hung-script.sh
+++ b/config/debug-hung-script.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+# This script only works on Linux.  Please modify (and submit patches)
+# for other operating systems.
+
 [ -n "$CTDB_BASE" ] || \
     export CTDB_BASE=$(cd -P $(dirname "$0") ; echo "$PWD")
 
@@ -28,12 +31,12 @@ fi
     # Check for processes matching a regular expression and print
     # stack staces.  This could help confirm that certain processes
     # are stuck in certain places such as the cluster filesystem.  The
-    # regexp should separate items with "\|" and should not contain
+    # regexp must separate items with "|" and must not contain
     # parentheses.
The default pattern can be replaced for testing.
-    default_pat='exportfs\|rpcinfo'
+    default_pat='exportfs|rpcinfo'
     pat="${CTDB_DEBUG_HUNG_SCRIPT_STACKPAT:-${default_pat}}"
     echo "$out" |
-    sed -n "s@.*-\(.*${pat}.*\),\([0-9]*\).*@\2 \1@p" |
+    sed -r -n "s@.*-(.*(${pat}).*),([0-9]*).*@\3 \1@p" |
     while read pid name ; do
 	trace=$(cat "/proc/${pid}/stack" 2>/dev/null)
 	if [ $? -eq 0 ] ; then
diff --git a/config/events.d/60.ganesha b/config/events.d/60.ganesha
index 5640b74..be77e1d 100755
--- a/config/events.d/60.ganesha
+++ b/config/events.d/60.ganesha
@@ -230,6 +230,7 @@ case "$1" in
 	p="rpc.statd"
 	which $p >/dev/null 2>/dev/null && \
 	    nfs_check_rpc_service "statd" \
+		% 10 "verbose restart:b unhealthy" \
 		-ge 6 "verbose unhealthy" \
 		-eq 4 "verbose restart" \
 		-eq 2 "restart:b"
diff --git a/config/functions b/config/functions
index 9617047..021f2ad 100755
--- a/config/functions
+++ b/config/functions
@@ -201,6 +201,27 @@ get_proc ()
 }
 
 ######################################################
+# Print up to $_max kernel stack traces for processes named $_program
+program_stack_traces ()
+{
+    _prog="$1"
+    _max="${2:-1}"
+
+    _count=1
+    for _pid in $(pidof "$_prog") ; do
+	[ $_count -le $_max ] || break
+
+	# Do this first to avoid racing with process exit
+	_stack=$(get_proc "${_pid}/stack" 2>/dev/null)
+	if [ -n "$_stack" ] ; then
+	    echo "Stack trace for ${_prog}[${_pid}]:"
+	    echo "$_stack"
+	    _count=$(($_count + 1))
+	fi
+    done
+}
+
+######################################################
 # Check that an RPC service is healthy -
 # this includes allowing a certain number of failures
 # before marking the NFS service unhealthy.
@@ -371,11 +392,13 @@ _nfs_restart_rpc_service ()
 	mountd)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p ${MOUNTD_PORT:+-p} $MOUNTD_PORT
 	    ;;
 	rquotad)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p ${RQUOTAD_PORT:+-p} $RQUOTAD_PORT
 	    ;;
 	lockd)
@@ -385,6 +408,7 @@ _nfs_restart_rpc_service ()
 	statd)
 	    echo "Trying to restart $_prog_name [${_p}]"
 	    killall -q -9 "$_p"
+	    nfs_dump_some_threads "$_p"
 	    $_maybe_background $_p \
 		${STATD_HOSTNAME:+-n} $STATD_HOSTNAME \
 		${STATD_PORT:+-p} $STATD_PORT \
@@ -668,7 +692,9 @@ startstop_ganesha()
 		service "$_service_name" stop
 		;;
 	restart)
-		service "$_service_name" restart
+		service "$_service_name" stop
+		nfs_dump_some_threads "rpc.statd"
+		service "$_service_name" start
 		;;
 	esac
 }
@@ -735,23 +761,12 @@ startstop_nfs() {
 # Dump up to the configured number of nfsd thread backtraces.
 nfs_dump_some_threads ()
 {
-    [ -n "$CTDB_NFS_DUMP_STUCK_THREADS" ] || CTDB_NFS_DUMP_STUCK_THREADS=5
+    _prog="${1:-nfsd}"
 
-    # Optimisation to avoid running an unnecessary pidof
-    [ $CTDB_NFS_DUMP_STUCK_THREADS -gt 0 ] || return 0
+    _num="${CTDB_NFS_DUMP_STUCK_THREADS:-5}"
+    [ $_num -gt 0 ] || return 0
 
-    _count=0
-    for _pid in $(pidof nfsd) ; do
-	[ $_count -le $CTDB_NFS_DUMP_STUCK_THREADS ] || break
-
-	# Do this first to avoid racing with thread exit
-	_stack=$(get_proc "${_pid}/stack" 2>/dev/null)
-	if [ -n "$_stack" ] ; then
-	    echo "Stack trace for stuck nfsd thread [${_pid}]:"
-	    echo "$_stack"
-	    _count=$(($_count + 1))
-	fi
-    done
+    program_stack_traces "$_prog" $_num
 }
 
 ########################################################
diff --git a/config/nfs-rpc-checks.d/10.statd.check b/config/nfs-rpc-checks.d/10.statd.check
index d738a32..526e238 100644
--- a/config/nfs-rpc-checks.d/10.statd.check
+++ b/config/nfs-rpc-checks.d/10.statd.check
@@ -1,3 +1,4 @@
+% 10 verbose restart:b unhealthy
 -ge 6 verbose unhealthy
 -eq 4 verbose
restart
 -eq 2 restart:b
diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml
index 054948d..a62d425 100644
--- a/doc/ctdb.1.xml
+++ b/doc/ctdb.1.xml
@@ -902,7 +902,7 @@ DB Statistics: locking.tdb
 	The list of debug levels from highest to lowest are :
       </para>
       <para>
-	EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
+	ERR WARNING NOTICE INFO DEBUG
       </para>
     </refsect2>
 
@@ -912,7 +912,7 @@ DB Statistics: locking.tdb
 	Set the debug level of a node. This controls what information will be logged.
       </para>
       <para>
-	The debuglevel is one of EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG
+	The debuglevel is one of ERR WARNING NOTICE INFO DEBUG
       </para>
     </refsect2>
 
diff --git a/doc/ctdb.7.xml b/doc/ctdb.7.xml
index a94b62f..b54fa42 100644
--- a/doc/ctdb.7.xml
+++ b/doc/ctdb.7.xml
@@ -883,9 +883,6 @@ CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
     </para>
 
     <simplelist>
-      <member>EMERG (-3)</member>
-      <member>ALERT (-2)</member>
-      <member>CRIT (-1)</member>
       <member>ERR (0)</member>
       <member>WARNING (1)</member>
       <member>NOTICE (2)</member>
diff --git a/doc/ctdbd.conf.5.xml b/doc/ctdbd.conf.5.xml
index 149aa62..803c232 100644
--- a/doc/ctdbd.conf.5.xml
+++ b/doc/ctdbd.conf.5.xml
@@ -1469,11 +1469,13 @@ CTDB_SET_MonitorInterval=20
 	  <para>
 	    REGEXP specifies interesting processes for which stack
 	    traces should be logged when debugging hung eventscripts
-	    and those processes are matched in pstree output.  See
-	    also <citetitle>CTDB_DEBUG_HUNG_SCRIPT</citetitle>.
+	    and those processes are matched in pstree output.  REGEXP
+	    is an extended regexp so choices are separated by pipes
+	    ('|').  However, REGEXP should not contain parentheses.
+	    See also <citetitle>CTDB_DEBUG_HUNG_SCRIPT</citetitle>.
 	  </para>
 	  <para>
-	    Default is "exportfs\|rpcinfo".
+	    Default is "exportfs|rpcinfo".
 	  </para>
 	</listitem>
       </varlistentry>
diff --git a/include/ctdb_protocol.h b/include/ctdb_protocol.h
index 629c91c..1068132 100644
--- a/include/ctdb_protocol.h
+++ b/include/ctdb_protocol.h
@@ -717,6 +717,9 @@ struct ctdb_db_statistics {
 		struct latency_counter latency;
 		uint32_t buckets[MAX_COUNT_BUCKETS];
 	} locks;
+	struct {
+		struct latency_counter latency;
+	} vacuum;
 	uint32_t db_ro_delegations;
 	uint32_t db_ro_revokes;
 	uint32_t hop_count_bucket[MAX_COUNT_BUCKETS];
diff --git a/include/cmdline.h b/include/internal/cmdline.h
similarity index 100%
rename from include/cmdline.h
rename to include/internal/cmdline.h
diff --git a/include/idtree.h b/include/internal/idtree.h
similarity index 100%
rename from include/idtree.h
rename to include/internal/idtree.h
diff --git a/include/includes.h b/include/internal/includes.h
similarity index 100%
rename from include/includes.h
rename to include/internal/includes.h
diff --git a/server/ctdb_event_helper.c b/server/ctdb_event_helper.c
index 9ff763c..f14e336 100644
--- a/server/ctdb_event_helper.c
+++ b/server/ctdb_event_helper.c
@@ -67,7 +67,7 @@ int main(int argc, char *argv[])
 {
 	int log_fd, write_fd;
 	pid_t pid;
-	int status, output;
+	int status, output, ret;
 
 	progname = argv[0];
 
@@ -99,33 +99,47 @@ int main(int argc, char *argv[])
 
 	pid = fork();
 	if (pid < 0) {
+		int save_errno = errno;
 		fprintf(stderr, "Failed to fork - %s\n", strerror(errno));
-		exit(errno);
+		sys_write(write_fd, &save_errno, sizeof(save_errno));
+		exit(1);
 	}
 
 	if (pid == 0) {
-		int save_errno;
-
-		execv(argv[3], &argv[3]);
-		if (errno == EACCES) {
-			save_errno = check_executable(argv[3]);
-		} else {
-			save_errno = errno;
+		ret = check_executable(argv[3]);
+		if (ret != 0) {
+			_exit(ret);
+		}
+		ret = execv(argv[3], &argv[3]);
+		if (ret != 0) {
+			int save_errno = errno;
 			fprintf(stderr, "Error executing '%s' - %s\n",
-				argv[3], strerror(errno));
+				argv[3], strerror(save_errno));
 		}
-		_exit(save_errno);
+		/* This should never happen */
+		_exit(ENOEXEC);
 	}
 
-
	waitpid(pid, &status, 0);
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		output = -errno;
+		fprintf(stderr, "waitpid() failed - %s\n", strerror(errno));
+		sys_write(write_fd, &output, sizeof(output));
+		exit(1);
+	}
 	if (WIFEXITED(status)) {
-		output = WEXITSTATUS(status);
-		if (output == ENOENT || output == ENOEXEC) {
-			output = -output;
-		}
+		output = -WEXITSTATUS(status);
+		sys_write(write_fd, &output, sizeof(output));
+		exit(0);
+	}
+	if (WIFSIGNALED(status)) {
+		output = -EINTR;
+		fprintf(stderr, "Process terminated with signal - %d\n",
+			WTERMSIG(status));
 		sys_write(write_fd, &output, sizeof(output));
-		exit(output);
+		exit(0);
 	}
 
+	fprintf(stderr, "waitpid() status=%d\n", status);
 	exit(1);
 }
diff --git a/server/ctdb_logging.c b/server/ctdb_logging.c
index 9f6f3b5..eb743ca 100644
--- a/server/ctdb_logging.c
+++ b/server/ctdb_logging.c
@@ -223,15 +223,6 @@ static void ctdb_syslog_log(const char *format, va_list ap)
 	}
 
 	switch (this_log_level) {
-	case DEBUG_EMERG:
-		level = LOG_EMERG;
-		break;
-	case DEBUG_ALERT:
-		level = LOG_ALERT;
-		break;
-	case DEBUG_CRIT:
-		level = LOG_CRIT;
-		break;
 	case DEBUG_ERR:
 		level = LOG_ERR;
 		break;
@@ -413,8 +404,9 @@ static void write_to_log(struct ctdb_log_state *log,
 /*
   called when log data comes in from a child process
  */
-static void ctdb_log_handler(struct event_context *ev, struct fd_event *fde,
-			     uint16_t flags, void *private)
+static void ctdb_child_log_handler(struct event_context *ev,
+				   struct fd_event *fde,
+				   uint16_t flags, void *private)
 {
 	struct ctdb_log_state *log = talloc_get_type(private, struct ctdb_log_state);
 	char *p;
@@ -535,7 +527,7 @@ struct ctdb_log_state *ctdb_vfork_with_logging(TALLOC_CTX *mem_ctx,
 	set_close_on_exec(log->pfd);
 	talloc_set_destructor(log, log_context_destructor);
 	fde = tevent_add_fd(ctdb->ev, log, log->pfd, EVENT_FD_READ,
-			    ctdb_log_handler, log);
+			    ctdb_child_log_handler, log);
 	tevent_fd_set_auto_close(fde);
 
 	return log;
@@ -592,7 +584,7 @@ int ctdb_set_child_logging(struct
ctdb_context *ctdb)
 	close(old_stderr);
 
 	fde = event_add_fd(ctdb->ev, ctdb->log, p[0],
-			   EVENT_FD_READ, ctdb_log_handler, ctdb->log);
+			   EVENT_FD_READ, ctdb_child_log_handler, ctdb->log);
 	tevent_fd_set_auto_close(fde);
 
 	ctdb->log->pfd = p[0];
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index fb4bb0a..24ad255 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -115,6 +115,11 @@ static int ctdb_ltdb_store_server(struct ctdb_db_context *ctdb_db,
 		 * fails. So storing the empty record makes sure that we do not
 		 * need to change the client code.
 		 */
+		if ((header->flags & CTDB_REC_FLAG_VACUUM_MIGRATED) &&
+		    (ctdb_db->ctdb->pnn == header->dmaster)) {
+			keep = true;
+			schedule_for_deletion = true;
+		}
 		if (!(header->flags & CTDB_REC_FLAG_VACUUM_MIGRATED)) {
 			keep = true;
 		} else if (ctdb_db->ctdb->pnn != header->dmaster) {
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index d3c06b4..39e833c 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -910,9 +910,7 @@ static void vacuum_fetch_next(struct vacuum_info *v);
  */
 static void vacuum_fetch_callback(struct ctdb_client_call_state *state)
 {
-	struct vacuum_info *v = talloc_get_type(state->async.private_data, struct vacuum_info);
 	talloc_free(state);
-	vacuum_fetch_next(v);
 }
 
 
@@ -977,8 +975,7 @@ static void vacuum_fetch_next(struct vacuum_info *v)
 			return;
 		}
 		state->async.fn = vacuum_fetch_callback;
-		state->async.private_data = v;
-		return;
+		state->async.private_data = NULL;
 	}
 
 	talloc_free(v);
diff --git a/server/ctdb_vacuum.c b/server/ctdb_vacuum.c
index 5013339..85ce91d 100644
--- a/server/ctdb_vacuum.c
+++ b/server/ctdb_vacuum.c
@@ -317,12 +317,8 @@ static int delete_marshall_traverse_first(void *param, void *data)
 	uint32_t hash = ctdb_hash(&(dd->key));
 	int res;
 
-	res = tdb_chainlock(ctdb_db->ltdb->tdb, dd->key);
+	res = tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, dd->key);
 	if (res != 0) {
-		DEBUG(DEBUG_ERR,
-		      (__location__ " Error getting chainlock on record
with "
-		       "key hash [0x%08x] on database db[%s].\n",
-		       hash, ctdb_db->db_name));
 		recs->vdata->count.delete_list.skipped++;
 		recs->vdata->count.delete_list.left--;
 		talloc_free(dd);
@@ -446,12 +442,8 @@ static int delete_queue_traverse(void *param, void *data)
 
 	vdata->count.delete_queue.total++;
 
-	res = tdb_chainlock(ctdb_db->ltdb->tdb, dd->key);
+	res = tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, dd->key);
 	if (res != 0) {
-		DEBUG(DEBUG_ERR,
-		      (__location__ " Error getting chainlock on record with "
-		       "key hash [0x%08x] on database db[%s].\n",
-		       hash, ctdb_db->db_name));
 		vdata->count.delete_queue.error++;
 		return 0;
 	}
@@ -1364,6 +1356,7 @@ static int vacuum_child_destructor(struct ctdb_vacuum_child_context *child_ctx)
 	struct ctdb_db_context *ctdb_db = child_ctx->vacuum_handle->ctdb_db;
 	struct ctdb_context *ctdb = ctdb_db->ctdb;
 
+	CTDB_UPDATE_DB_LATENCY(ctdb_db, "vacuum", vacuum.latency, l);
 	DEBUG(DEBUG_INFO,("Vacuuming took %.3f seconds for database %s\n", l, ctdb_db->db_name));
 
 	if (child_ctx->child_pid != -1) {
@@ -1450,6 +1443,17 @@ ctdb_vacuum_event(struct event_context *ev, struct timed_event *te,
 		return;
 	}
 
+	/* Do not allow multiple vacuuming child processes to be active at the
+	 * same time.  If there is vacuuming child process active, delay
+	 * new vacuuming event to stagger vacuuming events.
+	 */
+	if (ctdb->vacuumers != NULL) {
+		event_add_timed(ctdb->ev, vacuum_handle,
+				timeval_current_ofs(0, 500*1000),
+				ctdb_vacuum_event, vacuum_handle);
+		return;
+	}
+
 	child_ctx = talloc(vacuum_handle, struct ctdb_vacuum_child_context);
 	if (child_ctx == NULL) {
 		DEBUG(DEBUG_CRIT, (__location__ " Failed to allocate child context for vacuuming of %s\n", ctdb_db->db_name));
diff --git a/server/eventscript.c b/server/eventscript.c
index ff05617..84dcf68 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -367,6 +367,8 @@ static void ctdb_event_script_handler(struct event_context *ev, struct fd_event
 
 	r = sys_read(state->fd[0], &current->status, sizeof(current->status));
 	if (r < 0) {
 		current->status = -errno;
+	} else if (r == 0) {
+		current->status = -EINTR;
 	} else if (r != sizeof(current->status)) {


-- 
CTDB repository
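
Illustration of the config/debug-hung-script.sh change above: a minimal
sketch, assuming input lines shaped like "pstree -p -a" output, of how the
extended-regexp sed expression picks out "PID NAME" pairs. The function
name stack_candidates and the sample lines are hypothetical and not part
of CTDB. The sketch uses -E, which GNU sed treats as equivalent to the -r
flag used in the patch.

```shell
#!/bin/sh
# Hypothetical helper, not part of CTDB: read pstree-style lines on stdin
# and print "PID NAME" for each command that matches the pattern.
stack_candidates ()
{
    # Same default as the patched script; alternatives are separated by "|"
    _pat="${1:-exportfs|rpcinfo}"

    # Group 1 is the whole command name (including any trailing garbage
    # such as ".orig"), group 2 is the alternation, group 3 is the PID.
    sed -E -n "s@.*-(.*(${_pat}).*),([0-9]*).*@\3 \1@p"
}

# Made-up sample input; only the middle line matches the default pattern.
printf '%s\n' \
    '  `-00.ctdb,1000 /etc/ctdb/events.d/00.ctdb monitor' \
    '      `-exportfs.orig,1234 -v' \
    '      `-sleep,1300 10' |
stack_candidates
# prints: 1234 exportfs.orig
```

Passing a different pattern as the first argument plays the role that
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT plays in the real script. Because the
pattern is wrapped in its own group, it should not contain parentheses,
otherwise the \3 back-reference no longer points at the PID.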