The branch, master has been updated
       via  71ad473ba80 ctdb-tests: Clear deleted record via recovery instead of vacuuming
       via  45b9e02f8f6 ctdb-tests: Wait for child process when killing cluster mutex helper
       via  ca4df060807 ctdb-tests: Strengthen volatile DB traverse test
       via  5d655ac6f2f ctdb-recoverd: Only check for LMASTER nodes in the VNN map
       via  53daeb2f878 ctdb-tests: Don't retrieve the VNN map from target node for notlmaster
       via  bff1a3a548a ctdb-tests: Handle special cases first and return
       via  bb59073515e ctdb-tests: Inline handling of recovered and notlmaster statuses
       via  9b09a87326a ctdb-tests: Drop unused node statuses frozen/unfrozen
       via  52227d19735 ctdb-tests: Reformat node_has_status()
      from  c3f96981755 lib:crypto: Do not build AES-CMAC if we use GnuTLS that supports it
https://git.samba.org/?p=samba.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit 71ad473ba805abe23bbe6c1a1290612e448e73f3
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Aug 13 14:45:33 2019 +1000

    ctdb-tests: Clear deleted record via recovery instead of vacuuming

    This test has been flapping because sometimes the record is not
    vacuumed within the expected time period, perhaps even because the
    check for the record can interfere with vacuuming.

    However, instead of waiting for vacuuming, the record can be cleared
    by doing a recovery.  This should be much more reliable.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
    RN: Fix flapping CTDB tests

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

    Autobuild-User(master): Martin Schwenke <mart...@samba.org>
    Autobuild-Date(master): Wed Aug 21 13:06:57 UTC 2019 on sn-devel-184

commit 45b9e02f8f67cba9885e95a0c0af73373d39bafd
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Aug 7 16:58:37 2019 +1000

    ctdb-tests: Wait for child process when killing cluster mutex helper

    The following test sometimes fails:

    ==================================================
    Running "cluster_mutex_test lock-unlock-lock-unlock ./tests/var/cluster_mutex.lockfile"
    --------------------------------------------------
    Output (Exit status: 134):
    --------------------------------------------------
    LOCK
    UNLOCK
    CONTENTION
    NOLOCK
    cluster_mutex_test: ../../tests/src/cluster_mutex_test.c:307: test_lock_unlock_lock_unlock: Assertion `dl2->mh != NULL' failed.
    --------------------------------------------------
    Required output (Exit status: 0):
    --------------------------------------------------
    LOCK
    UNLOCK
    LOCK
    UNLOCK
    FAILED
    ==========================================================================
    TEST FAILED: tests/cunit/cluster_mutex_001.sh (status 1) (duration: 0s)
    ==========================================================================

    This is due to a race in the test.  For the first UNLOCK a signal is
    sent to the cluster mutex handler, but the test tries to retake the
    lock before that process is scheduled and the signal is processed.
    Therefore, the fcntl() lock is still held and contention is seen.

    After unlocking, tests need to wait until the child has gone, so
    build this into ctdb_kill().  This is one of the only places where
    the PID is accessible.  Outside of testing, on a real system,
    nothing will try to (re)take the lock so quickly.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit ca4df06080709adf0cbebc95b0a70b4090dad5ba
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 17:22:50 2019 +1000

    ctdb-tests: Strengthen volatile DB traverse test

    Check the record count more often, from multiple nodes.  Add a case
    with multiple records.
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 5d655ac6f2ff82f8f1c89b06870d600a1a3c7a8a
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Aug 21 14:35:09 2019 +1000

    ctdb-recoverd: Only check for LMASTER nodes in the VNN map

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 53daeb2f878af1634a26e05cb86d87e2faf20173
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 16:45:07 2019 +1000

    ctdb-tests: Don't retrieve the VNN map from target node for notlmaster

    Use the VNN map from the node running node_has_status().  This means
    that

      wait_until_node_has_status 1 notlmaster 10 0

    will run "ctdb status" on node 0 and check (for up to 10 seconds) if
    node 1 is in the VNN map.  If the LMASTER capability has been
    dropped on node 1 then the above will wait for the VNN map to be
    updated on node 0.  This will happen as part of the recovery that is
    triggered by the change of LMASTER capability.

    The next command will then only be able to attach to $TESTDB after
    the recovery is complete, thus guaranteeing a sane state for the
    test to continue.

    This stops simple/79_volatile_db_traverse.sh from going into
    recovery during the traverse or at some other inconvenient time.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit bff1a3a548a2cace997b767d78bb824438664cb7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 16:43:09 2019 +1000

    ctdb-tests: Handle special cases first and return

    All the other cases involve matching bits.
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit bb59073515ee5f7886b5d9a20d7b2805857c2708
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 15:45:41 2019 +1000

    ctdb-tests: Inline handling of recovered and notlmaster statuses

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 9b09a87326af28877301ad27bcec5bb13744e2b6
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 15:40:16 2019 +1000

    ctdb-tests: Drop unused node statuses frozen/unfrozen

    Silently drop unused local variable mpat.

    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

commit 52227d19735a3305ad633672c70385f443f222f0
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 29 15:31:55 2019 +1000

    ctdb-tests: Reformat node_has_status()

    Re-indent and drop non-POSIX left-parenthesis from case labels.
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/server/ctdb_recoverd.c                        | 14 ++--
 ctdb/tests/scripts/integration.bash                | 80 +++++++++++-----------
 ctdb/tests/simple/69_recovery_resurrect_deleted.sh | 17 ++---
 ctdb/tests/simple/79_volatile_db_traverse.sh       | 67 ++++++++++++++----
 ctdb/tests/src/cluster_mutex_test.c                | 18 ++++-
 5 files changed, 124 insertions(+), 72 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index 2633c755752..c1c2a88b12c 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -2989,13 +2989,19 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 		return;
 	}
 
-	/* verify that all active nodes in the nodemap also exist in
-	   the vnnmap.
+	/*
+	 * Verify that all active lmaster nodes in the nodemap also
+	 * exist in the vnnmap
 	 */
 	for (j=0; j<nodemap->num; j++) {
 		if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {
 			continue;
 		}
+		if (! ctdb_node_has_capabilities(rec->caps,
+						 ctdb->nodes[j]->pnn,
+						 CTDB_CAP_LMASTER)) {
+			continue;
+		}
 		if (nodemap->nodes[j].pnn == pnn) {
 			continue;
 		}
@@ -3006,8 +3012,8 @@ static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,
 		}
 	}
 	if (i == vnnmap->size) {
-		DEBUG(DEBUG_ERR, (__location__ " Node %u is active in the nodemap but did not exist in the vnnmap\n",
-			nodemap->nodes[j].pnn));
+		D_ERR("Active LMASTER node %u is not in the vnnmap\n",
+		      nodemap->nodes[j].pnn);
 		ctdb_set_culprit(rec, nodemap->nodes[j].pnn);
 		do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);
 		return;
diff --git a/ctdb/tests/scripts/integration.bash b/ctdb/tests/scripts/integration.bash
index 011aeadee40..284449d4503 100644
--- a/ctdb/tests/scripts/integration.bash
+++ b/ctdb/tests/scripts/integration.bash
@@ -319,53 +319,53 @@ wait_until_ready ()
 # This function is becoming nicely overloaded.  Soon it will collapse!  :-)
 node_has_status ()
 {
-    local pnn="$1"
-    local status="$2"
-
-    local bits fpat mpat rpat
-    case "$status" in
-	(unhealthy)    bits="?|?|?|1|*" ;;
-	(healthy)      bits="?|?|?|0|*" ;;
-	(disconnected) bits="1|*" ;;
-	(connected)    bits="0|*" ;;
-	(banned)       bits="?|1|*" ;;
-	(unbanned)     bits="?|0|*" ;;
-	(disabled)     bits="?|?|1|*" ;;
-	(enabled)      bits="?|?|0|*" ;;
-	(stopped)      bits="?|?|?|?|1|*" ;;
-	(notstopped)   bits="?|?|?|?|0|*" ;;
-	(frozen)       fpat='^[[:space:]]+frozen[[:space:]]+1$' ;;
-	(unfrozen)     fpat='^[[:space:]]+frozen[[:space:]]+0$' ;;
-	(recovered)    rpat='^Recovery mode:RECOVERY \(1\)$' ;;
-	(notlmaster)   rpat="^hash:.* lmaster:${pnn}\$" ;;
+	local pnn="$1"
+	local status="$2"
+
+	case "$status" in
+	recovered)
+		! $CTDB status -n "$pnn" | \
+			grep -Eq '^Recovery mode:RECOVERY \(1\)$'
+		return
+		;;
+	notlmaster)
+		! $CTDB status | grep -Eq "^hash:.* lmaster:${pnn}\$"
+		return
+		;;
+	esac
+
+	local bits
+	case "$status" in
+	unhealthy)    bits="?|?|?|1|*" ;;
+	healthy)      bits="?|?|?|0|*" ;;
+	disconnected) bits="1|*" ;;
+	connected)    bits="0|*" ;;
+	banned)       bits="?|1|*" ;;
+	unbanned)     bits="?|0|*" ;;
+	disabled)     bits="?|?|1|*" ;;
+	enabled)      bits="?|?|0|*" ;;
+	stopped)      bits="?|?|?|?|1|*" ;;
+	notstopped)   bits="?|?|?|?|0|*" ;;
 	*)
-	    echo "node_has_status: unknown status \"$status\""
-	    return 1
-    esac
-
-    if [ -n "$bits" ] ; then
+		echo "node_has_status: unknown status \"$status\""
+		return 1
+	esac
 
 	local out x line
 	out=$($CTDB -X status 2>&1) || return 1
 
 	{
-	    read x
-	    while read line ; do
-		# This needs to be done in 2 steps to avoid false matches.
-		local line_bits="${line#|${pnn}|*|}"
-		[ "$line_bits" = "$line" ] && continue
-		[ "${line_bits#${bits}}" != "$line_bits" ] && return 0
-	    done
-	    return 1
+		read x
+		while read line ; do
+			# This needs to be done in 2 steps to
+			# avoid false matches.
+			local line_bits="${line#|${pnn}|*|}"
+			[ "$line_bits" = "$line" ] && continue
+			[ "${line_bits#${bits}}" != "$line_bits" ] && \
+				return 0
+		done
+		return 1
 	} <<<"$out" # Yay bash!
-    elif [ -n "$fpat" ] ; then
-	$CTDB statistics -n "$pnn" | egrep -q "$fpat"
-    elif [ -n "$rpat" ] ; then
-	! $CTDB status -n "$pnn" | egrep -q "$rpat"
-    else
-	echo 'node_has_status: unknown mode, neither $bits nor $fpat is set'
-	return 1
-    fi
 }
 
 wait_until_node_has_status ()
diff --git a/ctdb/tests/simple/69_recovery_resurrect_deleted.sh b/ctdb/tests/simple/69_recovery_resurrect_deleted.sh
index 8126c49b83c..f6c72c59f2a 100755
--- a/ctdb/tests/simple/69_recovery_resurrect_deleted.sh
+++ b/ctdb/tests/simple/69_recovery_resurrect_deleted.sh
@@ -54,18 +54,11 @@ database_has_zero_records ()
 	return 0
 }
 
-echo "Get vacuum interval"
-try_command_on_node -v $second $CTDB getvar VacuumInterval
-vacuum_interval="${out#* = }"
-
-echo "Wait until vacuuming deletes the record on active nodes"
-# Why 4?  Steps are:
-# 1.  Original node processes delete queue, asks lmaster to fetch
-# 2.  lmaster recoverd fetches
-# 3.  lmaster processes delete queue
-# If vacuuming is just missed then need an extra interval
-t=$((vacuum_interval * 4))
-wait_until "${t}/10" database_has_zero_records
+echo "Trigger a recovery"
+try_command_on_node "$second" $CTDB recover
+
+echo "Checking that database has 0 records"
+database_has_zero_records
 
 echo "Continue node ${first}"
 try_command_on_node $first $CTDB continue
diff --git a/ctdb/tests/simple/79_volatile_db_traverse.sh b/ctdb/tests/simple/79_volatile_db_traverse.sh
index af7e962f579..7f3007d5105 100755
--- a/ctdb/tests/simple/79_volatile_db_traverse.sh
+++ b/ctdb/tests/simple/79_volatile_db_traverse.sh
@@ -42,11 +42,56 @@ try_command_on_node 0 $CTDB writekey "$TESTDB" "foo" "bar0"
 echo "write foo=bar1 on node 1"
 try_command_on_node 1 $CTDB writekey "$TESTDB" "foo" "bar1"
 
-echo "do traverse on node 0"
-try_command_on_node -v 0 $CTDB catdb "$TESTDB"
+echo
 
-echo "do traverse on node 1"
-try_command_on_node -v 1 $CTDB catdb "$TESTDB"
+check_db_num_records ()
+{
+	local node="$1"
+	local db="$2"
+	local n="$3"
+
+	echo "Checking on node ${node} to ensure ${db} has ${n} records..."
+	try_command_on_node "$node" "${CTDB} catdb ${db}"
+
+	num=$(sed -n -e 's|^Dumped \(.*\) records$|\1|p' "$outfile")
+	if [ "$num" = "$n" ] ; then
+		echo "OK: Number of records=${num}"
+		echo
+	else
+		echo "BAD: There were ${num} (!= ${n}) records"
+		cat "$outfile"
+		exit 1
+	fi
+}
+
+check_db_num_records 0 "$TESTDB" 1
+check_db_num_records 1 "$TESTDB" 1
+
+cat <<EOF
+
+Again, this time with 10 records, rewriting 5 of them on the 2nd node
+
+EOF
+
+echo "wipe test database $TESTDB"
+try_command_on_node 0 $CTDB wipedb "$TESTDB"
+
+for i in $(seq 0 9) ; do
+	k="foo${i}"
+	v="bar${i}@0"
+	echo "write ${k}=${v} on node 0"
+	try_command_on_node 0 "${CTDB} writekey ${TESTDB} ${k} ${v}"
+done
+
+for i in $(seq 1 5) ; do
+	k="foo${i}"
+	v="bar${i}@1"
+	echo "write ${k}=${v} on node 1"
+	try_command_on_node 1 "${CTDB} writekey ${TESTDB} ${k} ${v}"
+done
+
+check_db_num_records 0 "$TESTDB" 10
+check_db_num_records 1 "$TESTDB" 10
 
 cat <<EOF
 
@@ -63,8 +108,6 @@ try_command_on_node 1 $CTDB setlmasterrole off
 try_command_on_node -v 1 $CTDB getcapabilities
 
 wait_until_node_has_status 1 notlmaster 10 0
-# Wait for recovery and new VNN map to be pushed
-#sleep_for 10
 
 echo "write foo=bar0 on node 0"
 try_command_on_node 0 $CTDB writekey "$TESTDB" "foo" "bar0"
@@ -72,16 +115,10 @@ try_command_on_node 0 $CTDB writekey "$TESTDB" "foo" "bar0"
 echo "write foo=bar1 on node 1"
 try_command_on_node 1 $CTDB writekey "$TESTDB" "foo" "bar1"
 
-echo "do traverse on node 0"
-try_command_on_node -v 0 $CTDB catdb "$TESTDB"
+echo
 
-num=$(sed -n -e 's|^Dumped \(.*\) records$|\1|p' "$outfile")
-if [ "$num" = 1 ] ; then
-	echo "OK: There was 1 record"
-else
-	echo "BAD: There were ${num} (!= 1) records"
-	exit 1
-fi
+check_db_num_records 0 "$TESTDB" 1
+check_db_num_records 1 "$TESTDB" 1
 
 if grep -q "^data(4) = \"bar1\"\$" "$outfile" ; then
 	echo "OK: Data from node 1 was returned"
diff --git a/ctdb/tests/src/cluster_mutex_test.c b/ctdb/tests/src/cluster_mutex_test.c
index 3bf653a3b00..34398a98ea9 100644
--- a/ctdb/tests/src/cluster_mutex_test.c
+++ b/ctdb/tests/src/cluster_mutex_test.c
@@ -53,7 +53,23 @@ static pid_t ctdb_fork(struct ctdb_context *ctdb)
 
 static int ctdb_kill(struct ctdb_context *ctdb, pid_t pid, int signum)
 {
-	return kill(pid, signum);
+	/*
+	 * Tests need to wait for the child to exit to ensure that the
+	 * lock really has been released.  The PID is only accessible
+	 * in ctdb_cluster_mutex.c, so make a best attempt to ensure
+	 * that the child process is waited for after it is killed.
+	 * Avoid waiting if the process is already gone.
+	 */
+	int ret;
+
+	if (signum == 0) {
+		return kill(pid, signum);
+	}
+
+	ret = kill(pid, signum);
+	waitpid(pid, NULL, 0);
+
+	return ret;
 }
 
 #include "server/ctdb_cluster_mutex.c"


-- 
Samba Shared Repository