Each cluster member typically transfers leadership to the same other
member, namely the first one in its list of servers.  As a result, two
servers in a 3-node cluster may keep transferring leadership to each
other and never to the third one.

Randomize the selection to distribute the load more evenly.

This also makes cluster failure tests cover more scenarios, as servers
will transfer leadership to servers they never chose before.  This is
especially important for cluster joining tests.

Ideally, we would transfer to a random server with the highest apply
index, but that is not implemented for now.
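
For illustration only, the bounded random selection can be sketched
outside of OVS as below.  This is a minimal standalone sketch, not the
code in this patch: 'struct demo_server', its 'stable' flag and
demo_pick_transfer_target() are hypothetical stand-ins for
struct raft_server, the RAFT_PHASE_STABLE check and the loop in
raft_transfer_leadership(), and rand() stands in for
hmap_random_node().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct raft_server: only the fields that
 * the selection logic inspects. */
struct demo_server {
    int id;         /* Stand-in for the server ID (s->sid). */
    bool stable;    /* Stand-in for s->phase == RAFT_PHASE_STABLE. */
};

/* Draws random servers, at most 3 * n times, and returns the index of
 * the first one that is stable and is not 'self_id'.  Returns -1 if no
 * draw produced an eligible server, which is possible even when an
 * eligible server exists, because draws are independent and may
 * repeat. */
static int
demo_pick_transfer_target(const struct demo_server *servers, size_t n,
                          int self_id)
{
    size_t attempts = n * 3;

    /* With n == 0 the loop body never runs, so there is no modulo by
     * zero below. */
    while (attempts--) {
        size_t i = (size_t) rand() % n;

        if (servers[i].id != self_id && servers[i].stable) {
            return (int) i;
        }
    }
    return -1;
}
```

A random draw can return the local server or the same server more than
once, which is presumably why the loop is bounded by a multiple of the
server count instead of visiting each server exactly once.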

Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
---
 ovsdb/raft.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index f463afcb3..25f462431 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1261,8 +1261,12 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
         return;
     }
 
+    size_t n = hmap_count(&raft->servers) * 3;
     struct raft_server *s;
-    HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
+
+    while (n--) {
+        s = CONTAINER_OF(hmap_random_node(&raft->servers),
+                         struct raft_server, hmap_node);
         if (!uuid_equals(&raft->sid, &s->sid)
             && s->phase == RAFT_PHASE_STABLE) {
             struct raft_conn *conn = raft_find_conn_by_sid(raft, &s->sid);
-- 
2.43.0
