When an OVSDB Raft cluster running in Kubernetes is restarted, the
individual databases may be assigned different IP addresses. The
corresponding DNS entries can take a moment to update, so other cluster
members still see the old IP address for some time. During that window,
an OVSDB server from another cluster, or a different server from the
same cluster, may come up under the old IP address, and the remaining
cluster members then reconnect to it. Currently they notice that the
server ID or cluster ID does not match what they expect, but they keep
the connection alive and just ignore the messages received on it.

Fix this by terminating any connection on which we notice a mismatched
cluster or server ID. This forces a reconnect, so that we eventually
reach the right server once the DNS entries have caught up.

Suggested-by: Felix Huettner <[email protected]>
Signed-off-by: Felix Moebius <[email protected]>
---
v2:
  - simplified patch as suggested by Ilya
---
 ovsdb/raft.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index a5809b7e7..d549a3fb5 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1540,6 +1540,7 @@ raft_conn_receive(struct raft *raft, struct raft_conn *conn,
     if (error) {
         char *s = ovsdb_error_to_string_free(error);
         VLOG_INFO("%s: %s", jsonrpc_session_get_name(conn->js), s);
+        jsonrpc_session_force_reconnect(conn->js);
         free(s);
         return false;
     }
@@ -1555,6 +1556,7 @@ raft_conn_receive(struct raft *raft, struct raft_conn *conn,
                      SID_FMT" (expected "SID_FMT")",
                      jsonrpc_session_get_name(conn->js),
                      SID_ARGS(&rpc->common.sid), SID_ARGS(&conn->sid));
+        jsonrpc_session_force_reconnect(conn->js);
         raft_rpc_uninit(rpc);
         return false;
     }
-- 
2.52.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
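
For readers who want to see the behavior change in isolation, the
following is a minimal standalone sketch of the idea, not the actual
OVS code. All names in it (struct session, struct conn,
session_force_reconnect, conn_receive) are hypothetical stand-ins for
the jsonrpc session and raft connection, and server IDs are modeled as
short strings rather than UUIDs. It only illustrates the rule "learn
the peer's ID on first contact, and force a reconnect when a later
message carries a different one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a jsonrpc session; not the OVS type. */
struct session {
    const char *name;
    unsigned int reconnects;    /* Number of forced reconnects so far. */
};

/* Hypothetical stand-in for a raft connection; not the OVS type. */
struct conn {
    struct session js;
    char expected_sid[5];       /* Empty string: peer ID not learned yet. */
};

/* Models forcing a reconnect by counting how often it was requested. */
static void
session_force_reconnect(struct session *s)
{
    s->reconnects++;
}

/* Returns true if a message carrying 'msg_sid' should be processed.
 * On an ID mismatch the message is dropped and a reconnect is forced,
 * instead of silently keeping the stale connection open. */
static bool
conn_receive(struct conn *conn, const char *msg_sid)
{
    assert(msg_sid != NULL);

    if (conn->expected_sid[0] == '\0') {
        /* First message on this connection: learn the peer's ID. */
        snprintf(conn->expected_sid, sizeof conn->expected_sid,
                 "%s", msg_sid);
        return true;
    }

    if (strcmp(conn->expected_sid, msg_sid) != 0) {
        printf("%s: unexpected server ID %s (expected %s)\n",
               conn->js.name, msg_sid, conn->expected_sid);
        session_force_reconnect(&conn->js);
        return false;
    }

    return true;
}
```

In this model, a first message with ID "a1b2" is accepted and learned, a
later message with ID "ffff" is rejected and bumps the reconnect
counter, and messages from "a1b2" keep being accepted afterwards.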
