Re: [ovs-dev] [PATCH] ovsdb: raft: Fix inability to join the cluster after interrupted attempt.

Dumitru Ceara Tue, 22 Feb 2022 06:34:50 -0800

On 1/28/22 19:51, Ilya Maximets wrote:
> If the joining server re-connects while catching up (e.g. if it crashed
> or connection got closed due to inactivity), the data we sent might be
> lost, so the server will never reply to append request or a snapshot
> installation request.  At the same time, leader will decline all the
> subsequent requests to join from that server with the 'in progress'
> resolution.  At this point the new server will never be able to join
> the cluster, because it will never receive the raft log while leader
> thinks that it was already sent.
> 
> This happened in practice when one of the servers got preempted for a
> few seconds, so the leader closed connection due to inactivity.
> 
> Destroy the joining server if a disconnection is detected.  This
> allows the join to start from scratch when the server re-connects
> and sends a new join request.
> 
> We can't track re-connection in raft_conn_run(), because it's an
> incoming connection and jsonrpc will not keep it alive or try to
> reconnect.  The next time the server re-connects, it will be an
> entirely new raft conn.
> 
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
> Reported-at: https://bugzilla.redhat.com/2033514
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---


As far as I can tell, this change is fine; the test case also helps!
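
For anyone following along, the failure mode and the fix described in the
commit message can be modelled roughly as below.  This is only a toy sketch
under stated assumptions, not the code in ovsdb/raft.c: every name in it
(struct leader, leader_handle_join(), leader_conn_closed(), and so on) is
hypothetical, and server and connection IDs are plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only.  All names are hypothetical and not taken from OVS. */

#define MAX_JOINING 4

enum join_result { JOIN_STARTED, JOIN_IN_PROGRESS, JOIN_TABLE_FULL };

struct joining_server {
    bool in_use;
    int sid;        /* ID of the server that asked to join. */
    int conn_id;    /* Connection the log or snapshot was sent on. */
};

struct leader {
    struct joining_server joining[MAX_JOINING];
};

void
leader_init(struct leader *l)
{
    assert(l != NULL);
    memset(l, 0, sizeof *l);
}

/* Handles a join request from server 'sid' received on 'conn_id'.  A
 * server that is already being caught up is declined with "in progress". */
enum join_result
leader_handle_join(struct leader *l, int sid, int conn_id)
{
    for (int i = 0; i < MAX_JOINING; i++) {
        if (l->joining[i].in_use && l->joining[i].sid == sid) {
            return JOIN_IN_PROGRESS;
        }
    }
    for (int i = 0; i < MAX_JOINING; i++) {
        if (!l->joining[i].in_use) {
            l->joining[i].in_use = true;
            l->joining[i].sid = sid;
            l->joining[i].conn_id = conn_id;
            return JOIN_STARTED;
        }
    }
    return JOIN_TABLE_FULL;
}

/* Models the fix: when a connection goes away, forget every joining server
 * that was being caught up over it, so that a later join request from the
 * same server starts from scratch.  Returns the number of entries dropped. */
int
leader_conn_closed(struct leader *l, int conn_id)
{
    int n_dropped = 0;

    for (int i = 0; i < MAX_JOINING; i++) {
        if (l->joining[i].in_use && l->joining[i].conn_id == conn_id) {
            l->joining[i].in_use = false;
            n_dropped++;
        }
    }
    return n_dropped;
}
```

In this model, if leader_conn_closed() is never called, a second join
request from the same server is declined as "in progress" forever, which
is the stuck state from the report.  Calling it on disconnect lets the
retry succeed.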

Acked-by: Dumitru Ceara <dce...@redhat.com>

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
