Sorry for hijacking this thread, but I'd like some clarification. How is the initial balanced state established with, say, 100 ovn-controllers connecting to 3 ovn-sb-db servers?
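To make the question concrete, here is a toy Python model of how I picture it. It assumes each ovn-controller independently picks one of the three servers uniformly at random, and that clients of a restarted server move to a random surviving one while everyone else stays put. Both are assumptions for illustration only, not confirmed ovn-controller behavior, and the names sb-1, sb-2, sb-3 are hypothetical.

```python
import random
from collections import Counter

# Hypothetical names for the three ovn-sb-db cluster members.
SERVERS = ["sb-1", "sb-2", "sb-3"]


def initial_connect(num_clients, servers, rng):
    """Assumption: each client picks one server uniformly at random."""
    return [rng.choice(servers) for _ in range(num_clients)]


def restart_server(assignment, restarted, servers, rng):
    """Move clients of the restarted server to a random surviving server.

    Clients connected elsewhere keep their connection, so nobody moves
    back to the restarted server once it is up again.
    """
    survivors = [s for s in servers if s != restarted]
    return [rng.choice(survivors) if s == restarted else s for s in assignment]


def summarize(assignment, servers):
    """Return the number of clients connected to each server."""
    counts = Counter(assignment)
    return {s: counts.get(s, 0) for s in servers}


if __name__ == "__main__":
    rng = random.Random(0)
    before = initial_connect(100, SERVERS, rng)
    print("initial:", summarize(before, SERVERS))
    after = restart_server(before, "sb-3", SERVERS, rng)
    print("after sb-3 restart:", summarize(after, SERVERS))
```

In this model the initial split is only roughly even, and after a single restart the restarted server ends up with zero clients, which is the kind of imbalance this thread is about.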
The ovn-controller doesn't have to connect to the leader of ovn-sb-db, does it? If it connects to a follower, the write request still has to be forwarded to the leader, right?

These logs keep showing up.
========
2020-08-05T22:48:33.141Z|103607|reconnect|INFO|tcp:10.6.20.84:6642: connecting...
2020-08-05T22:48:33.151Z|103608|reconnect|INFO|tcp:127.0.0.1:6640: connected
2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T22:48:33.159Z|103610|main|INFO|OVNSB commit failed, force recompute next time.
2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered database server is disconnected from cluster; trying another server
2020-08-05T22:48:33.161Z|103612|reconnect|INFO|tcp:10.6.20.84:6642: connection attempt timed out
2020-08-05T22:48:33.161Z|103613|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 seconds before reconnect
========
What does "clustered database server is disconnected from cluster" mean?

Thanks!

Tony

> -----Original Message-----
> From: discuss <ovs-discuss-boun...@openvswitch.org> On Behalf Of Han Zhou
> Sent: Wednesday, August 5, 2020 3:05 PM
> To: Winson Wang <windson.w...@gmail.com>
> Cc: winson wang <zhew...@nvidia.com>; ovn-kuberne...@googlegroups.com;
> ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster
> clients to balanced state again
>
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang <windson.w...@gmail.com> wrote:
>
> Hello OVN Experts:
>
> With large scale ovn-k8s cluster, there are several conditions
> that would make ovn-controller clients connect SB central from a
> balanced state to an unbalanced state.
>
> Is there an ongoing project to address this problem?
> If not, I have one proposal not sure if it is doable.
> Please share your thoughts.
>
> The issue:
>
> OVN SB RAFT 3 node cluster, at first all the ovn-controller
> clients will connect all the 3 nodes in a balanced state.
>
> The following conditions will make the connections become
> unbalanced.
>
> * One RAFT node restart, all the ovn-controller clients to
>   reconnect to the two remaining cluster nodes.
>
> * Ovn-k8s, after SB raft pods rolling upgrade, the last raft
>   pod has no client connections.
>
> RAFT clients in an unbalanced state would trigger more stress to
> the raft cluster, which makes the raft unstable under stress compared
> to a balanced state.
>
> The proposal solution:
>
> Ovn-controller adds next unix commands “reconnect” with argument of
> preferred SB node IP.
>
> When unbalanced state happens, the UNIX command can trigger
> ovn-controller reconnect
>
> To new SB raft node with fast sync which doesn’t trigger the whole
> DB downloading process.
>
> Thanks Winson. The proposal sounds good to me. Will you implement it?
>
> Han
>
> --
> Winson

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss