Hi Han,

Thanks for your reply. I think maybe we can disconnect the failed follower from HAProxy, synchronize the data, and after that is completed, reconnect it to HAProxy again. But I do not know how to actually do the synchronization; it is just my naive idea, so I put a rough sketch of it below. Do you have any suggestion about how to fix this problem? If it is not very complicated, I will have a try.
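For example, maybe something like the following could work (only a rough sketch, not a tested procedure; I am assuming the SB database file is /etc/openvswitch/ovnsb_db.db, reusing the ctl socket path from the cluster/status command below, and FAILED_SID / LOCAL_ADDR / REMOTE1 / REMOTE2 are placeholders for the failed member's server ID, its own address, and the two remaining members' addresses; taking the node out of HAProxy would be done separately):

    # On a healthy member: remove the failed server from the cluster.
    ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/kick OVN_Southbound FAILED_SID

    # On the failed node: stop its SB ovsdb-server, then move the stale database aside.
    mv /etc/openvswitch/ovnsb_db.db /etc/openvswitch/ovnsb_db.db.bak

    # Create a fresh joining database; on startup it will fetch the current data
    # from the cluster instead of using the stale local log.
    ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound \
        tcp:LOCAL_ADDR:6644 tcp:REMOTE1:6644 tcp:REMOTE2:6644

    # Start the SB ovsdb-server on that node again, then wait until cluster/status
    # on the leader shows a non-zero match_index for the new member before putting
    # it back into HAProxy.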
Thanks,
Yun

On 2019-11-28 11:47:55, "Han Zhou" <hz...@ovn.org> wrote:

On Wed, Nov 27, 2019 at 7:22 PM taoyunupt <taoyun...@126.com> wrote:
>
> Hi,
>     My OVN cluster has 3 OVN-northd nodes; they are proxied by HAProxy with a VIP.
> Recently, I have restarted the OVN cluster frequently, and one of the members
> reports the logs below.
>     After reading the code and the RAFT paper, this seems to be the normal
> process: if the follower does not find an entry in its log with the same index
> and term, it refuses the new entries.
>     I think it is reasonable to refuse. But since we cannot control HAProxy
> (or whatever proxy is in front), errors will occur when a session is assigned
> to the failed follower.
>
>     Is there some way to solve this problem? Maybe we can kick off the failed
> follower, or disconnect it from HAProxy and then synchronize the data?
> Hope to hear your suggestion.
>
>
> 2019-11-27T14:22:17.060Z|00240|raft|INFO|rejecting append_request because previous entry 1103,50975 not in local log (mismatch past end of log)
> 2019-11-27T14:22:17.064Z|00241|raft|ERR|Dropped 34 log messages in last 12 seconds (most recently, 0 seconds ago) due to excessive rate
> 2019-11-27T14:22:17.064Z|00242|raft|ERR|internal error: deferred append_reply message completed but not ready to send because message index 14890 is past last synced index 0: a2b2 append_reply "mismatch past end of log": term=1103 log_end=14891 result="inconsistency"
> 2019-11-27T14:22:17.402Z|00243|raft|INFO|rejecting append_request because previous entry 1103,50975 not in local log (mismatch past end of log)
>
>
> [root@ovn1 ~]# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound
> a2b2
> Name: OVN_Southbound
> Cluster ID: 4c54 (4c546513-77e3-4602-b211-2e200014ad79)
> Server ID: a2b2 (a2b2a9c5-cf58-4724-8421-88fd5ca5d94d)
> Address: tcp:10.254.8.209:6644
> Status: cluster member
> Role: leader
> Term: 1103
> Leader: self
> Vote: self
>
> Log: [42052, 51009]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->beaf ->9a33 <-9a33 <-beaf
> Servers:
>     a2b2 (a2b2 at tcp:10.254.8.209:6644) (self) next_index=15199 match_index=51008
>     beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
>     9a33 (9a33 at tcp:10.254.8.210:6644) next_index=51009 match_index=51008
>

I think it is a bug. I noticed that this problem happens when the cluster is restarted after DB compaction. I mentioned it in one of the test cases:
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L252

I also mentioned another problem related to compaction:
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L239

I was planning to debug these but didn't get the time yet. I will try to find some time next week (it would be great if you could figure it out and submit patches).

Thanks,
Han

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev