Hi, Han
         Another question, this time without compaction: if I restart a follower
and the leader sends some entries while it is down, can the follower also run
into this problem once it has started again? What is the difference between a
simple restart and a restart after compaction?
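One way to frame the question above, assuming Raft as described in the paper rather than the actual OVS raft.c code: a restarted follower reports where its log ends, and the leader can backfill it with ordinary append_requests only if it still holds the first entry the follower is missing. If compaction has already folded that entry into a snapshot, the leader has to send the snapshot instead (InstallSnapshot in the paper). `LeaderLog` and `next_rpc` are hypothetical names, and reading `Log: [42052, 51009]` from the quoted `cluster/status` output as the leader's first index and one-past-last index is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LeaderLog:
    """Hypothetical model of a leader's log.

    Entries with index < log_start have been compacted into a snapshot;
    entries in [log_start, log_end) are still available for replication.
    """
    log_start: int
    log_end: int


def next_rpc(leader: LeaderLog, follower_log_end: int) -> str:
    """Pick what to send to a follower that is behind (Raft paper rule, not raft.c).

    follower_log_end is one past the follower's last entry, so it is also
    the index of the first entry the follower is missing.
    """
    first_missing = follower_log_end
    if first_missing >= leader.log_start:
        # The leader still has the missing entries: normal appends can catch up.
        return "append_request"
    # The missing entries were compacted away: only a snapshot can help.
    return "install_snapshot"


# Restart without compaction: the leader still holds the log from index 1.
print(next_rpc(LeaderLog(log_start=1, log_end=51009), follower_log_end=14891))      # append_request
# Restart after compaction, using the numbers from the quoted status/log output.
print(next_rpc(LeaderLog(log_start=42052, log_end=51009), follower_log_end=14891))  # install_snapshot
```

Under these assumptions, a simple restart only exercises the normal append path, while a restart after compaction can force the snapshot path, which is the extra machinery compaction brings into play.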


Thanks,
Yun

On 2019-11-28 13:58:36, "taoyunupt" <taoyun...@126.com> wrote:

Hi, Han
         Thanks for your reply. I think maybe we can disconnect the failed
follower from HAProxy, synchronize the data, and once that is complete,
reconnect it to HAProxy. But I do not know how to actually do the
synchronization.
         It is just my naive idea. Do you have any suggestions about how to fix
this problem? If it is not too complicated, I will give it a try.
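A hypothetical health check for the "drain the follower from HAProxy" idea could start from the leader's `cluster/status` output quoted further down in this thread, where the lagging follower shows `match_index=0`. The sketch below parses only the `Log:` and `Servers:` lines; the regexes are assumptions based on that single sample, and `lagging_servers` and `max_lag` are made-up names. It is only meaningful on the leader, since only the leader reports peers' replication progress. How the result would be fed to HAProxy (for example through an agent check) is not shown.

```python
import re

# One "Servers:" entry as in the quoted output, e.g.
#   beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
SERVER_RE = re.compile(
    r"^\s*(?P<sid>\w+) \(.*?\)(?: \(self\))? "
    r"next_index=(?P<next>\d+) match_index=(?P<match>\d+)"
)
# The leader's log range, e.g. "Log: [42052, 51009]" (assumed [start, end)).
LOG_RE = re.compile(r"^Log: \[(?P<start>\d+), (?P<end>\d+)\]")


def lagging_servers(status_text: str, max_lag: int = 100) -> list[str]:
    """Return IDs of peers whose match_index trails the leader's last entry by more than max_lag."""
    log_end = None
    lagging = []
    for line in status_text.splitlines():
        m = LOG_RE.match(line)
        if m:
            log_end = int(m["end"])
            continue
        m = SERVER_RE.match(line)
        if m and log_end is not None and "(self)" not in line:
            # match_index: highest entry known to be replicated on that peer.
            if (log_end - 1) - int(m["match"]) > max_lag:
                lagging.append(m["sid"])
    return lagging


STATUS = """\
Log: [42052, 51009]
Servers:
    a2b2 (a2b2 at tcp:10.254.8.209:6644) (self) next_index=15199 match_index=51008
    beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
    9a33 (9a33 at tcp:10.254.8.210:6644) next_index=51009 match_index=51008
"""
print(lagging_servers(STATUS))  # ['beaf']
```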


Thanks 
Yun

On 2019-11-28 11:47:55, "Han Zhou" <hz...@ovn.org> wrote:



On Wed, Nov 27, 2019 at 7:22 PM taoyunupt <taoyun...@126.com> wrote:
>
> Hi,
>     My OVN cluster has 3 OVN-northd nodes, proxied by HAProxy behind a VIP.
> Recently I have been restarting the OVN cluster frequently, and one of the
> members reported the logs below.
>     After reading the code and the Raft paper, this seems to be the normal
> process: if the follower does not find an entry in its log with the same index
> and term, it refuses the new entries.
>     I think it is reasonable to refuse. But since we cannot control which
> member HAProxy (or any other proxy) picks, an error occurs whenever a session
> is assigned to the failed follower.
>
>     Is there some way to solve this problem? Maybe we can kick out the failed
> follower, or disconnect it from HAProxy, and then synchronize the data? I hope
> to hear your suggestions.
>
>
> 2019-11-27T14:22:17.060Z|00240|raft|INFO|rejecting append_request because previous entry 1103,50975 not in local log (mismatch past end of log)
> 2019-11-27T14:22:17.064Z|00241|raft|ERR|Dropped 34 log messages in last 12 seconds (most recently, 0 seconds ago) due to excessive rate
> 2019-11-27T14:22:17.064Z|00242|raft|ERR|internal error: deferred append_reply message completed but not ready to send because message index 14890 is past last synced index 0: a2b2 append_reply "mismatch past end of log": term=1103 log_end=14891 result="inconsistency"
> 2019-11-27T14:22:17.402Z|00243|raft|INFO|rejecting append_request because previous entry 1103,50975 not in local log (mismatch past end of log)
>
>
> [root@ovn1 ~]# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound
> a2b2
> Name: OVN_Southbound
> Cluster ID: 4c54 (4c546513-77e3-4602-b211-2e200014ad79)
> Server ID: a2b2 (a2b2a9c5-cf58-4724-8421-88fd5ca5d94d)
> Address: tcp:10.254.8.209:6644
> Status: cluster member
> Role: leader
> Term: 1103
> Leader: self
> Vote: self
>
> Log: [42052, 51009]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->beaf ->9a33 <-9a33 <-beaf
> Servers:
>     a2b2 (a2b2 at tcp:10.254.8.209:6644) (self) next_index=15199 match_index=51008
>     beaf (beaf at tcp:10.254.8.208:6644) next_index=51009 match_index=0
>     9a33 (9a33 at tcp:10.254.8.210:6644) next_index=51009 match_index=51008

>
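The rule behind the "rejecting append_request" messages quoted above is Raft's AppendEntries consistency check: a follower accepts new entries only if it already holds the entry just before them (prevLogIndex) with the same term (prevLogTerm). A minimal sketch of that check, assuming a follower log that starts at index 1 with no snapshot; `check_append` is a hypothetical name, and apart from "mismatch past end of log", which appears in the log above, the reason strings are invented. The follower's length is chosen to match `log_end=14891` from the quoted append_reply.

```python
from typing import List, Optional


def check_append(terms: List[int], prev_index: int, prev_term: int) -> Optional[str]:
    """Raft AppendEntries consistency check (receiver side), simplified.

    terms[i] holds the term of log entry i + 1 (Raft indexes start at 1).
    Returns None if the request may be accepted, else a rejection reason.
    """
    if prev_index == 0:
        # Nothing precedes the first entry, so the check trivially passes.
        return None
    if prev_index > len(terms):
        # The follower does not have the previous entry at all.
        return "mismatch past end of log"
    if terms[prev_index - 1] != prev_term:
        # The follower has an entry there, but from a different term.
        return "term mismatch"
    return None


# Hypothetical follower whose last entry is 14890 (log_end=14891).
follower_terms = [1103] * 14890
print(check_append(follower_terms, prev_index=50975, prev_term=1103))  # mismatch past end of log
print(check_append(follower_terms, prev_index=14890, prev_term=1103))  # None
print(check_append(follower_terms, prev_index=14890, prev_term=1102))  # term mismatch
```

This reproduces the situation in the log: the leader asks about entry 1103,50975 while the follower's log ends at 14890. In the Raft paper the leader reacts to such a rejection by moving next_index back and retrying, so a single rejection is expected behavior.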


I think it is a bug. I noticed that this problem happens when the cluster is 
restarted after DB compaction. I mentioned it in one of the test cases: 
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L252
I also mentioned another problem related to compaction: 
https://github.com/openvswitch/ovs/blob/master/tests/ovsdb-cluster.at#L239
I was planning to debug these but haven't had the time yet. I will try to find
some time next week (it would be great if you could figure it out and submit
patches).



Thanks,
Han
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev