Hi Han,

Thanks for taking the time to look into this. This problem is not consistently 
reproducible; developers would normally ignore it :) I think we have collected 
enough context and we can let it go for now.
I will rebuild the setup, tune the RAFT election timer as you suggested, and 
rerun the test. I will keep you posted.


Thanks again!

Tony

________________________________
From: Han Zhou <zhou...@gmail.com>
Sent: July 23, 2020 06:53 PM
To: Tony Liu <tonyliu0...@hotmail.com>; Ben Pfaff <b...@ovn.org>
Cc: Numan Siddique <num...@ovn.org>; ovs-dev <ovs-...@openvswitch.org>; 
ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same 
logical-switch


On Thu, Jul 23, 2020 at 10:33 AM Tony Liu 
<tonyliu0...@hotmail.com> wrote:
>
> Changed the title for this specific problem.
> I looked into logs and have more findings.
> The problem was happening when sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of OVS is 
in use (to understand the OVSDB behavior)?
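For example, on each DB node, something like the following should show what we 
need (both binaries accept the standard --version option):

ovsdb-server --version
ovn-northd --version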

>
> For an ovsdb cluster, what may trigger a leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, 
> starting election
> the election is requested by a follower node. Is that because the connection 
> from the follower to the leader timed out, so the follower assumes the leader 
> is dead and starts an election?

You are right: the RAFT heartbeat can time out when the server is too busy and 
the election timer is too small (default 1s). For a large-scale test, please 
increase the election timer with:
ovn-appctl -t <ovn-sb> cluster/change-election-timer OVN_Southbound <time>

I suggest setting <time> to at least 10000 in your case. (You need to increase 
the value gradually, since each new value can be at most double the current one 
- 2000, 4000, 8000, 16000 - so it takes four commands to get there from the 
initial default of 1000. Not very convenient, I know :)

<ovn-sb> here is the path to the socket ctl file of ovn-sb, usually under 
/var/run/ovn.
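For example, assuming the ctl file is /var/run/ovn/ovnsb_db.ctl (the exact path 
and file name depend on your packaging), and running against the current leader 
(I believe the command is only accepted there), the sequence would look like:

# each step at most doubles the current value, starting from the default 1000 ms
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 2000
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 4000
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 8000
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 16000

You can check the resulting value in the "Election timer" line of the 
"cluster/status OVN_Southbound" output.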

>
> For ovn-northd (3 instances), they all connect to the sb-db leader, and 
> whoever holds the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new 
> leader. In that case, there is no guarantee that the old ovn-northd master 
> keeps its role; another ovn-northd instance may find the new leader and 
> acquire the lock first. So a sb-db leader switch may also cause an 
> ovn-northd master switch.
> Such a switch may happen in the middle of an ovn-northd transaction. In that 
> case, is there any guarantee of transaction completeness? My guess is that 
> the old master created a datapath-binding for a logical-switch, the switch 
> happened before that transaction completed, and then the new master/leader 
> created another datapath-binding for the same logical-switch. Does that make 
> any sense?

I agree with you that it could be related to the failover and to the lock 
behavior during the failover. It could be a lock problem that caused two 
northds to become active at the same time for a short moment. However, I still 
can't see how the duplicate entries were created with different tunnel keys. If 
both northds create the datapath binding for the same LS at the same time, they 
should allocate the same tunnel key, and then one of them should fail at 
transaction commit because of the index conflict in the DB. But here they have 
different keys, so both were inserted into the DB.

(An OVSDB transaction is atomic even during failover, and no client should see 
partial data of a transaction.)

(cc Ben to comment more on the possibility of both clients acquiring the lock 
during failover.)
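By the way, if you want to see how many of these duplicates exist, something 
like this should work (the grep pattern is just a guess at the --bare 
formatting of external_ids, so it may need adjusting for your version):

# print any logical-switch UUID that appears in more than one Datapath_Binding row
ovn-sbctl --bare --columns=external_ids list Datapath_Binding \
    | grep -o 'logical-switch=[^ ,}]*' | sort | uniq -d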

>
> From the log, when the sb-db leader switched, the ovn-northd master connected 
> to the new leader and lost the master role, but immediately it acquired the 
> lock and became master again. Not sure how this happened.

From the ovn-northd logs, the ovn-northd on .86 first connected to the SB DB on 
.85, which suggests that it regarded .85 as the leader (otherwise it would 
disconnect and retry another server). Immediately after connecting to .85 and 
acquiring the lock, it disconnected because it somehow noticed that .85 is not 
the leader, then retried and connected to .86 (the new leader), found that the 
lock was already acquired by the .85 northd, and switched to standby. The .85 
northd luckily connected to .86 on its first try, so it was able to acquire the 
lock on the leader node first. Maybe the key thing is to figure out why the .86 
northd initially connected to the .85 DB, which is not the leader, and acquired 
the lock there.
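If this happens again, it may be worth capturing the cluster view from each DB 
server right around the failover, e.g. (assuming the default ctl path):

# shows this server's role, the current term, who it believes the leader is,
# and the election timer value
ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

Comparing that output across .84, .85 and .86 would tell which node each server 
considered the leader at that moment.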

Thanks,
Han

>
> Here are some loggings.
> ==== .84 sb-db leader =================
> 2020-07-21T01:08:20.221Z|01408|raft|INFO|current entry eid 
> 639238ba-bc00-4efe-bb66-6ac766bb5f4b does not match prerequisite 
> 78e8e167-8b4c-4292-8e25-d9975631b010 in execute_command_request
>
> 2020-07-21T01:08:38.450Z|01409|timeval|WARN|Unreasonably long 1435ms poll 
> interval (1135ms user, 43ms system)
> 2020-07-21T01:08:38.451Z|01410|timeval|WARN|faults: 5942 minor, 0 major
> 2020-07-21T01:08:38.451Z|01411|timeval|WARN|disk: 0 reads, 50216 writes
> 2020-07-21T01:08:38.452Z|01412|timeval|WARN|context switches: 60 voluntary, 
> 25 involuntary
> 2020-07-21T01:08:38.453Z|01413|coverage|INFO|Skipping details of duplicate 
> event coverage for hash=45329980
>
> 2020-07-21T01:08:38.455Z|01414|raft|WARN|ignoring vote request received as 
> leader
> 2020-07-21T01:08:38.456Z|01415|raft|INFO|server 1f9e is leader for term 2
> 2020-07-21T01:08:38.457Z|01416|raft|INFO|rejected append_reply (not leader)
> 2020-07-21T01:08:38.471Z|01417|raft|INFO|rejected append_reply (not leader)
>
> 2020-07-21T01:23:00.890Z|01418|timeval|WARN|Unreasonably long 1336ms poll 
> interval (1102ms user, 20ms system)
>
> ==== .85 sb-db ==================
> 2020-07-21T01:05:25.734Z|00061|coverage|INFO|Skipping details of duplicate 
> event coverage for hash=a1e16f5b
>
> 2020-07-21T01:08:38.166Z|00062|raft|INFO|server 1f9e is leader for term 2
> 2020-07-21T01:08:38.455Z|00063|raft|INFO|rejecting term 1 < current term 2 
> received in append_request message from server 39a9
>
> 2020-07-21T01:23:44.500Z|00064|timeval|WARN|Unreasonably long 1447ms poll 
> interval (1118ms user, 33ms system)
>
> ==== .86 sb-db =================
> 2020-07-21T01:02:35.607Z|00073|coverage|INFO|70 events never hit
>
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, 
> starting election
> 2020-07-21T01:08:38.165Z|00075|raft|INFO|term 2: elected leader by 2+ of 3 
> servers
> 2020-07-21T01:08:38.455Z|00076|raft|INFO|rejecting term 1 < current term 2 
> received in append_request message from server 39a9
> 2020-07-21T01:08:38.456Z|00077|raft|INFO|rejecting term 1 < current term 2 
> received in vote_reply message from server 39a9
>
> 2020-07-21T01:12:51.363Z|00078|timeval|WARN|Unreasonably long 1466ms poll 
> interval (1186ms user, 27ms system)
>
> ==== .85 ovn-northd master ==============
> 2020-07-21T01:08:25.925Z|00479|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 
> (10.6.20.85:45478<->10.6.20.84:6641) at lib/stream-fd.c:157 (54% CPU usage)
>
> 2020-07-21T01:08:38.464Z|00480|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
> database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.465Z|00481|reconnect|INFO|tcp:10.6.20.84:6642: connection 
> attempt timed out
> 2020-07-21T01:08:38.465Z|00482|reconnect|INFO|tcp:10.6.20.86:6642: connecting...
> 2020-07-21T01:08:38.465Z|00483|reconnect|INFO|tcp:10.6.20.86:6642: connected
> 2020-07-21T01:08:38.465Z|00484|ovn_northd|INFO|ovn-northd lock lost. This 
> ovn-northd instance is now on standby.
> 2020-07-21T01:08:38.465Z|00485|ovn_northd|INFO|ovn-northd lock acquired. This 
> ovn-northd instance is now active.
> 2020-07-21T01:08:38.966Z|00486|ovn_northd|INFO|deleting Datapath_Binding 
> 108cf745-db82-43c0-a9d3-afe27a41e4aa with duplicate 
> external-ids:logical-switch/router 8a5d1d3c-e9fc-4cbe-a461-98ff838e6473
>
> 2020-07-20T21:32:48.803Z|00019|reconnect|INFO|tcp:10.6.20.84:6641: connected
>
> ==== .86 ovn-northd ==============
> 2020-07-21T01:08:38.463Z|00020|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered 
> database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.464Z|00021|reconnect|INFO|tcp:10.6.20.84:6642: connection 
> attempt timed out
> 2020-07-21T01:08:38.464Z|00022|reconnect|INFO|tcp:10.6.20.85:6642: connecting...
> 2020-07-21T01:08:38.464Z|00023|reconnect|INFO|tcp:10.6.20.85:6642: connected
> 2020-07-21T01:08:38.464Z|00024|ovn_northd|INFO|ovn-northd lock acquired. This 
> ovn-northd instance is now active.
> 2020-07-21T01:08:38.465Z|00025|ovsdb_idl|INFO|tcp:10.6.20.85:6642: clustered 
> database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.466Z|00026|reconnect|INFO|tcp:10.6.20.85:6642: connection 
> attempt timed out
> 2020-07-21T01:08:39.467Z|00027|reconnect|INFO|tcp:10.6.20.86:6642: connecting...
> 2020-07-21T01:08:39.468Z|00028|reconnect|INFO|tcp:10.6.20.86:6642: connected
> 2020-07-21T01:08:39.468Z|00029|ovn_northd|INFO|ovn-northd lock lost. This 
> ovn-northd instance is now on standby.
>
> 2020-07-21T02:04:22.718Z|00030|ovsdb_idl|INFO|tcp:10.6.20.86:6642: clustered 
> database server is not cluster leader; trying another server
>
>
> Thanks!
>
> Tony
> ________________________________
>
> From: Numan Siddique <num...@ovn.org>
> Sent: July 23, 2020 03:15 AM
> To: Tony Liu <tonyliu0...@hotmail.com>
> Cc: ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org 
> <ovs-discuss@openvswitch.org>
> Subject: Re: [ovs-discuss] OVN nb-db and sb-db out of sync
>
>
>
> On Thu, Jul 23, 2020 at 8:22 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
> Hi,
>
> I see why sb-db broke at the 1568th port-binding.
> The 1568th datapath-binding in sb-db references the same logical-switch as 
> the 1567th:
>
> _uuid               : 108cf745-db82-43c0-a9d3-afe27a41e4aa
> external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key          : 1567
>
> _uuid               : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
> external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key          : 1568
>
> I don't believe this is supposed to happen. Any idea how it could happen?
> ovn-northd is then stuck trying to delete this duplicate, and it ignores all 
> the following updates.
> That caused the out-of-sync between nb-db and sb-db.
> Is there any way to fix it manually, like deleting it with ovn-sbctl?
>
> If you delete the ovn sb db resources manually, ovn-northd should sync it up.
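[A minimal sketch of such a manual cleanup, using the duplicate row that the 
.85 ovn-northd was already trying to delete (from its log above); destroy is 
the generic ovn-sbctl database command:

ovn-sbctl destroy Datapath_Binding 108cf745-db82-43c0-a9d3-afe27a41e4aa

After that, ovn-northd should be able to move on and sync up the rest.]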
>
> But I'm surprised that ovn-northd didn't sync earlier. There's something 
> wrong related to RAFT going on here. Not sure what.
>
> Thanks
> Numan
>
>
>
>
> Thanks!
>
> Tony
>
> ________________________________
> From: dev <ovs-dev-boun...@openvswitch.org> on behalf of Tony Liu 
> <tonyliu0...@hotmail.com>
> Sent: July 22, 2020 11:33 AM
> To: ovs-dev <ovs-...@openvswitch.org>
> Subject: [ovs-dev] OVN nb-db and sb-db out of sync
>
> Hi,
>
> During a scaling test where 4000 networks are created from OpenStack, I see 
> that nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS 
> ports (the GW port and service port of each network) are created in nb-db. In 
> sb-db, there are only 1567 port-bindings, while 4000 are expected.
>
> [root@ovn-db-2 ~]# ovn-nbctl list nb_global
> _uuid               : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
> connections         : []
> external_ids        : {"neutron:liveness_check_at"="2020-07-22 
> 04:03:17.726917+00:00"}
> hv_cfg              : 312
> ipsec               : false
> name                : ""
> nb_cfg              : 2636
> options             : {mac_prefix="ca:e8:07", 
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> sb_cfg              : 2005
> ssl                 : []
>
> [root@ovn-db-2 ~]# ovn-sbctl list sb_global
> _uuid               : 3720bc1d-b0da-47ce-85ca-96fa8d398489
> connections         : []
> external_ids        : {}
> ipsec               : false
> nb_cfg              : 312
> options             : {mac_prefix="ca:e8:07", 
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> ssl                 : []
>
> Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db, 
> like manipulating nb_cfg or anything else? Note, it's a 3-node RAFT cluster 
> for both nb-db and sb-db.
>
> Is the "incremental update" implemented in 20.03?
> If not, in which release is it going to be available?
>
>
> Thanks!
>
> Tony
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
