Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-30 Thread Han Zhou
Resending as plain text, since I got a "The message's content type was not
explicitly allowed" reply from ovs-dev-owner.

On Thu, Jul 30, 2020 at 7:30 PM Han Zhou  wrote:
>
>
>
> On Thu, Jul 30, 2020 at 7:24 PM Tony Liu  wrote:
>>
>> Hi Han,
>>
>>
>>
>> Continuing with this thread, regarding your comment in another thread:
>>
>> ===
>>
>> 2) OVSDB clients usually monitor and sync all (interested) data from the
server to local, so when they do declarative processing, they can correct
problems by themselves. In fact, ovn-northd does the check and deletes
duplicated datapaths. I did a simple test and it did the cleanup by itself:
>>
>> 2020-07-30T18:55:53.057Z|6|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.
>> 2020-07-30T19:02:10.465Z|7|ovn_northd|INFO|deleting Datapath_Binding
abef9503-445e-4a52-ae88-4c826cbad9d6 with duplicate
external-ids:logical-switch/router ee80c38b-2016-4cbc-9437-f73e3a59369e
>>
>>
>>
>> I am not sure why in your case northd was stuck, but I agree there must
be something wrong. Please collect northd logs if you encounter this again
so we can dig further.
>>
>> ===
>>
>>
>>
>> You are right that ovn-northd will try to clean up the duplication, but
>> there are ports in port-binding referencing this datapath-binding, so
>> ovn-northd fails to delete the datapath-binding. I have to manually delete
>> those ports to be able to delete the datapath-binding. I believe it’s not
>> supported for ovn-northd to delete a configuration that is being
>> referenced. Is that right? If yes, should we fix it, or is it the intention?
>>
>>
>
>
> Yes, good point!
> It is definitely a bug and we should fix it. I think the best fix is to
change the schema and add "logical_datapath" as an index, but we'll need to
make it backward compatible to avoid upgrade issues.
>
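
Until such a schema-level fix lands, the manual cleanup Tony describes above
can be scripted with ovn-sbctl's generic database commands. A minimal sketch,
reusing the UUIDs from the northd log quoted above; the Port_Binding UUID is a
hypothetical placeholder:

    # Show the datapath rows recorded for the affected logical switch; with
    # this bug there are two, each with its own tunnel_key.
    ovn-sbctl --columns=_uuid,tunnel_key,external_ids find Datapath_Binding \
        external_ids:logical-switch=ee80c38b-2016-4cbc-9437-f73e3a59369e

    # List the port bindings that still reference the stale datapath row.
    ovn-sbctl --columns=_uuid,logical_port find Port_Binding \
        datapath=abef9503-445e-4a52-ae88-4c826cbad9d6

    # Remove the referencing port bindings first, then the datapath itself.
    ovn-sbctl destroy Port_Binding <port-binding-uuid>
    ovn-sbctl destroy Datapath_Binding abef9503-445e-4a52-ae88-4c826cbad9d6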


Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-30 Thread Tony Liu
I agree, that will stop the duplication from being created.


Thanks!

Tony

From: Han Zhou <zhou...@gmail.com>
Sent: Thursday, July 30, 2020 7:30 PM
To: Tony Liu <tonyliu0...@hotmail.com>
Cc: Ben Pfaff <b...@ovn.org>; ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch



On Thu, Jul 30, 2020 at 7:24 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
Hi Han,

Continuing with this thread, regarding your comment in another thread:
===
2) OVSDB clients usually monitor and sync all (interested) data from the server 
to local, so when they do declarative processing, they can correct problems 
by themselves. In fact, ovn-northd does the check and deletes duplicated 
datapaths. I did a simple test and it did the cleanup by itself:
2020-07-30T18:55:53.057Z|6|ovn_northd|INFO|ovn-northd lock acquired. This 
ovn-northd instance is now active.
2020-07-30T19:02:10.465Z|7|ovn_northd|INFO|deleting Datapath_Binding 
abef9503-445e-4a52-ae88-4c826cbad9d6 with duplicate 
external-ids:logical-switch/router ee80c38b-2016-4cbc-9437-f73e3a59369e

I am not sure why in your case northd was stuck, but I agree there must be 
something wrong. Please collect northd logs if you encounter this again so we 
can dig further.
===

You are right that ovn-northd will try to clean up the duplication, but
there are ports in port-binding referencing this datapath-binding, so
ovn-northd fails to delete the datapath-binding. I have to manually delete
those ports to be able to delete the datapath-binding. I believe it’s not
supported for ovn-northd to delete a configuration that is being
referenced. Is that right? If yes, should we fix it, or is it the intention?


Yes, good point!
It is definitely a bug and we should fix it. I think the best fix is to change 
the schema and add "logical_datapath" as an index, but we'll need to make it 
backward compatible to avoid upgrade issues.


Thanks!

Tony

From: Tony Liu <tonyliu0...@hotmail.com>
Sent: Thursday, July 23, 2020 7:51 PM
To: Han Zhou <zhou...@gmail.com>; Ben Pfaff <b...@ovn.org>
Cc: ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

Hi Han,

Thanks for taking the time to look into this. This problem is not consistently 
reproducible.
Developers normally ignore it. :) I think we collected enough context and we can 
let it go for now.
I will rebuild the setup, tune that RAFT heartbeat timer and rerun the test. Will 
keep you posted.


Thanks again!

Tony


From: Han Zhou <zhou...@gmail.com>
Sent: July 23, 2020 06:53 PM
To: Tony Liu <tonyliu0...@hotmail.com>; Ben Pfaff <b...@ovn.org>
Cc: Numan Siddique <num...@ovn.org>; ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch


On Thu, Jul 23, 2020 at 10:33 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
>
> Changed the title for this specific problem.
> I looked into logs and have more findings.
> The problem was happening when sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of OVS is 
used (to understand the OVSDB behavior)?

>
> For the ovsdb cluster, what may trigger the leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, 
> starting election
> The election is requested by a follower node. Is that because the connection from 
> the follower to the leader timed out,
> then the follower assumes the leader is dead and starts an election?

You are right, the RAFT heartbeat would time out when the server is too busy and 
the election timer is too small (default 1s). For a large-scale test, please 
increase the election timer with:
ovn-appctl -t <ovn-sb ctl file> cluster/change-election-timer OVN_Southbound <value>

I suggest setting <value> to at least 10000 or more in your case. 
(You need to increase the value gradually - 2000, 4000, 8000, 16000 - so it 
will take you 4 commands to reach this from the initial default value 1000, not 
very convenient, I know :)

<ovn-sb ctl file> here is the path to the socket ctl file of ovn-sb, usually under 
/var/run/ovn.
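
Concretely, the gradual increase described above might look like the following
sketch; the exact ctl socket filename under /var/run/ovn is an assumption, so
adjust it to your installation, and run it against the current leader:

    SB_CTL=/var/run/ovn/ovnsb_db.ctl   # assumed path to the ovn-sb ctl socket

    # Walk the timer up gradually, as suggested above, from the 1000 ms
    # default to 16000 ms.
    for ms in 2000 4000 8000 16000; do
        ovn-appctl -t "$SB_CTL" cluster/change-election-timer OVN_Southbound "$ms"
    done

    # Confirm the new election timer and the cluster state.
    ovn-appctl -t "$SB_CTL" cluster/status OVN_Southbound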

>
> For ovn-northd (3 instances), they all connect to the sb-db leader; whoever 
> has the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new leader. 
> In this case, there is no
> guarantee that the old ovn-northd master retains the role, other ovn-nort

Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-30 Thread Han Zhou
On Thu, Jul 30, 2020 at 7:24 PM Tony Liu  wrote:

> Hi Han,
>
>
>
> Continuing with this thread, regarding your comment in another thread:
>
> ===
>
> 2) OVSDB clients usually monitor and sync all (interested) data from the
> server to local, so when they do declarative processing, they can correct
> problems by themselves. In fact, ovn-northd does the check and deletes
> duplicated datapaths. I did a simple test and it did the cleanup by itself:
>
> 2020-07-30T18:55:53.057Z|6|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> 2020-07-30T19:02:10.465Z|7|ovn_northd|INFO|deleting Datapath_Binding
> abef9503-445e-4a52-ae88-4c826cbad9d6 with duplicate
> external-ids:logical-switch/router ee80c38b-2016-4cbc-9437-f73e3a59369e
>
>
>
> I am not sure why in your case northd was stuck, but I agree there must be
> something wrong. Please collect northd logs if you encounter this again so
> we can dig further.
>
> ===
>
>
>
> You are right that ovn-northd will try to clean up the duplication, but
> there are ports in port-binding referencing this datapath-binding, so
> ovn-northd fails to delete the datapath-binding. I have to manually delete
> those ports to be able to delete the datapath-binding. I believe it’s not
> supported for ovn-northd to delete a configuration that is being
> referenced. Is that right? If yes, should we fix it, or is it the intention?
>
>
>

Yes, good point!
It is definitely a bug and we should fix it. I think the best fix is to
change the schema and add "logical_datapath" as an index, but we'll need to
make it backward compatible to avoid upgrade issues.


>
>
> Thanks!
>
>
>
> Tony
>
>
>
> From: Tony Liu 
> Sent: Thursday, July 23, 2020 7:51 PM
> To: Han Zhou ; Ben Pfaff 
> Cc: ovs-dev ; ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are
> created for the same logical-switch
>
>
>
> Hi Han,
>
>
>
> Thanks for taking the time to look into this. This problem is not
> consistently reproducible.
>
> Developers normally ignore it. :) I think we collected enough context and we
> can let it go for now.
>
> I will rebuild the setup, tune that RAFT heartbeat timer and rerun the test.
> Will keep you posted.
>
>
>
>
>
> Thanks again!
>
>
>
> Tony
>
>
>
> From: Han Zhou 
> Sent: July 23, 2020 06:53 PM
> To: Tony Liu ; Ben Pfaff 
> Cc: Numan Siddique ; ovs-dev ;
> ovs-discuss@openvswitch.org 
> Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the
> same logical-switch
>
>
>
>
> On Thu, Jul 23, 2020 at 10:33 AM Tony Liu  wrote:
> >
> > Changed the title for this specific problem.
> > I looked into logs and have more findings.
> > The problem was happening when sb-db leader switched.
>
>
>
> Hi Tony,
>
>
>
> Thanks for this detailed information. Could you confirm which version of
> OVS is used (to understand the OVSDB behavior)?
>
>
>
> >
> > For the ovsdb cluster, what may trigger the leader switch? Given the log,
> > 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout
> expired, starting election
> > The election is requested by a follower node. Is that because the connection
> from the follower to the leader timed out,
> > then the follower assumes the leader is dead and starts an election?
>
>
>
> You are right, the RAFT heartbeat would time out when the server is too busy
> and the election timer is too small (default 1s). For a large-scale test,
> please increase the election timer with:
>
> ovn-appctl -t <ovn-sb ctl file> cluster/change-election-timer OVN_Southbound <value>
>
> I suggest setting <value> to at least 10000 or more in your
> case. (You need to increase the value gradually - 2000, 4000, 8000, 16000 -
> so it will take you 4 commands to reach this from the initial default value
> 1000, not very convenient, I know :)
>
> <ovn-sb ctl file> here is the path to the socket ctl file of ovn-sb, usually under
> /var/run/ovn.
>
>
>
> >
>
> > For ovn-northd (3 instances), they all connect to the sb-db leader;
> whoever has the lock is the master.
> > When the sb-db leader switches, all ovn-northd instances look for the new
> leader. In this case, there is no
> > guarantee that the old ovn-northd master retains the role, other
> ovn-northd instance may find the
> > leader and acquire the lock first. So, the sb-db leader switch may also
> cause an ovn-northd master switch.
> > Such a switch may happen in the middle of an ovn-northd transaction; in that
> case, is there any guaran

Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-30 Thread Tony Liu
Hi Han,

Continuing with this thread, regarding your comment in another thread:
===
2) OVSDB clients usually monitor and sync all (interested) data from the server 
to local, so when they do declarative processing, they can correct problems 
by themselves. In fact, ovn-northd does the check and deletes duplicated 
datapaths. I did a simple test and it did the cleanup by itself:
2020-07-30T18:55:53.057Z|6|ovn_northd|INFO|ovn-northd lock acquired. This 
ovn-northd instance is now active.
2020-07-30T19:02:10.465Z|7|ovn_northd|INFO|deleting Datapath_Binding 
abef9503-445e-4a52-ae88-4c826cbad9d6 with duplicate 
external-ids:logical-switch/router ee80c38b-2016-4cbc-9437-f73e3a59369e

I am not sure why in your case northd was stuck, but I agree there must be 
something wrong. Please collect northd logs if you encounter this again so we 
can dig further.
===

You are right that ovn-northd will try to clean up the duplication, but
there are ports in port-binding referencing this datapath-binding, so
ovn-northd fails to delete the datapath-binding. I have to manually delete
those ports to be able to delete the datapath-binding. I believe it’s not
supported for ovn-northd to delete a configuration that is being
referenced. Is that right? If yes, should we fix it, or is it the intention?


Thanks!

Tony

From: Tony Liu <tonyliu0...@hotmail.com>
Sent: Thursday, July 23, 2020 7:51 PM
To: Han Zhou <zhou...@gmail.com>; Ben Pfaff <b...@ovn.org>
Cc: ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

Hi Han,

Thanks for taking the time to look into this. This problem is not consistently 
reproducible.
Developers normally ignore it. :) I think we collected enough context and we can 
let it go for now.
I will rebuild the setup, tune that RAFT heartbeat timer and rerun the test. Will 
keep you posted.


Thanks again!

Tony


From: Han Zhou 
Sent: July 23, 2020 06:53 PM
To: Tony Liu ; Ben Pfaff 
Cc: Numan Siddique ; ovs-dev ; 
ovs-discuss@openvswitch.org 
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same 
logical-switch


On Thu, Jul 23, 2020 at 10:33 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
>
> Changed the title for this specific problem.
> I looked into logs and have more findings.
> The problem was happening when sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of OVS is 
used (to understand the OVSDB behavior)?

>
> For the ovsdb cluster, what may trigger the leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, 
> starting election
> The election is requested by a follower node. Is that because the connection from 
> the follower to the leader timed out,
> then the follower assumes the leader is dead and starts an election?

You are right, the RAFT heartbeat would time out when the server is too busy and 
the election timer is too small (default 1s). For a large-scale test, please 
increase the election timer with:
ovn-appctl -t <ovn-sb ctl file> cluster/change-election-timer OVN_Southbound <value>

I suggest setting <value> to at least 10000 or more in your case. 
(You need to increase the value gradually - 2000, 4000, 8000, 16000 - so it 
will take you 4 commands to reach this from the initial default value 1000, not 
very convenient, I know :)

<ovn-sb ctl file> here is the path to the socket ctl file of ovn-sb, usually under 
/var/run/ovn.

>
> For ovn-northd (3 instances), they all connect to the sb-db leader; whoever 
> has the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new leader. 
> In this case, there is no
> guarantee that the old ovn-northd master retains the role, other ovn-northd 
> instance may find the
> leader and acquire the lock first. So, the sb-db leader switch may also cause 
> an ovn-northd master switch.
> Such a switch may happen in the middle of an ovn-northd transaction; in that case, 
> is there any guarantee of
> the transaction completeness? My guess is that the old master created a 
> datapath-binding for a logical-switch,
> the switch happened before this transaction completed, and then the new 
> master/leader created another
> datapath-binding for the same logical-switch. Does it make any sense?

I agree with you that it could be related to the failover and the lock behavior 
during the failover. It could be a lock problem causing 2 northds to become active 
at the same time for a short moment. However, I still can't imagine how the 
duplicated entries are created with different tunnel keys. If both northd 
create the datapath binding for the same LS at the same time, they should 
allocate the same tunnel key, and then one of them should fail during the 
transa

Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-23 Thread Tony Liu
Hi Han,

Thanks for taking the time to look into this. This problem is not consistently 
reproducible.
Developers normally ignore it. :) I think we collected enough context and we can 
let it go for now.
I will rebuild the setup, tune that RAFT heartbeat timer and rerun the test. Will 
keep you posted.


Thanks again!

Tony


From: Han Zhou 
Sent: July 23, 2020 06:53 PM
To: Tony Liu ; Ben Pfaff 
Cc: Numan Siddique ; ovs-dev ; 
ovs-discuss@openvswitch.org 
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same 
logical-switch


On Thu, Jul 23, 2020 at 10:33 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
>
> Changed the title for this specific problem.
> I looked into logs and have more findings.
> The problem was happening when sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of OVS is 
used (to understand the OVSDB behavior)?

>
> For the ovsdb cluster, what may trigger the leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, 
> starting election
> The election is requested by a follower node. Is that because the connection from 
> the follower to the leader timed out,
> then the follower assumes the leader is dead and starts an election?

You are right, the RAFT heartbeat would time out when the server is too busy and 
the election timer is too small (default 1s). For a large-scale test, please 
increase the election timer with:
ovn-appctl -t <ovn-sb ctl file> cluster/change-election-timer OVN_Southbound <value>

I suggest setting <value> to at least 10000 or more in your case. 
(You need to increase the value gradually - 2000, 4000, 8000, 16000 - so it 
will take you 4 commands to reach this from the initial default value 1000, not 
very convenient, I know :)

<ovn-sb ctl file> here is the path to the socket ctl file of ovn-sb, usually under 
/var/run/ovn.

>
> For ovn-northd (3 instances), they all connect to the sb-db leader; whoever 
> has the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new leader. 
> In this case, there is no
> guarantee that the old ovn-northd master retains the role, other ovn-northd 
> instance may find the
> leader and acquire the lock first. So, the sb-db leader switch may also cause 
> an ovn-northd master switch.
> Such a switch may happen in the middle of an ovn-northd transaction; in that case, 
> is there any guarantee of
> the transaction completeness? My guess is that the old master created a 
> datapath-binding for a logical-switch,
> the switch happened before this transaction completed, and then the new 
> master/leader created another
> datapath-binding for the same logical-switch. Does it make any sense?

I agree with you that it could be related to the failover and the lock behavior 
during the failover. It could be a lock problem causing 2 northds to become active 
at the same time for a short moment. However, I still can't imagine how the 
duplicated entries are created with different tunnel keys. If both northd 
create the datapath binding for the same LS at the same time, they should 
allocate the same tunnel key, and then one of them should fail during the 
transaction commit because of an index conflict in the DB. But here they have 
different keys, so both were inserted in the DB.

(An OVSDB transaction is atomic even during failover, and no client should see 
partial data of a transaction.)

(cc Ben to comment more on the possibility of both clients acquiring the lock 
during failover)

>
> From the log, when sb-db switched, the ovn-northd master connected to the new 
> leader and lost the master role,
> but immediately it acquired the lock and became master again. Not sure how 
> this happened.

From the ovn-northd logs, the ovn-northd on .86 first connected to the SB DB on 
.85, which suggests that it regarded .85 as the leader (otherwise it would 
disconnect and retry another server), and then immediately after connecting to 
.85 and acquiring the lock, it disconnected because it somehow noticed that 
.85 is not the leader, and then retried and connected to .86 (the new leader) 
and found out that the lock is already acquired by the .85 northd, so it switched 
to standby. The .85 northd luckily connected to .86 on the first try, so it was 
able to acquire the lock on the leader node first. Maybe the key thing is to 
figure out why the .86 northd initially connected to the .85 DB, which is not the 
leader, and acquired the lock.

Thanks,
Han

>
> Here are some loggings.
>  .84 sb-db leader =
> 2020-07-21T01:08:20.221Z|01408|raft|INFO|current entry eid 
> 639238ba-bc00-4efe-bb66-6ac766bb5f4b does not match prerequisite 
> 78e8e167-8b4c-4292-8e25-d9975631b010 in execute_command_request
>
> 2020-07-21T01:08:38.450Z|01409|timeval|WARN|Unreasonably long 1435ms poll 
> interval (1135ms user, 43ms system)
> 2020-07-21T01:08:38.451Z|01410|timeval|WARN|faults: 5942 minor, 0 major
> 2020-07-21T01:08:38.451Z|01411|timeval|WARN|disk: 0 reads, 50216 writes
> 

Re: [ovs-discuss] [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

2020-07-23 Thread Han Zhou
On Thu, Jul 23, 2020 at 10:33 AM Tony Liu  wrote:
>
> Changed the title for this specific problem.
> I looked into logs and have more findings.
> The problem was happening when sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of
OVS is used (to understand the OVSDB behavior)?

>
> For the ovsdb cluster, what may trigger the leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired,
starting election
> The election is requested by a follower node. Is that because the connection
from the follower to the leader timed out,
> then the follower assumes the leader is dead and starts an election?

You are right, the RAFT heartbeat would time out when the server is too busy
and the election timer is too small (default 1s). For a large-scale test,
please increase the election timer with:
ovn-appctl -t <ovn-sb ctl file> cluster/change-election-timer OVN_Southbound <value>

I suggest setting <value> to at least 10000 or more in your
case. (You need to increase the value gradually - 2000, 4000, 8000, 16000 -
so it will take you 4 commands to reach this from the initial default value
1000, not very convenient, I know :)

<ovn-sb ctl file> here is the path to the socket ctl file of ovn-sb, usually under
/var/run/ovn.

>
> For ovn-northd (3 instances), they all connect to the sb-db leader;
whoever has the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new
leader. In this case, there is no
> guarantee that the old ovn-northd master retains the role, other
ovn-northd instance may find the
> leader and acquire the lock first. So, the sb-db leader switch may also
cause an ovn-northd master switch.
> Such a switch may happen in the middle of an ovn-northd transaction; in that
case, is there any guarantee of
> the transaction completeness? My guess is that the old master created a
datapath-binding for a logical-switch,
> the switch happened before this transaction completed, and then the new
master/leader created another
> datapath-binding for the same logical-switch. Does it make any sense?

I agree with you that it could be related to the failover and the lock behavior
during the failover. It could be a lock problem causing 2 northds to become
active at the same time for a short moment. However, I still can't imagine
how the duplicated entries are created with different tunnel keys. If both
northd create the datapath binding for the same LS at the same time, they
should allocate the same tunnel key, and then one of them should fail
during the transaction commit because of an index conflict in the DB. But here
they have different keys, so both were inserted in the DB.

(An OVSDB transaction is atomic even during failover, and no client should see
partial data of a transaction.)

(cc Ben to comment more on the possibility of both clients acquiring the
lock during failover)
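
The index conflict mentioned above comes from the southbound schema itself,
where tunnel_key is declared as an index on Datapath_Binding. A minimal sketch
of how to confirm that on a running deployment; the database socket path is an
assumption, so adjust it to your installation:

    SB_SOCK=unix:/var/run/ovn/ovnsb_db.sock   # assumed ovn-sb socket path

    # Print the declared indexes of the Datapath_Binding table; tunnel_key
    # should be listed, which is what makes a same-key double insert fail.
    ovsdb-client get-schema "$SB_SOCK" OVN_Southbound \
        | python3 -c 'import json,sys; print(json.load(sys.stdin)["tables"]["Datapath_Binding"]["indexes"])'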

>
> From the log, when sb-db switched, the ovn-northd master connected to the new
leader and lost the master role,
> but immediately it acquired the lock and became master again. Not sure
how this happened.

From the ovn-northd logs, the ovn-northd on .86 first connected to the SB DB
on .85, which suggests that it regarded .85 as the leader (otherwise it
would disconnect and retry another server), and then immediately after
connecting to .85 and acquiring the lock, it disconnected because it somehow
noticed that .85 is not the leader, and then retried and connected to .86
(the new leader) and found out that the lock is already acquired by the .85
northd, so it switched to standby. The .85 northd luckily connected to .86
on the first try, so it was able to acquire the lock on the leader node
first. Maybe the key thing is to figure out why the .86 northd initially
connected to the .85 DB, which is not the leader, and acquired the lock.
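
When chasing that kind of leader confusion, it helps to capture each server's
own view of the cluster, and each northd's role, around the failover. A minimal
sketch; the ctl socket path is an assumption, so adjust it to your installation:

    SB_CTL=/var/run/ovn/ovnsb_db.ctl   # assumed ovn-sb ctl socket path

    # Role, term, and leader as seen by this ovsdb-server instance; run it on
    # each of the three DB nodes and compare.
    ovn-appctl -t "$SB_CTL" cluster/status OVN_Southbound

    # Whether this ovn-northd instance currently considers itself active
    # (recent ovn-northd builds support the "status" command).
    ovn-appctl -t ovn-northd status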

Thanks,
Han

>
> Here are some loggings.
>  .84 sb-db leader =
> 2020-07-21T01:08:20.221Z|01408|raft|INFO|current entry eid
639238ba-bc00-4efe-bb66-6ac766bb5f4b does not match prerequisite
78e8e167-8b4c-4292-8e25-d9975631b010 in execute_command_request
>
> 2020-07-21T01:08:38.450Z|01409|timeval|WARN|Unreasonably long 1435ms poll
interval (1135ms user, 43ms system)
> 2020-07-21T01:08:38.451Z|01410|timeval|WARN|faults: 5942 minor, 0 major
> 2020-07-21T01:08:38.451Z|01411|timeval|WARN|disk: 0 reads, 50216 writes
> 2020-07-21T01:08:38.452Z|01412|timeval|WARN|context switches: 60
voluntary, 25 involuntary
> 2020-07-21T01:08:38.453Z|01413|coverage|INFO|Skipping details of
duplicate event coverage for hash=45329980
>
> 2020-07-21T01:08:38.455Z|01414|raft|WARN|ignoring vote request received
as leader
> 2020-07-21T01:08:38.456Z|01415|raft|INFO|server 1f9e is leader for term 2
> 2020-07-21T01:08:38.457Z|01416|raft|INFO|rejected append_reply (not
leader)
> 2020-07-21T01:08:38.471Z|01417|raft|INFO|rejected append_reply (not
leader)
>
> 2020-07-21T01:23:00.890Z|01418|timeval|WARN|Unreasonably long 1336ms poll
interval (1102ms user, 20ms system)
>
>  .85 sb-db ==
>