Hi Han,  Dumitru,

With the fix from Dumitru,
https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
the OVN SB RAFT workload is greatly reduced in my stress test with a k8s
service that has a large number of endpoints.

The DB file size grows much less with the fix, so the same workload no longer
triggers a leader election.
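
For reference, the growth can be tracked by just watching the on-disk SB
database file, e.g. (the exact path depends on packaging, so adjust as needed):

  ls -lh /etc/openvswitch/ovnsb_db.db   # or /var/lib/ovn/ovnsb_db.db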

Dumitru, based on my test, the number of logical flows depends only on the
cluster size, regardless of the number of VIP endpoints.

However, the OpenFlow flow count on each node still scales with the number of
endpoints.
Any idea how to reduce the OpenFlow flow count on each node's br-int?
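
For reference, these can be counted roughly as follows (assuming the default
ovn-sbctl remote and br-int as the integration bridge):

  # number of logical flows in the SB DB
  ovn-sbctl list Logical_Flow | grep -c '^_uuid'
  # number of OpenFlow flows installed locally (includes one header line)
  ovs-ofctl dump-flows br-int | wc -l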


Regards,
Winson

On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.w...@gmail.com> wrote:

> Hi Han,
>
> Thanks for quick reply.
> Please see my reply below.
>
> On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hz...@ovn.org> wrote:
>
>>
>>
>> On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.w...@gmail.com>
>> wrote:
>> >
>> > Hello Experts,
>> >
>> > I am doing stress testing on a k8s cluster with OVN. One thing I am
>> seeing is that when the RAFT nodes receive a large update from ovn-northd
>> in a short time, the 3 RAFT nodes trigger voting and the leader role
>> switches from one node to another.
>> >
>> > On the ovn-northd side, I can see ovn-northd going through BACKOFF and
>> RECONNECT...
>> >
>> > Since ovn-northd connects only to the NB/SB leader, how can we keep
>> ovn-northd available most of the time?
>> >
>> > Is it possible to have ovn-northd keep established connections to all
>> RAFT nodes to avoid the reconnect mechanism, since the 8 s backoff time is
>> not configurable at the moment?
>> >
>> >
>> > Test logs:
>> >
>> > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in
>> last 78 seconds (most recently, 71 seconds ago) due to excessive rate
>> >
>> > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of
>> duplicate event coverage for hash=ceada91f
>> >
>> > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642:
>> connected
>> >
>> > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
>> clustered database server is disconnected from cluster; trying another
>> server
>> >
>> > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642:
>> connected
>> >
>> > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642:
>> connected
>> >
>> > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642:
>> clustered database server is not cluster leader; trying another server
>> >
>> > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering RECONNECT
>> >
>> > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642:
>> entering BACKOFF
>> >
>> > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering CONNECTING
>> >
>> > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642:
>> connected
>> >
>> > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642:
>> entering ACTIVE
>> >
>> > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> >
>> > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock
>> acquired. This ovn-northd instance is now active.
>> >
>> >
>> > --
>> > Winson
>>
>> Hi Winson,
>>
>> Since northd writes heavily to the SB DB, it is implemented to connect to
>> the leader only, for better performance (avoiding the extra cost of a
>> follower forwarding writes to the leader). When a leader re-election
>> happens, it has to reconnect to the new leader. However, if the cluster is
>> unstable, this step can also take longer than expected. I'd suggest tuning
>> the election timer to avoid re-elections during heavy operations.
>>
> I can see that setting the election timer to a higher value avoids this, but
> if I generate more stress I see it happen again.
> A real workload may not hit the kind of spike my stress test triggers, so
> this is just for scale profiling.
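>
> For the record, the timer can be raised on the SB leader with something
> along these lines (assuming the ovnsb_db control socket is under
> /var/run/ovn; older installs use /var/run/openvswitch; the value can be at
> most doubled per call, so reaching 30000 ms can take a few invocations):
>
>   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 30000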
>
>
>>
>> If the server is overloaded for too long and a longer election timer is
>> unacceptable, the only way to solve the availability problem is to improve
>> ovsdb performance. How big is your transaction, and what is your election
>> timer setting?
>>
> I can see ovn-northd send about 33 MB of data in a short time, and
> ovsdb-server then needs to sync that with its clients. Running iftop on the
> ovn-controller side, each node receives an update of around 25 MB.
> With each ovn-controller getting 25 MB, the 3 RAFT nodes send roughly
> 25 MB * 646 ~ 16 GB in total.
>
>
>> The number of clients also impacts the performance since the heavy update
>> needs to be synced to all clients. How many clients do you have?
>>
> Is there a mechanism for all the ovn-controller clients to connect only to
> the RAFT followers and skip the leader?
> That would leave the leader node more CPU for voting and cluster-level sync.
> In my stress test, after the ovn-controllers connected only to the 2 follower
> nodes, the leader node had only ovn-northd connected to it.
> With this model, RAFT voting finishes in a shorter time when ovn-northd
> triggers the same workload.
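>
> For reference, one way to force that in a test is to list only the two
> follower nodes in each chassis' ovn-remote, roughly like this
> (<follower1/2> are placeholders for the actual follower IPs):
>
>   ovs-vsctl set Open_vSwitch . \
>     external_ids:ovn-remote="tcp:<follower1>:6642,tcp:<follower2>:6642"
>
> Of course this has to be adjusted by hand whenever the leader moves, so it
> is only a test setup, not a real solution.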
>
> The total number of clients is 646 nodes.
> Before the leader role change, all clients were connected to the 3 nodes in a
> balanced way, with each RAFT node holding 200+ connections.
> After the leader role change, the ovn-controller side gets the following
> message:
> 2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642:
> clustered database server is disconnected from cluster; trying another
> server
>
> Node 10.0.2.153 :
>
> SB role changed from follower to candidate on 21:21:06
>
> SB role changed from candidate to leader on 21:22:16
>
> netstat for 6642 port connections:
>
> 21:21:31 ESTABLISHED 202
>
> 21:21:31 Pending 0
>
> 21:21:41 ESTABLISHED 0
>
> 21:21:41 Pending 0
>
>
> The above node stayed in the candidate role for more than 60 s, which is
> longer than my election timer setting of 30 s.
>
> All 202 connections of node 10.0.2.153 shifted to the other two nodes in a
> short time. After that, only ovn-northd was connected to this node.
>
>
> Node 10.0.2.151:
>
> SB role changed from leader to follower on 21:21:23
>
>
> 21:21:35 ESTABLISHED 233
>
> 21:21:35 Pending 0
>
> 21:21:45 ESTABLISHED 282
>
> 21:21:45 Pending 9
>
> 21:21:55 ESTABLISHED 330
>
> 21:21:55 Pending 1
>
> 21:22:05 ESTABLISHED 330
>
> 21:22:05 Pending 1
>
>
>
> Node 10.0.2.152:
>
> SB role changed from follower to candidate on 21:21:57
>
> SB role changed from candidate to follower on 21:22:17
>
>
> 21:21:35 ESTABLISHED 211
>
> 21:21:35 Pending 0
>
> 21:21:45 ESTABLISHED 263
>
> 21:21:45 Pending 5
>
> 21:21:55 ESTABLISHED 316
>
> 21:21:55 Pending 0
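>
> The role changes above are taken from the ovsdb-server logs; the current
> role, term and election timer can also be checked with something like the
> following (assuming the same control socket path as above):
>
>   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound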
>
>
>
>> Thanks,
>> Han
>>
>
>
> --
> Winson
>


-- 
Winson
