Hi Dumitru,

Most of the flows are in table 19.
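For reference, dump files and per-table counts like the ones below can be
produced along these lines (a sketch; the bridge name is from this setup
and the exact options may differ):

  # Dump all of br-int's flows once, then count flows per OpenFlow table:
  ovs-ofctl dump-flows br-int > br-int.txt
  grep -o 'table=[0-9]*' br-int.txt | sort | uniq -c | sort -rn | head

  # Extract a single table for closer inspection, e.g. table 19:
  grep 'table=19,' br-int.txt > table-19.txt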
-rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt   (full flow dump)
-rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
-rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
-rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt

# cat table-19.txt | wc -l
408458
# cat table-19.txt | grep "=9153" | wc -l
124744
# cat table-19.txt | grep "=53" | wc -l
249488

The CoreDNS pods are behind a service with port numbers 53 and 9153.

Please let me know if you need more information.

Regards,
Winson

On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara <dce...@redhat.com> wrote:

> On 7/15/20 8:02 PM, Winson Wang wrote:
> > +add ovn-kubernetes group.
> >
> > Hi Dumitru,
> >
> > With the recent patches from you and Han, basic k8s workloads now show
> > stable node and pod resource usage. Much thanks!
>
> Hi Winson,
>
> Glad to hear that!
>
> > Exposing a k8s workload through a service IP is very common; the
> > CoreDNS deployment is one example. In a large cluster there is a
> > service that autoscales the CoreDNS deployment: at the default ratio
> > of one CoreDNS pod per 16 nodes, a 1000-node cluster runs about 63
> > CoreDNS pods.
> > On my 1006-node setup I scaled the CoreDNS deployment from 2 to 63
> > pods. An SB RAFT election timer of 16s is not enough for this
> > operation in my test environment: one RAFT node could not finish the
> > election within two election slots while all of its clients
> > disconnected and reconnected to the two other RAFT nodes, which left
> > the RAFT clients in an unbalanced state afterwards. It would be good
> > if this condition could be avoided without a larger election timer.
> >
> > On the SB and worker-node resource side:
> > SB DB size increased by 27MB.
> > br-int OpenFlow flow count increased by around 369K.
> > RSS memory of (ovs + ovn-controller) increased by more than 600MB.
>
> This increase on the hypervisor side is most likely because of the
> OpenFlow rules for hairpin traffic for VIPs (service IPs). To confirm,
> would it be possible to take a snapshot of the OVS flow table and see
> how many flows there are per table?
>
> > So if the OVN experts can figure out how to optimize this, it would
> > be a great help for scaling ovn-k8s to large cluster sizes.
>
> If the above is due to the LB flows that handle hairpin traffic, the
> only idea I have is to use the OVS "learn" action to have the flows
> generated as needed. However, I didn't get the chance to try it out
> yet.
>
> Thanks,
> Dumitru
>
> > Regards,
> > Winson
> >
> > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > On 5/1/20 12:00 AM, Winson Wang wrote:
> > > Hi Han, Dumitru,
> >
> > Hi Winson,
> >
> > > With the fix from Dumitru,
> > > https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> > > the OVN SB RAFT workload is greatly reduced in my stress-test mode
> > > with a k8s service that has a large number of endpoints.
> > > The DB file size grows much less with the fix, so the same workload
> > > no longer triggers a leader election.
> > >
> > > Dumitru, based on my test, the number of logical flows is fixed by
> > > the cluster size, regardless of the number of VIP endpoints.
> >
> > The number of logical flows will be fixed based on the number of VIPs
> > (2 per VIP), but the size of the match expression depends on the
> > number of backends per VIP, so the SB DB size will still increase
> > when adding backends to existing VIPs.
> >
> > > But the OpenFlow count on each node still scales with the number of
> > > endpoints; a rough illustration of the expansion follows below.
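As a rough illustration, assuming three hypothetical backend IPs, one
hairpin logical flow ends up as one OpenFlow rule per backend on br-int
(the table number is taken from this setup; actions elided):

  # Print the shape of the expanded hairpin flows, one per backend IP:
  for ip in 10.244.0.5 10.244.1.7 10.244.2.9; do
      printf 'table=19, ip,nw_src=%s,nw_dst=%s actions=...\n' "$ip" "$ip"
  done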
> > Yes, this is due to the match expression in the logical flow above,
> > which is of the form:
> >
> >   (ip.src == backend-ip1 && ip.dst == backend-ip1) || ... ||
> >   (ip.src == backend-ipn && ip.dst == backend-ipn)
> >
> > This gets expanded to n OpenFlow rules, one per backend, to determine
> > whether traffic was hairpinned.
> >
> > > Any idea how to reduce the OpenFlow count on each node's br-int?
> >
> > Unfortunately I don't think there's a cheaper way to determine whether
> > traffic was hairpinned, because I don't think we can have OpenFlow
> > rules that match on "ip.src == ip.dst". So in the worst case we will
> > probably need two OpenFlow rules per backend IP (one for initiator
> > traffic, one for the reply).
> >
> > I'll think more about it though.
> >
> > Regards,
> > Dumitru
> >
> > > Regards,
> > > Winson
> > >
> > > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.w...@gmail.com> wrote:
> > >
> > > Hi Han,
> > >
> > > Thanks for the quick reply. Please see my replies below.
> > >
> > > On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > > On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.w...@gmail.com> wrote:
> > > > >
> > > > > Hello Experts,
> > > > >
> > > > > I am doing stress testing on a k8s cluster with OVN. One thing
> > > > > I am seeing is that when the RAFT nodes receive large updates
> > > > > from ovn-northd in a short time, the 3 RAFT nodes trigger
> > > > > voting and the leader role switches from one node to another.
> > > > >
> > > > > On the ovn-northd side, I can see ovn-northd going through
> > > > > BACKOFF, RECONNECT, ...
> > > > >
> > > > > Since ovn-northd connects to the NB/SB leader only, how can we
> > > > > make ovn-northd available more of the time?
> > > > >
> > > > > Is it possible to have ovn-northd keep established connections
> > > > > to all RAFT nodes, to avoid the reconnect mechanism? The
> > > > > backoff time of 8s is not configurable for now.
> > > > >
> > > > > Test logs:
> > > > >
> > > > > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
> > > > > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
> > > > > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > > > > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
> > > > > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
> > > > > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > >
> > > > > --
> > > > > Winson
> > > >
> > > > Hi Winson,
> > > >
> > > > Since northd heavily writes to the SB DB, it is implemented to
> > > > connect to the leader only, for better performance (avoiding the
> > > > extra cost of a follower forwarding writes to the leader). When a
> > > > leader re-election happens, it has to reconnect to the new
> > > > leader. However, if the cluster is unstable, this step can also
> > > > take longer than expected. I'd suggest tuning the election timer
> > > > to avoid re-election during heavy operations.
> > >
> > > I can see that a higher election timer value avoids this, but when
> > > more stress is generated I see it happen again.
> > > A real workload may not hit the spike stress I trigger in the
> > > stress test, so this is just for scale profiling.
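For reference, the election timer Han suggests tuning can be inspected and
raised at runtime on the RAFT leader; a minimal sketch, assuming a default
control socket path (it varies by installation) and an example value:

  # Show cluster state, including the current election timer:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

  # Raise the SB election timer (in milliseconds); each call may at most
  # roughly double the current value, so large increases take several steps:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 20000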
> > > > If the server is overloaded for too long and a longer election
> > > > timer is unacceptable, the only way to solve the availability
> > > > problem is to improve ovsdb performance. How big is your
> > > > transaction and what's your election timer setting?
> > >
> > > I can see ovn-northd send 33MB of data in a short time, which the
> > > ovsdb-servers then need to sync to their clients. Running iftop on
> > > the ovn-controller side, each node receives an update of around
> > > 25MB. With each of the 646 ovn-controllers receiving 25MB, the 3
> > > RAFT nodes send 25MB * 646, roughly 16GB, in total.
> > >
> > > > The number of clients also impacts the performance, since the
> > > > heavy update needs to be synced to all clients. How many clients
> > > > do you have?
> > >
> > > Is there a mechanism for all the ovn-controller clients to connect
> > > only to the RAFT followers and skip the leader? That would leave
> > > the leader node more CPU for voting and cluster-level sync.
> > > In my stress test, after the ovn-controllers connected to the 2
> > > follower nodes, only ovn-northd was connected to the leader node.
> > > In that model the RAFT voting finishes in a shorter time when
> > > ovn-northd triggers the same workload.
> > >
> > > The total number of clients is 646 nodes.
> > > Before the leader role change, all clients were connected to the 3
> > > nodes in a balanced way; each RAFT node had 200+ connections.
> > > After the leader role change, the ovn-controller side gets the
> > > following message:
> > >
> > > 2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > >
> > > Node 10.0.2.153:
> > >
> > > SB role changed from follower to candidate at 21:21:06
> > > SB role changed from candidate to leader at 21:22:16
> > >
> > > netstat counts for port 6642 connections:
> > >
> > > 21:21:31 ESTABLISHED 202
> > > 21:21:31 Pending 0
> > > 21:21:41 ESTABLISHED 0
> > > 21:21:41 Pending 0
> > >
> > > The above node stayed in the candidate role for more than 60s,
> > > which is longer than my election timer setting of 30s.
> > > All 202 connections of node 10.0.2.153 shifted to the other two
> > > nodes in a short time. After that, only ovn-northd was connected
> > > to this node.
> > >
> > > Node 10.0.2.151:
> > >
> > > SB role changed from leader to follower at 21:21:23
> > >
> > > 21:21:35 ESTABLISHED 233
> > > 21:21:35 Pending 0
> > > 21:21:45 ESTABLISHED 282
> > > 21:21:45 Pending 9
> > > 21:21:55 ESTABLISHED 330
> > > 21:21:55 Pending 1
> > > 21:22:05 ESTABLISHED 330
> > > 21:22:05 Pending 1
> > >
> > > Node 10.0.2.152:
> > >
> > > SB role changed from follower to candidate at 21:21:57
> > > SB role changed from candidate to follower at 21:22:17
> > >
> > > 21:21:35 ESTABLISHED 211
> > > 21:21:35 Pending 0
> > > 21:21:45 ESTABLISHED 263
> > > 21:21:45 Pending 5
> > > 21:21:55 ESTABLISHED 316
> > > 21:21:55 Pending 0
> > >
> > > > Thanks,
> > > > Han
> > >
> > > --
> > > Winson
> >
> > --
> > Winson

--
Winson
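For completeness, per-node connection counts like the ones above can be
collected with something along these lines (illustrative; 6642 is the SB
DB port, and "Pending" would need a separate state filter):

  # Count established TCP sessions to the SB DB port on a RAFT member:
  netstat -tn | grep ':6642 ' | grep -c ESTABLISHED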