Hi Dumitru,

Most of the flows are in table 19.
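For reference, dump files and per-table counts like the ones below can be
produced along these lines (a sketch; the bridge name is from this setup
and the exact options may differ):

  # Dump all of br-int's flows once, then count flows per OpenFlow table:
  ovs-ofctl dump-flows br-int > br-int.txt
  grep -o 'table=[0-9]*' br-int.txt | sort | uniq -c | sort -rn | head

  # Extract a single table for closer inspection, e.g. table 19:
  grep 'table=19,' br-int.txt > table-19.txt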
-rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt   (full flow dump)
-rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
-rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
-rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt

# cat table-19.txt | wc -l
408458
# cat table-19.txt | grep "=9153" | wc -l
124744
# cat table-19.txt | grep "=53" | wc -l
249488

The CoreDNS pods are behind a service with port numbers 53 and 9153.

Please let me know if you need more information.

Regards,
Winson

On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara <dce...@redhat.com> wrote:

> On 7/15/20 8:02 PM, Winson Wang wrote:
> > +add ovn-kubernetes group.
> >
> > Hi Dumitru,
> >
> > With the recent patches from you and Han, basic k8s workloads now show
> > stable node and pod resource usage. Much thanks!
>
> Hi Winson,
>
> Glad to hear that!
>
> > Exposing a k8s workload through a service IP is very common; the
> > CoreDNS deployment is one example. In a large cluster there is a
> > service that autoscales the CoreDNS deployment: at the default ratio
> > of one CoreDNS pod per 16 nodes, a 1000-node cluster runs about 63
> > CoreDNS pods.
> > On my 1006-node setup I scaled the CoreDNS deployment from 2 to 63
> > pods. An SB RAFT election timer of 16s is not enough for this
> > operation in my test environment: one RAFT node could not finish the
> > election within two election slots while all of its clients
> > disconnected and reconnected to the two other RAFT nodes, which left
> > the RAFT clients in an unbalanced state afterwards. It would be good
> > if this condition could be avoided without a larger election timer.
> >
> > On the SB and worker-node resource side:
> > SB DB size increased by 27MB.
> > br-int OpenFlow flow count increased by around 369K.
> > RSS memory of (ovs + ovn-controller) increased by more than 600MB.
>
> This increase on the hypervisor side is most likely because of the
> OpenFlow rules for hairpin traffic for VIPs (service IPs). To confirm,
> would it be possible to take a snapshot of the OVS flow table and see
> how many flows there are per table?
>
> > So if the OVN experts can figure out how to optimize this, it would
> > be a great help for scaling ovn-k8s to large cluster sizes.
>
> If the above is due to the LB flows that handle hairpin traffic, the
> only idea I have is to use the OVS "learn" action to have the flows
> generated as needed. However, I didn't get the chance to try it out
> yet.
>
> Thanks,
> Dumitru
>
> > Regards,
> > Winson
> >
> > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > On 5/1/20 12:00 AM, Winson Wang wrote:
> > > Hi Han, Dumitru,
> >
> > Hi Winson,
> >
> > > With the fix from Dumitru,
> > > https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> > > the OVN SB RAFT workload is greatly reduced in my stress-test mode
> > > with a k8s service that has a large number of endpoints.
> > > The DB file size grows much less with the fix, so the same workload
> > > no longer triggers a leader election.
> > >
> > > Dumitru, based on my test, the number of logical flows is fixed by
> > > the cluster size, regardless of the number of VIP endpoints.
> >
> > The number of logical flows will be fixed based on the number of VIPs
> > (2 per VIP), but the size of the match expression depends on the
> > number of backends per VIP, so the SB DB size will still increase
> > when adding backends to existing VIPs.
> >
> > > But the OpenFlow count on each node still scales with the number of
> > > endpoints; a rough illustration of the expansion follows below.
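As a rough illustration, assuming three hypothetical backend IPs, one
hairpin logical flow ends up as one OpenFlow rule per backend on br-int
(the table number is taken from this setup; actions elided):

  # Print the shape of the expanded hairpin flows, one per backend IP:
  for ip in 10.244.0.5 10.244.1.7 10.244.2.9; do
      printf 'table=19, ip,nw_src=%s,nw_dst=%s actions=...\n' "$ip" "$ip"
  done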
> > Yes, this is due to the match expression in the logical flow above,
> > which is of the form:
> >
> >   (ip.src == backend-ip1 && ip.dst == backend-ip1) || ... ||
> >   (ip.src == backend-ipn && ip.dst == backend-ipn)
> >
> > This gets expanded to n OpenFlow rules, one per backend, to determine
> > whether traffic was hairpinned.
> >
> > > Any idea how to reduce the OpenFlow count on each node's br-int?
> >
> > Unfortunately I don't think there's a cheaper way to determine whether
> > traffic was hairpinned, because I don't think we can have OpenFlow
> > rules that match on "ip.src == ip.dst". So in the worst case we will
> > probably need two OpenFlow rules per backend IP (one for initiator
> > traffic, one for the reply).
> >
> > I'll think more about it though.
> >
> > Regards,
> > Dumitru
> >
> > > Regards,
> > > Winson
> > >
> > > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.w...@gmail.com> wrote:
> > >
> > > Hi Han,
> > >
> > > Thanks for the quick reply. Please see my replies below.
> > >
> > > On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > > On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.w...@gmail.com> wrote:
> > > > >
> > > > > Hello Experts,
> > > > >
> > > > > I am doing stress testing on a k8s cluster with OVN. One thing
> > > > > I am seeing is that when the RAFT nodes receive large updates
> > > > > from ovn-northd in a short time, the 3 RAFT nodes trigger
> > > > > voting and the leader role switches from one node to another.
> > > > >
> > > > > On the ovn-northd side, I can see ovn-northd going through
> > > > > BACKOFF, RECONNECT, ...
> > > > >
> > > > > Since ovn-northd connects to the NB/SB leader only, how can we
> > > > > make ovn-northd available more of the time?
> > > > >
> > > > > Is it possible to have ovn-northd keep established connections
> > > > > to all RAFT nodes, to avoid the reconnect mechanism? The
> > > > > backoff time of 8s is not configurable for now.
> > > > >
> > > > > Test logs:
> > > > >
> > > > > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
> > > > > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
> > > > > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > > > > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
> > > > > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
> > > > > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > >
> > > > > --
> > > > > Winson
> > > >
> > > > Hi Winson,
> > > >
> > > > Since northd heavily writes to the SB DB, it is implemented to
> > > > connect to the leader only, for better performance (avoiding the
> > > > extra cost of a follower forwarding writes to the leader). When a
> > > > leader re-election happens, it has to reconnect to the new
> > > > leader. However, if the cluster is unstable, this step can also
> > > > take longer than expected. I'd suggest tuning the election timer
> > > > to avoid re-election during heavy operations.
> > >
> > > I can see that a higher election timer value avoids this, but when
> > > more stress is generated I see it happen again.
> > > A real workload may not hit the spike stress I trigger in the
> > > stress test, so this is just for scale profiling.
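For reference, the election timer Han suggests tuning can be inspected and
raised at runtime on the RAFT leader; a minimal sketch, assuming a default
control socket path (it varies by installation) and an example value:

  # Show cluster state, including the current election timer:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

  # Raise the SB election timer (in milliseconds); each call may at most
  # roughly double the current value, so large increases take several steps:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 20000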
> > > > If the server is overloaded for too long and a longer election
> > > > timer is unacceptable, the only way to solve the availability
> > > > problem is to improve ovsdb performance. How big is your
> > > > transaction and what's your election timer setting?
> > >
> > > I can see ovn-northd send 33MB of data in a short time, which the
> > > ovsdb-servers then need to sync to their clients. Running iftop on
> > > the ovn-controller side, each node receives an update of around
> > > 25MB. With each of the 646 ovn-controllers receiving 25MB, the 3
> > > RAFT nodes send 25MB * 646, roughly 16GB, in total.
> > >
> > > > The number of clients also impacts the performance, since the
> > > > heavy update needs to be synced to all clients. How many clients
> > > > do you have?
> > >
> > > Is there a mechanism for all the ovn-controller clients to connect
> > > only to the RAFT followers and skip the leader? That would leave
> > > the leader node more CPU for voting and cluster-level sync.
> > > In my stress test, after the ovn-controllers connected to the 2
> > > follower nodes, only ovn-northd was connected to the leader node.
> > > In that model the RAFT voting finishes in a shorter time when
> > > ovn-northd triggers the same workload.
> > >
> > > The total number of clients is 646 nodes.
> > > Before the leader role change, all clients were connected to the 3
> > > nodes in a balanced way; each RAFT node had 200+ connections.
> > > After the leader role change, the ovn-controller side gets the
> > > following message:
> > >
> > > 2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > >
> > > Node 10.0.2.153:
> > >
> > > SB role changed from follower to candidate at 21:21:06
> > > SB role changed from candidate to leader at 21:22:16
> > >
> > > netstat counts for port 6642 connections:
> > >
> > > 21:21:31 ESTABLISHED 202
> > > 21:21:31 Pending 0
> > > 21:21:41 ESTABLISHED 0
> > > 21:21:41 Pending 0
> > >
> > > The above node stayed in the candidate role for more than 60s,
> > > which is longer than my election timer setting of 30s.
> > > All 202 connections of node 10.0.2.153 shifted to the other two
> > > nodes in a short time. After that, only ovn-northd was connected
> > > to this node.
> > >
> > > Node 10.0.2.151:
> > >
> > > SB role changed from leader to follower at 21:21:23
> > >
> > > 21:21:35 ESTABLISHED 233
> > > 21:21:35 Pending 0
> > > 21:21:45 ESTABLISHED 282
> > > 21:21:45 Pending 9
> > > 21:21:55 ESTABLISHED 330
> > > 21:21:55 Pending 1
> > > 21:22:05 ESTABLISHED 330
> > > 21:22:05 Pending 1
> > >
> > > Node 10.0.2.152:
> > >
> > > SB role changed from follower to candidate at 21:21:57
> > > SB role changed from candidate to follower at 21:22:17
> > >
> > > 21:21:35 ESTABLISHED 211
> > > 21:21:35 Pending 0
> > > 21:21:45 ESTABLISHED 263
> > > 21:21:45 Pending 5
> > > 21:21:55 ESTABLISHED 316
> > > 21:21:55 Pending 0
> > >
> > > > Thanks,
> > > > Han
> > >
> > > --
> > > Winson
> >
> > --
> > Winson

--
Winson
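For completeness, per-node connection counts like the ones above can be
collected with something along these lines (illustrative; 6642 is the SB
DB port, and "Pending" would need a separate state filter):

  # Count established TCP sessions to the SB DB port on a RAFT member:
  netstat -tn | grep ':6642 ' | grep -c ESTABLISHED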