CC: Numan, Han, Mark, and Leonid.

On 11/25/21 22:33, Dumitru Ceara wrote:
> This series started as an effort to port the support for
> Load Balancer Groups to the DDlog version of northd.  The initial
> patch that did that turned out to be very simple and small but test
> results were still not great.
>
> This series documents the incremental effort that was done to
> determine what exactly is causing the performance bottleneck in
> ovn-northd-ddlog.
>
> The series consists of:
> - 1/7: a simple, hacky script to simulate part of an ovn-k8s
>        deployment.
> - 2/7: the initial LB Groups DDlog implementation.
> - 3/7: port of a northd fix to avoid a LS_IN_LKUP flow explosion
>        for load balancer VIPs.
> - 4/7: port of a northd fix to optimize LR ARP responder flows for
>        load balancer VIPs (using address sets).
> - 5/7: a HACK to simulate the effect of generating ARP responder
>        flows in the router pipeline only for VIPs that are reachable
>        on at least one of the router's subnets.  In the ovn-k8s
>        scenario this means to skip all such flows.  I couldn't
>        figure out the proper DDlog implementation but the hack is
>        good enough for proving the point.
> - 6/7: Split the generation of the sb::Out_Load_Balancer relation in
>        two steps.  This significantly improves performance.
> - 7/7: Remove the load balancer routable/unroutable logic.  It's not
>        relevant to ovn-k8s and it is very CPU intensive.
>
> I ran the benchmark that creates N ovn-kubernetes-like nodes (switches
> and routers) and M services (load balancers) applied to all of them
> using a load balancer group; after M load balancers are created and
> propagated to SB, the test adds one more load balancer:
>
>   # For 20 nodes, 3K services:
>   $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
>   $ ./lb-group-stress.sh 20 3000
>
>   # For 120 nodes, 3K services:
>   $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
>   $ ./lb-group-stress.sh 120 3000
>
> RUN                Patch #  RSS     Last northd loop duration  Comment
>                                     (almost equivalent to
>                                     time spent incrementally
>                                     processing last load
>                                     balancer addition)
> -----------------  -------  ------  -------------------------  -------
> 20 nodes + 3K LB   2/7      31.4g   94028ms                    Initial LB Group implementation
> 20 nodes + 3K LB   3/7      25.6g   89220ms                    Skip unreachable VIPs in switch pipeline
> 20 nodes + 3K LB   4/7      26.1g   92615ms                    Use address sets for VIPs in router pipeline
> 20 nodes + 3K LB   5/7      30.0g   96535ms                    Skip all VIP ARP responder flows in router pipeline
> 20 nodes + 3K LB   6/7      0.5g    15783ms                    Split sb::Out_Load_Balancer relation
> 20 nodes + 3K LB   7/7      0.5g    1581ms                     Remove load balancer routable/unroutable logic
>
> 120 nodes + 3K LB  2/7      62.3g*  DNF (>= 346215ms)*         Initial LB Group implementation
> 120 nodes + 3K LB  3/7      71.0g*  DNF (>= 111938ms)*         Skip unreachable VIPs in switch pipeline
> 120 nodes + 3K LB  4/7      65.3g*  DNF (>= 121180ms)*         Use address sets for VIPs in router pipeline
> 120 nodes + 3K LB  5/7      57.2g*  DNF (>= 207258ms)*         Skip all VIP ARP responder flows in router pipeline
> 120 nodes + 3K LB  6/7      2.1g    96363ms                    Split sb::Out_Load_Balancer relation
> 120 nodes + 3K LB  7/7      2.1g    10899ms                    Remove load balancer routable/unroutable logic
>
> * I stopped the test after a while.
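
To put the table above in perspective, here is a rough sketch in Python
(not part of the series) that computes the step-to-step speedup of the
last northd loop duration from the numbers reported above.  The names
durations_ms and speedup are made up for illustration, and the DNF runs
are left out on purpose because they only give lower bounds.

```python
# Last northd loop duration in milliseconds, copied from the table above.
# DNF runs (120 nodes, patches 2/7 to 5/7) are omitted: they are lower
# bounds, not measurements.
durations_ms = {
    "20 nodes + 3K LB": {
        "2/7": 94028,
        "3/7": 89220,
        "4/7": 92615,
        "5/7": 96535,
        "6/7": 15783,
        "7/7": 1581,
    },
    "120 nodes + 3K LB": {
        "6/7": 96363,
        "7/7": 10899,
    },
}


def speedup(before_ms, after_ms):
    """Return how many times faster 'after' is compared to 'before'."""
    return before_ms / after_ms


for run, by_patch in durations_ms.items():
    patches = list(by_patch)  # dicts keep insertion order
    # Compare each patch with the one applied right before it.
    for prev, cur in zip(patches, patches[1:]):
        ratio = speedup(by_patch[prev], by_patch[cur])
        print(f"{run}: {prev} -> {cur}: {ratio:.2f}x")
    # Compare the first and last completed runs for this topology.
    first, last = patches[0], patches[-1]
    total = speedup(by_patch[first], by_patch[last])
    print(f"{run}: {first} -> {last} overall: {total:.2f}x")
```

For the 20 node case, patches 3/7 to 5/7 stay within a few percent of
the 2/7 baseline, while 6/7 and 7/7 account for roughly 6x and 10x
respectively, about 59x overall.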
>
> While some of the patches in the series are just quick and dirty
> hacks used to prove a hypothesis, the results seem to indicate
> that some careful optimization of the ovn-northd-ddlog code
> (preferably by someone more knowledgeable wrt. DDlog internals)
> would generate a very efficient implementation.
>
> For reference, the last step of the test, adding a new load balancer to
> the network, generates just a handful of Southbound updates, e.g.:
>
> record 48: 2021-11-25 15:49:05.343 "ovn-northd-ddlog"
>   table Load_Balancer insert row "lb4001" (51ca22c0):
>     name=lb4001
>     protocol=tcp
>     external_ids={lb_id="51ca22c0-8647-4811-8856-84738988dc61"}
>     options={hairpin_orig_tuple="true"}
>     datapaths=[05fbfbed-a976-4ad7-a484-3a452d63d838,
>       0794e508-2908-45a5-905a-0dd84ace89e6, 08d5c20e-68cb-49e0-803a-03b9699186c6,
>       28f77742-9e3e-4788-b6f2-e485e0880013, 3f8ae79a-0b83-4f92-ae1a-3b996626cf5b,
>       55865879-e6cc-4f6c-a582-56f2368a1263, 601a8e53-ad8d-411c-bf52-1de8f84645fa,
>       60d914d0-0ee1-47a1-8219-aec4aaa793c8, 682bd907-0854-492f-9331-c3fe39f7d9a9,
>       6af9ac9c-40bb-463f-a9e2-5ad58eac98b1, 6eb3c1f7-1467-46b3-9ccb-5e29cf4cf517,
>       70bb0703-7edb-4caf-90ad-a72f9081ad91, 7ecabe7b-7cd2-48ff-86e8-26eee4e9018c,
>       8b19c2d2-8571-4410-a29a-3c9aac9bc4b2, af362ec0-c07b-48f6-92b8-4a0fb03b7f44,
>       b3eec0bd-7c96-459a-bec4-597c8d2defe1, c708f850-9823-4225-a275-9efef1786efd,
>       c8b3d672-1f7f-497a-98a3-e30543fb37a7, dc4d30e6-5a56-4ce0-82f5-bc2c922196fb,
>       de5b5c9d-dcfe-42fe-97c9-3573a62d95ef]
>     vips={"42.15.176.1:8080"="42.15.176.2:8081"}
>   table Logical_Flow insert row f3163fb7:
>     pipeline=ingress
>     match="ip && ip4.dst == 42.15.176.1 && tcp"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=110
>     external_ids={stage-name=lr_in_defrag}
>     table_id=5
>     actions="reg0 = 42.15.176.1; reg9[16..31] = tcp.dst; ct_dnat;"
>   table Logical_Flow insert row 1d9318a2:
>     pipeline=ingress
>     match="ct.new && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=120
>     external_ids={stage-name=lr_in_dnat}
>     table_id=6
>     actions="ct_lb(backends=42.15.176.2:8081);"
>   table Logical_Flow insert row ff0be33e:
>     pipeline=ingress
>     match="ct.est && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080 && ct_label.natted == 1"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=120
>     external_ids={stage-name=lr_in_dnat}
>     table_id=6
>     actions="next;"
>   table Logical_Flow insert row 74843573:
>     pipeline=ingress
>     match="ct.new && ip4.dst == 42.15.176.1 && tcp.dst == 8080"
>     logical_dp_group=4bbda436-d7b7-833a-3f1d-b44c77abc1d4
>     priority=120
>     external_ids={stage-name=ls_in_stateful}
>     table_id=12
>     actions="reg1 = 42.15.176.1; reg2[0..15] = 8080; ct_lb(backends=42.15.176.2:8081);"
>
> Dumitru Ceara (7):
>   tutorial: Add hacky load balancer stress test.
>   northd-ddlog: Add LB Group support.
>   northd-ddlog: Don't add ARP responder flows for unreachable VIPs.
>   northd-ddlog: Use address sets for ARP responder flows for VIPs.
>   northd-ddlog: HACK: Generate ARP responder flows only for reachable
>     VIPs.
>   northd-ddlog: Split sb::Out_Load_Balancer relation.
>   HACK: Remove load balancer routable/unroutable logic.
>
>
>  northd/lrouter.dl           |   61 ++++++++++++----------
>  northd/lswitch.dl           |    6 ++
>  northd/ovn-nb.dlopts        |    1
>  northd/ovn_northd.dl        |  122 ++++++++++++++++++++++---------------------
>  tutorial/automake.mk        |    3 +
>  tutorial/lb-group-stress.sh |   60 +++++++++++++++++++++
>  6 files changed, 164 insertions(+), 89 deletions(-)
>  create mode 100755 tutorial/lb-group-stress.sh
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>