CC: Numan, Han, Mark, and Leonid.

On 11/25/21 22:33, Dumitru Ceara wrote:
> This series started as an effort to port the support for
> Load Balancer Groups to the DDlog version of northd.  The initial
> patch that did that turned out to be very simple and small but test
> results were still not great.
> 
> This series documents the incremental effort that was done to
> determine what exactly is causing the performance bottleneck in
> ovn-northd-ddlog.
> 
> The series consists of:
> - 1/7: a simple, hacky script to simulate part of an ovn-k8s
>        deployment.
> - 2/7: the initial LB Groups DDlog implementation.
> - 3/7: port of a northd fix to avoid a LS_IN_LKUP flow explosion
>        for load balancer VIPs.
> - 4/7: port of a northd fix to optimize LR ARP responder flows for
>        load balancer VIPs (using address sets).
> - 5/7: a HACK to simulate the effect of generating ARP responder
>        flows in the router pipeline only for VIPs that are reachable
>        on at least one of the router's subnets.  In the ovn-k8s
>        scenario this means skipping all such flows.  I couldn't
>        figure out the proper DDlog implementation but the hack is
>        good enough for proving the point.
> - 6/7: Split the generation of the sb::Out_Load_Balancer relation in two
>        steps.  This significantly improves performance.
> - 7/7: Remove the load balancer routable/unroutable logic.  It's not
>        relevant to ovn-k8s and it is very CPU intensive.
> 
> I ran the benchmark that creates N ovn-kubernetes-like nodes (switches
> and routers) and M services (load balancers) applied to all of them
> using a load balancer group; after M load balancers are created and
> propagated to SB, the test adds one more load balancer:
> 
> # For 20 nodes, 3K services:
> $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
> $ ./lb-group-stress.sh 20 3000
> 
> # For 120 nodes, 3K services:
> $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
> $ ./lb-group-stress.sh 120 3000
> 
> RUN               Patch #  RSS    Last northd loop duration  Comment
>                                   (almost equivalent to
>                                    time spent incrementally
>                                    processing last load
>                                    balancer addition)
> ----------------- -------  -----  -------------------------  -------
>  20 nodes + 3K LB   2/7    31.4g          94028ms            Initial LB Group implementation
>  20 nodes + 3K LB   3/7    25.6g          89220ms            Skip unreachable VIPs in switch pipeline
>  20 nodes + 3K LB   4/7    26.1g          92615ms            Use address sets for VIPs in router pipeline
>  20 nodes + 3K LB   5/7    30.0g          96535ms            Skip all VIP ARP responder flows in router pipeline
>  20 nodes + 3K LB   6/7     0.5g          15783ms            Split sb::Out_Load_Balancer relation
>  20 nodes + 3K LB   7/7     0.5g           1581ms            Remove load balancer routable/unroutable logic
> 
> 120 nodes + 3K LB   2/7    62.3g*     DNF (>= 346215ms)*     Initial LB Group implementation
> 120 nodes + 3K LB   3/7    71.0g*     DNF (>= 111938ms)*     Skip unreachable VIPs in switch pipeline
> 120 nodes + 3K LB   4/7    65.3g*     DNF (>= 121180ms)*     Use address sets for VIPs in router pipeline
> 120 nodes + 3K LB   5/7    57.2g*     DNF (>= 207258ms)*     Skip all VIP ARP responder flows in router pipeline
> 120 nodes + 3K LB   6/7     2.1g          96363ms            Split sb::Out_Load_Balancer relation
> 120 nodes + 3K LB   7/7     2.1g          10899ms            Remove load balancer routable/unroutable logic
> * I stopped the test after a while.
> 
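
To make the size of the improvement easier to see, here is a small
sketch that turns the loop durations from the table into speedup
factors.  The durations are copied from the table; the names
`durations_ms` and `speedup` are mine and purely illustrative, they
are not part of the series.  The DNF runs are left out because they
only give a lower bound.

```python
# Last northd loop durations in ms, copied from the table above.
# Patches 2/7-5/7 at 120 nodes did not finish (DNF), so they are omitted.
durations_ms = {
    "20 nodes": {"2/7": 94028, "6/7": 15783, "7/7": 1581},
    "120 nodes": {"6/7": 96363, "7/7": 10899},
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


for run, d in durations_ms.items():
    # Effect of removing the routable/unroutable logic (6/7 -> 7/7).
    print(f"{run}: 6/7 -> 7/7 is {speedup(d['6/7'], d['7/7']):.1f}x faster")

# Whole series at 20 nodes (2/7 -> 7/7).
d20 = durations_ms["20 nodes"]
print(f"20 nodes: 2/7 -> 7/7 is {speedup(d20['2/7'], d20['7/7']):.1f}x faster")
```

By this arithmetic the last patch alone is worth roughly 10x at 20
nodes and about 8.8x at 120 nodes, and the whole series is worth about
59x at 20 nodes.
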
> While some of the patches in the series are just quick and dirty
> hacks used to prove a hypothesis, the results seem to indicate
> that some careful optimization of the ovn-northd-ddlog code
> (preferably by someone more knowledgeable wrt. DDlog internals)
> would generate a very efficient implementation.
> 
> For reference, the last step of the test, adding a new load balancer to
> the network, generates just a handful of Southbound updates, e.g.:
> 
> record 48: 2021-11-25 15:49:05.343 "ovn-northd-ddlog"
>   table Load_Balancer insert row "lb4001" (51ca22c0):
>     name=lb4001
>     protocol=tcp
>     external_ids={lb_id="51ca22c0-8647-4811-8856-84738988dc61"}
>     options={hairpin_orig_tuple="true"}
>     datapaths=[05fbfbed-a976-4ad7-a484-3a452d63d838,
>       0794e508-2908-45a5-905a-0dd84ace89e6, 08d5c20e-68cb-49e0-803a-03b9699186c6,
>       28f77742-9e3e-4788-b6f2-e485e0880013, 3f8ae79a-0b83-4f92-ae1a-3b996626cf5b,
>       55865879-e6cc-4f6c-a582-56f2368a1263, 601a8e53-ad8d-411c-bf52-1de8f84645fa,
>       60d914d0-0ee1-47a1-8219-aec4aaa793c8, 682bd907-0854-492f-9331-c3fe39f7d9a9,
>       6af9ac9c-40bb-463f-a9e2-5ad58eac98b1, 6eb3c1f7-1467-46b3-9ccb-5e29cf4cf517,
>       70bb0703-7edb-4caf-90ad-a72f9081ad91, 7ecabe7b-7cd2-48ff-86e8-26eee4e9018c,
>       8b19c2d2-8571-4410-a29a-3c9aac9bc4b2, af362ec0-c07b-48f6-92b8-4a0fb03b7f44,
>       b3eec0bd-7c96-459a-bec4-597c8d2defe1, c708f850-9823-4225-a275-9efef1786efd,
>       c8b3d672-1f7f-497a-98a3-e30543fb37a7, dc4d30e6-5a56-4ce0-82f5-bc2c922196fb,
>       de5b5c9d-dcfe-42fe-97c9-3573a62d95ef]
>     vips={"42.15.176.1:8080"="42.15.176.2:8081"}
>   table Logical_Flow insert row f3163fb7:
>     pipeline=ingress
>     match="ip && ip4.dst == 42.15.176.1 && tcp"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=110
>     external_ids={stage-name=lr_in_defrag}
>     table_id=5
>     actions="reg0 = 42.15.176.1; reg9[16..31] = tcp.dst; ct_dnat;"
>   table Logical_Flow insert row 1d9318a2:
>     pipeline=ingress
>     match="ct.new && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=120
>     external_ids={stage-name=lr_in_dnat}
>     table_id=6
>     actions="ct_lb(backends=42.15.176.2:8081);"
>   table Logical_Flow insert row ff0be33e:
>     pipeline=ingress
>     match="ct.est && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080 && ct_label.natted == 1"
>     logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
>     priority=120
>     external_ids={stage-name=lr_in_dnat}
>     table_id=6
>     actions="next;"
>   table Logical_Flow insert row 74843573:
>     pipeline=ingress
>     match="ct.new && ip4.dst == 42.15.176.1 && tcp.dst == 8080"
>     logical_dp_group=4bbda436-d7b7-833a-3f1d-b44c77abc1d4
>     priority=120
>     external_ids={stage-name=ls_in_stateful}
>     table_id=12
>     actions="reg1 = 42.15.176.1; reg2[0..15] = 8080; ct_lb(backends=42.15.176.2:8081);"
> 
> Dumitru Ceara (7):
>       tutorial: Add hacky load balancer stress test.
>       northd-ddlog: Add LB Group support.
>       northd-ddlog: Don't add ARP responder flows for unreachable VIPs.
>       northd-ddlog: Use address sets for ARP responder flows for VIPs.
>       northd-ddlog: HACK: Generate ARP responder flows only for reachable VIPs.
>       northd-ddlog: Split sb::Out_Load_Balancer relation.
>       HACK: Remove load balancer routable/unroutable logic.
> 
> 
>  northd/lrouter.dl           |   61 ++++++++++++----------
>  northd/lswitch.dl           |    6 ++
>  northd/ovn-nb.dlopts        |    1 
>  northd/ovn_northd.dl        |  122 ++++++++++++++++++++++---------------------
>  tutorial/automake.mk        |    3 +
>  tutorial/lb-group-stress.sh |   60 +++++++++++++++++++++
>  6 files changed, 164 insertions(+), 89 deletions(-)
>  create mode 100755 tutorial/lb-group-stress.sh
> 
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
