On 27/08/2021 19:03, Ilya Maximets wrote:
> On 8/27/21 7:05 PM, Mark Gray wrote:
>> On 27/08/2021 16:24, Ilya Maximets wrote:
>>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>>> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>>> these load balancers has no ports specified (!vip_port), northd will
>>> generate identical logical flows for them.  2 of 3 such flows will be
>>> just discarded, so it's better to not build them form the beginning.
>>
>> s/form/from
>>
>>>
>>> For example, in an ovn-heater's 120 node density-heavy scenario we
>>> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>>> udp and one for sctp.  In this case, ovn-northd generates 26M of
>>> logical flows in total.  ~7.5M of them are flows for a single load
>>> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
>>> discarded.
>>>
>>> Let's find all these identical load balancers and skip while building
>>> logical flows.  With this change, 15M of redundant logical flows are
>>> not generated saving ~1.5 seconds of the CPU time per run.
>>>
>>> Comparison function and the loop looks heavy, but in testing it takes
>>> only a few milliseconds on these large load balancers.
>>>
>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>
>> I see you haven't changed any tests? I am guessing because this doesn't
>> change the logical flows ... which is surprising?
>
> This is not surprising. :)
> This patch only allows us to not generate duplicate flows that would never
> end up in the Southbound anyway.  So, the total number and look of the
> logical flows is exactly the same.  Hence, it's hard to test.  It's a pure
> performance fix.
For me, I was surprised that we didn't have a test for the kind of issue
that we had below. I was expecting something like that to fail.

>
>>
>> Also, as an FYI, I was unable to create load balancers like this using
>> nbctl directly. It fails on creation of the load balancer. However, you
>> can modify the NBDB after creation. So I presume this is an allowed
>> configuration. However, it is not very well specified.
>
> Yeah.  nbctl seems inconsistent.  However, ovn-k8s and other CMSs are
> not using it right now and communicate directly with a database, so it's
> not an issue for them.
>
>>
>> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
>> ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
>>
>>> ---
>>>
>>> The better option might be to allow multiple protocols to be configured
>>> per load balancer, but this would require big API/schema/etc. changes
>>> and may require re-design of certain northd/ovn-controller logic.
>>>
>>>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  lib/lb.h            |  4 ++++
>>>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>>>  3 files changed, 95 insertions(+)
>>>
>>> diff --git a/lib/lb.c b/lib/lb.c
>>> index 7b0ed1abe..fb12c3457 100644
>>> --- a/lib/lb.c
>>> +++ b/lib/lb.c
>>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>>>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>>>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>>>
>>> +    lb->skip_lflow_build = false;
>>>      lb->nlb = nbrec_lb;
>>>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>>>      lb->n_vips = smap_count(&nbrec_lb->vips);
>>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>>>      return NULL;
>>>  }
>>>
>>> +/* Compares two load balancers for equality ignoring the 'protocol'. */
>>> +bool
>>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
>>> +                                     const struct ovn_northd_lb *lb2)
>>> +{
>>
>> I'd have a bit of a concern about maintaining this. For example, if we
>> add more fields, we would have to remember to update this but I can't
>> think of a better way of doing it.
>
> I can add a comment to the 'struct ovn_northd_lb' for developers that
> any change should be reflected in this function.
>
>>
>>> +    /* It's much more convenient to compare Northbound DB configuration. */
>>> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
>>> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
>>> +
>>> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check != lb_b->n_health_check) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check
>>> +        && memcmp(lb_a->health_check, lb_b->health_check,
>>> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields &&
>>> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
>>> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    /* Below fields are not easily accessible from the Nb DB entry, so
>>> +     * comparing parsed versions. */
>>> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_ls
>>> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_lr
>>> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>>>  {
>>> diff --git a/lib/lb.h b/lib/lb.h
>>> index 832ed31fb..8e92338a3 100644
>>> --- a/lib/lb.h
>>> +++ b/lib/lb.h
>>> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>>>      size_t n_nb_lr;
>>>      size_t n_allocated_nb_lr;
>>>      struct ovn_datapath **nb_lr;
>>> +
>>> +    bool skip_lflow_build;
>>>  };
>>>
>>>  struct ovn_lb_vip {
>>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>>>
>>>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>>>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
>>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
>>> +                                          const struct ovn_northd_lb *);
>>>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index af413aba4..d1efc28f4 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>>>          }
>>>      }
>>> +
>>> +    /* Northd doesn't use the protocol in case vips has no ports specified.
>>> +     * And it's also common that a user configures several identical load
>>> +     * balancers, one per protocol.  We need to find them and use only one
>>> +     * for logical flow construction.  Logical flows will be identical,
>>> +     * so it's faster to just not build them. */
>>> +    struct ovn_northd_lb *lb2;
>>> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +        if (lb->skip_lflow_build) {
>>> +            continue;
>>> +        }
>>> +        for (size_t i = 0; i < lb->n_vips; i++) {
>>> +            if (lb->vips[i].vip_port) {
>>> +                goto next;
>>> +            }
>>> +        }
>>> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
>>> +            if (lb2 == lb || lb2->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
>>> +                /* Load balancer still referenced from logical switch or
>>> +                 * router, so we can't destroy it here. */
>>> +                lb2->skip_lflow_build = true;
>>> +            }
>>> +        }
>>
>> I am not sure how this would scale in the case in which there were lots
>> of load-balancers. OVN-K is changing to a model in which there are many
>> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k
>> (225M) iterations.
>
> I'll try to test that.  We can try to hash the load balancer and make
> this part semi-linear.  The code will be a bit more complex.
>
>>
>>> +next:;
>>> +    }
>>> +
>>>  }
>>>
>>>  static void
>>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>>>              if (stop_parallel_processing()) {
>>>                  return NULL;
>>>              }
>>> +            if (lb->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>>              build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>>>                                                   &lsi->match,
>>>                                                   &lsi->actions);
>>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>          build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>>>      }
>>>      HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +        if (lb->skip_lflow_build) {
>>> +            continue;
>>> +        }
>>>          build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>>>                                               &lsi.actions,
>>>                                               &lsi.match);
>>>
>>
>> I have found a configuration that causes undefined behaviour. However,
>> it is the same as without your patch, but it is relevant. If you
>> define two load balancers with the protocol specified (but without the
>> port) and attach different attributes to each, it can cause conflicting
>> logical flows.
>> Consider the following reproducer (can be run in a sandbox
>> environment):
>>
>> `
>> # Create the first logical switch with one port
>> ovn-nbctl ls-add sw0
>> ovn-nbctl lsp-add sw0 sw0-port1
>> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
>>
>> # Create the second logical switch with one port
>> ovn-nbctl ls-add sw1
>> ovn-nbctl lsp-add sw1 sw1-port1
>> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
>>
>> # Create a logical router and attach both logical switches
>> ovn-nbctl lr-add lr0
>> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
>> ovn-nbctl lsp-add sw0 lrp0-attachment
>> ovn-nbctl lsp-set-type lrp0-attachment router
>> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
>> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
>> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
>> ovn-nbctl lsp-add sw1 lrp1-attachment
>> ovn-nbctl lsp-set-type lrp1-attachment router
>> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
>> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
>>
>> ovs-vsctl add-port br-int p1 -- \
>>     set Interface p1 external_ids:iface-id=sw0-port1
>> ovs-vsctl add-port br-int p2 -- \
>>     set Interface p2 external_ids:iface-id=sw1-port1
>>
>> ovn-nbctl set Logical_Router lr0 options:chassis=hv1
>> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
>> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
>> ovn-nbctl set Load_Balancer lb0 protocol=tcp
>> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
>> ovn-nbctl set Load_Balancer lb1 protocol=udp
>> ovn-nbctl lr-lb-add lr0 lb0
>> ovn-nbctl lr-lb-add lr0 lb1
>> `
>>
>> Your code gives the following flows:
>> $ ovn-sbctl dump-flows | grep lr_in_dnat
>>   table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
>>   table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
>>   table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat ), priority=0   , match=(1), action=(next;)
>>
>> These should produce different flows for each load balancer, which means
>> we need to specify protocol as a match field. I think this prevents your
>> optimization?
>
> Since 'options' are different, the optimization will not be applied.
> ovn_northd_lb_equal_except_for_proto() compares them.
>
> But if the options are the same, then you don't need a match on the
> protocol.
>
>>
>> For reference, I raised a bugzilla bug for this and it is similar to one
>> I fixed recently.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1998592
>>
>
> The problem is that the ability to implement the optimization depends on
> how this particular bug will be fixed in OVN.  If it is fixed by
> blindly adding protocol matches everywhere, then, obviously, the
> optimization will not be possible, as logical flows will never be
> identical anymore.
>
> On the other hand, if it is fixed this way, the number of logical flows
> in the database will explode from 9.9M to 26M flows, i.e. 2.5x, and that
> doesn't sound like a good thing to have at that scale.
>
> So, I'd say that whoever works on fixing the bug should try to avoid
> adding protocol matches as much as possible.  This will save a lot of
> memory and processing time in all OVN components.  And it will also allow
> the optimization implemented in this patch.
>
> On a side note, all these force_snat and skip_snat flags look like dirty
> workarounds for some other problems.  And they are too low level for
> OVN, therefore, IMHO, they should not exist or be exposed to the end
> user (CMS).
Yeah, I agree but they are part of the interface now and seem to be used
which, unfortunately, is a bit of a problem. I am not really sure of the
origin of them TBH.

>
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev