On 27/08/2021 19:03, Ilya Maximets wrote:
> On 8/27/21 7:05 PM, Mark Gray wrote:
>> On 27/08/2021 16:24, Ilya Maximets wrote:
>>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>>> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>>> these load balancers have no ports specified (!vip_port), northd will
>>> generate identical logical flows for them.  2 of 3 such flows will be
>>> just discarded, so it's better to not build them form the beginning.
>>
>> s/form/from
>>
>>>
>>> For example, in an ovn-heater's 120 node density-heavy scenario we
>>> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>>> udp and one for sctp.  In this case, ovn-northd generates 26M logical
>>> flows in total.  ~7.5M of them are flows for a single load balancer.
>>> The other 2 * 7.5M = 15M are identical to the first 7.5M and just
>>> discarded.
>>>
>>> Let's find all these identical load balancers and skip them while
>>> building logical flows.  With this change, 15M redundant logical flows
>>> are not generated, saving ~1.5 seconds of CPU time per run.
>>>
>>> The comparison function and the loop look heavy, but in testing they
>>> take only a few milliseconds on these large load balancers.
>>>
>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>
>> I see you haven't changed any tests? I am guessing that's because this
>> doesn't change the logical flows... which is surprising?
> 
> This is not surprising. :)
> This patch only avoids generating duplicate flows that would never end
> up in the Southbound anyway.  So, the total number and content of the
> logical flows is exactly the same.  Hence, it's hard to test.  It's a
> pure performance fix.

For me, the surprise was that we didn't already have a test for the kind
of issue shown below. I would have expected something like that to fail.

> 
>>
>> Also, as an FYI, I was unable to create load balancers like this using
>> nbctl directly; it fails on creation of the load balancer. However, you
>> can modify the NBDB after creation, so I presume this is an allowed
>> configuration, though it is not very well specified.
> 
> Yeah.  nbctl seems inconsistent.  However, ovn-k8s and other CMSs are
> not using it right now and communicate directly with the database, so
> it's not an issue for them.
> 
>>
>> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
>> ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
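
For reference, the workaround that does work is to create the load
balancer without a protocol and set it afterwards (the reproducer
further down does exactly this):

`
$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
$ ovn-nbctl set Load_Balancer lb0 protocol=tcp
`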
>>
>>> ---
>>>
>>> A better option might be to allow multiple protocols to be configured
>>> per load balancer, but this would require big API/schema/etc. changes
>>> and may require a re-design of certain northd/ovn-controller logic.
>>>
>>>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  lib/lb.h            |  4 ++++
>>>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>>>  3 files changed, 95 insertions(+)
>>>
>>> diff --git a/lib/lb.c b/lib/lb.c
>>> index 7b0ed1abe..fb12c3457 100644
>>> --- a/lib/lb.c
>>> +++ b/lib/lb.c
>>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>>>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>>>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>>>  
>>> +    lb->skip_lflow_build = false;
>>>      lb->nlb = nbrec_lb;
>>>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>>>      lb->n_vips = smap_count(&nbrec_lb->vips);
>>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>>>      return NULL;
>>>  }
>>>  
>>> +/* Compares two load balancers for equality ignoring the 'protocol'. */
>>> +bool
>>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
>>> +                                     const struct ovn_northd_lb *lb2)
>>> +{
>>
>> I'd have a bit of a concern about maintaining this. For example, if we
>> add more fields, we would have to remember to update this, but I can't
>> think of a better way of doing it.
> 
> I can add a comment to the 'struct ovn_northd_lb' for developers that
> any change should be reflected in this function.
> 
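
A comment would help; a compile-time guard next to the function might be
even harder to miss. A minimal sketch, assuming OVS's BUILD_ASSERT_DECL
from lib/util.h (the size constant and the arch guard are illustrative,
not the real values):

`
#ifdef __x86_64__
/* Pin the struct size next to the comparison helper so that adding a
 * field to 'struct ovn_northd_lb' breaks the build until both this
 * assertion and ovn_northd_lb_equal_except_for_proto() are revisited.
 * 104 is illustrative; update it to the real sizeof on this
 * architecture. */
BUILD_ASSERT_DECL(sizeof(struct ovn_northd_lb) == 104);
#endif
`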
>>
>>> +    /* It's much more convenient to compare Northbound DB configuration. */
>>> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
>>> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
>>> +
>>> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check != lb_b->n_health_check) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check
>>> +        && memcmp(lb_a->health_check, lb_b->health_check,
>>> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields &&
>>> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
>>> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    /* The fields below are not easily accessible from the NB DB entry,
>>> +     * so compare the parsed versions. */
>>> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_ls
>>> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_lr
>>> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>>>  {
>>> diff --git a/lib/lb.h b/lib/lb.h
>>> index 832ed31fb..8e92338a3 100644
>>> --- a/lib/lb.h
>>> +++ b/lib/lb.h
>>> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>>>      size_t n_nb_lr;
>>>      size_t n_allocated_nb_lr;
>>>      struct ovn_datapath **nb_lr;
>>> +
>>> +    bool skip_lflow_build;
>>>  };
>>>  
>>>  struct ovn_lb_vip {
>>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>>>  
>>>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>>>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
>>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
>>> +                                          const struct ovn_northd_lb *);
>>>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index af413aba4..d1efc28f4 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>>>          }
>>>      }
>>> +
>>> +    /* Northd doesn't use the protocol if VIPs have no ports specified.
>>> +     * And it's also common that a user configures several identical load
>>> +     * balancers, one per protocol.  We need to find them and use only one
>>> +     * for logical flow construction.  The logical flows will be identical,
>>> +     * so it's faster to just not build them. */
>>> +    struct ovn_northd_lb *lb2;
>>> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +        if (lb->skip_lflow_build) {
>>> +            continue;
>>> +        }
>>> +        for (size_t i = 0; i < lb->n_vips; i++) {
>>> +            if (lb->vips[i].vip_port) {
>>> +                goto next;
>>> +            }
>>> +        }
>>> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
>>> +            if (lb2 == lb || lb2->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
>>> +                /* Load balancer still referenced from logical switch or
>>> +                 * router, so we can't destroy it here. */
>>> +                lb2->skip_lflow_build = true;
>>> +            }
>>> +        }
>>
>> I am not sure how this would scale in the case in which there are lots
>> of load balancers. OVN-K is changing to a model in which there are many
>> (15k?) load balancers. This would increase this inner loop to 15k x 15k
>> (225M) iterations.
> 
> I'll try to test that.  We can try to hash the load balancers and make
> this part semi-linear.  The code will be a bit more complex.
> 
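
For the record, here is a rough sketch of how I'd imagine the hashing
approach, reusing hash/smap helpers from OVS lib/ (lb_dedup_hash is a
hypothetical name, and for brevity it only folds in the VIPs and the
datapath counts; real code would cover the other compared fields too):

`
static uint32_t
lb_dedup_hash(const struct ovn_northd_lb *lb)
{
    const struct nbrec_load_balancer *nlb = lb->nlb;
    uint32_t hash = hash_int(smap_count(&nlb->vips), 0);
    const struct smap_node *node;

    /* XOR the per-VIP hashes so the result doesn't depend on the
     * smap iteration order. */
    SMAP_FOR_EACH (node, &nlb->vips) {
        hash ^= hash_string(node->value, hash_string(node->key, 0));
    }
    return hash_add(hash, lb->n_nb_ls + lb->n_nb_lr);
}
`

Duplicate candidates would then go into a temporary hmap keyed on this
hash, so ovn_northd_lb_equal_except_for_proto() only runs on entries in
the same bucket, keeping the whole pass close to linear.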
>>
>>> +next:;
>>> +    }
>>> +
>>>  }
>>>  
>>>  static void
>>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>>>                      if (stop_parallel_processing()) {
>>>                          return NULL;
>>>                      }
>>> +                    if (lb->skip_lflow_build) {
>>> +                        continue;
>>> +                    }
>>>                      build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>>>                                                           &lsi->match,
>>>                                                           &lsi->actions);
>>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>              build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>>>          }
>>>          HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +            if (lb->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>>              build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>>>                                                   &lsi.actions,
>>>                                                   &lsi.match);
>>>
>>
>> I have found a configuration that causes undefined behaviour. It is the
>> same without your patch, but it is relevant here. If you define two
>> load balancers with the protocol specified (but without the port) and
>> attach different attributes to each, it can cause conflicting logical
>> flows. Consider the following reproducer (it can be run in the sandbox
>> environment):
>>
>> `
>> # Create the first logical switch with one port
>> ovn-nbctl ls-add sw0
>> ovn-nbctl lsp-add sw0 sw0-port1
>> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
>>
>> # Create the second logical switch with one port
>> ovn-nbctl ls-add sw1
>> ovn-nbctl lsp-add sw1 sw1-port1
>> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
>>
>> # Create a logical router and attach both logical switches
>> ovn-nbctl lr-add lr0
>> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
>> ovn-nbctl lsp-add sw0 lrp0-attachment
>> ovn-nbctl lsp-set-type lrp0-attachment router
>> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
>> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
>> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
>> ovn-nbctl lsp-add sw1 lrp1-attachment
>> ovn-nbctl lsp-set-type lrp1-attachment router
>> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
>> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
>>
>> ovs-vsctl add-port br-int p1 -- \
>>     set Interface p1 external_ids:iface-id=sw0-port1
>> ovs-vsctl add-port br-int p2 -- \
>>     set Interface p2 external_ids:iface-id=sw1-port1
>>
>> ovn-nbctl set Logical_Router lr0 options:chassis=hv1
>> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
>> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
>> ovn-nbctl set Load_Balancer lb0 protocol=tcp
>> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
>> ovn-nbctl set Load_Balancer lb1 protocol=udp
>> ovn-nbctl lr-lb-add lr0 lb0
>> ovn-nbctl lr-lb-add lr0 lb1
>> `
>>
>> Your code gives the following flows:
>> $ ovn-sbctl dump-flows | grep lr_in_dnat
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)
>>
>> These should produce different flows for each load balancer, which means
>> we need to specify the protocol as a match field. I think this prevents
>> your optimization?
> 
> Since the 'options' are different, the optimization will not be applied.
> ovn_northd_lb_equal_except_for_proto() compares them.
> 
> But if the options are the same, then you don't need a match on the
> protocol.
> 
>>
>> For reference, I raised a bugzilla bug for this and it is similar to one
>> I fixed recently.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1998592
>>
> 
> The problem is that the ability to implement the optimization depends on
> how this particular bug will be fixed in OVN.  If it is fixed by blindly
> adding protocol matches everywhere, then, obviously, the optimization
> will not be possible, as logical flows will never be identical anymore.
> 
> On the other hand, if it is fixed that way, the number of logical flows
> in the database will explode from 9.9M to 26M, i.e. 2.5x, and that
> doesn't sound like a good thing to have at that scale.
> 
> So, I'd say that whoever works on fixing the bug should try to avoid
> protocol matches as much as possible.  This will save a lot of memory
> and processing time in all OVN components, and will also allow the
> optimization implemented in this patch.
> 
> On a side note, all these force_snat and skip_snat options look like
> dirty workarounds for some other problems.  And they are too low-level
> for OVN, so, IMHO, they should not exist or be exposed to the end
> user (CMS).

Yeah, I agree, but they are part of the interface now and seem to be
used, which, unfortunately, is a bit of a problem. I am not really sure
of their origin, TBH.
> 
> Best regards, Ilya Maximets.
> 
