Hi Han, Mark, Xavier,

On 5/10/22 03:52, Han Zhou wrote:
> On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <xsimo...@redhat.com> wrote:
>>
>> Hi Mark, Han
>>
>> Thanks for looking into this.
>> Han's patch is addressing the issues I initially tried fixing with
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimo...@redhat.com/
>>
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimo...@redhat.com/
> is definitely not a good solution (incurs some delay and is more a
> workaround than a fix).
>>
>> I have been trying to follow another approach, which would fix a slightly
> broader issue.
>> The goal is to try to avoid recomputes when updating Port_Binding in SBDB.
>> In addition to the case Han (greatly) described, we also hit some
> recomputes when the same controller binds multiple ports quite quickly
> consecutively.
>>
>> So I have been trying to extend the approach used in if-status (i.e. not
> write to SBDB if SBDB is read-only, and have the if-status state machine
> running through the different interface states, update SBDB when possible,
> and change the states).
>> I have a patch (almost) ready but had some issues with some test cases,
> and was still investigating whether those issues were due to my patch....
>> I hope to post it tomorrow.
>> We can then discuss whether we need both approaches.
> 
> Thanks Xavier. I am interested in the solution you are working on.
> I also want to add that my patch here is to provide a generic solution that
> filters out the changes we don't care about and quits the I-P engine
> processing early. It can be extended easily in the future if we have new
> changes to ignore. In many cases this has been achieved by omitting columns
> in OVSDB IDL monitoring, but the case here is that we care about part of a
> column that is with k-v pairs and OVSDB IDL can monitor/omit the whole
> column. So I feel we may need this regardless of the new solution that
> addresses the consecutively multiple ports binding problem.

I think so too.  If I understand correctly we need both changes.

> Anyway, we can wait and evaluate when your new patch is posted.
> 

+1

Thanks,
Dumitru

> Thanks,
> Han
> 
>>
>> Thanks
>> Xavier
>>
>>
>> On Mon, May 9, 2022 at 9:32 PM Mark Michelson <mmich...@redhat.com> wrote:
>>>
>>> Based on the explanation and the code, this seems to solve the problem
>>> of the spurious recomputes. I'd like confirmation from Xavier that this
>>> is fixing the issue he was addressing in
>>>
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimo...@redhat.com/
>>> , and I'd also like confirmation that this approach doesn't incur the
>>> delays that Dumitru mentioned on Xavier's patch.
>>>
>>> With those confirmations, I'll add my ack to this patch.
>>>
>>> On 5/3/22 15:37, Han Zhou wrote:
>>>> During the process of claiming a VIF port on a chassis, there are
>>>> multiple SB and local OVSDB updates by ovn-controller, which may
> trigger
>>>> ovn-controller recompute, depending on the timing between the
> completion
>>>> of SB and local OVSDB updates:
>>>>
>>>> After port claiming, ovn-controller sends an update to the local OVSDB
>>>> to clean the Interface:external_ids:ovn-installed, and also sends an
>>>> update to the SB DB to update the Port_Binding:chassis.  If the local
>>>> OVSDB update notification comes back before the SB update completes,
>>>> ovn-controller's SB IDL is still "in transaction" and read-only, but
> the
>>>> local Interface update triggers the I-P engine runtime_data node to
>>>> handle the OVS_interface change, which considers the VIF to be claimed
>>>> (again) potentially, which requires SB updates while it is not possible
>>>> at the moment, and so the I-P handler returns failure and the engine
>>>> aborts, which triggers recompute at the next I-P engine iteration when
>>>> SB DB is writable.
>>>>
>>>> If we are lucky at the above step that the SB update completes before
>>>> the local OVSDB update is received, ovn-controller would handle the
>>>> change incrementally (and figures out that the SB Port_Binding chassis
>>>> is already the current one thus wouldn't reclaim the VIF). But the next
>>>> step after it completes flow computing for the newly bound VIF it sends
>>>> another update to the local OVSDB to set the
>>>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
>>>> same time it also sends an update to the SB DB to set the
>>>> Port_Binding:up. These two parallel updates would lead to the same
>>>> situation as the above step and potentially cause ovn-controll
>>>> recompute.
>>>>
>>>> The recompute is very likely to be triggered in production because the
>>>> SB DB is remote (on central nodes) and has heavier load, which requires
>>>> longer time to complete the updates than the local OVSDB server.
>>>>
>>>> The problem is that the runtime_data's OVS_interface change handler
>>>> couldn't handle the update of the Interface:external_ids:ovn-installed
>>>> incrementally, because ovn-controller assumes it could require an SB
>>>> update that is not possible at the moment. However, the assumption is
>>>> not true. In fact it shouldn't even care about the change because this
>>>> external_id:ovn-installed is written by ovn-controller itself and it is
>>>> write-only (no need to read it as an input for any further processing).
>>>>
>>>> So, the idea to fix the problem is to ignore such changes that happen
> to
>>>> the "write-only" fields only. OVSDB change tracking provides APIs to
>>>> check if a specific column is updated, but the difficulty here is that
>>>> the column external_ids is a k-v map which contains not only the data
>>>> that ovn-controller can ignore, such as ovn-installed,
> ovn-installed-ts,
>>>> but also the data ovn-controller cares about, such as iface-id,
>>>> iface-id-ver and encap-ip. To solve this problem, we need to know
> what's
>>>> changed exactly inside the external_ids map.
>>>>
>>>> This patch solves it by introducing a new I-P engine node
>>>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an
> extra
>>>> copy of the older version of the column external_ids, and replace the
>>>> input OVS_interface of the runtime_data node. In the change handler of
> the
>>>> runtime_data node we evaluates if the change needs to be handled before
>>>> calling consider_iface_claim(), so in the above scenarios the changes
>>>> don't need to be handled and won't trigger recompute any more. A test
>>>> case is also updated to capture this (which is very likely, if not
> 100%,
>>>> to fail without the fix).
>>>>
>>>> Signed-off-by: Han Zhou <hz...@ovn.org>
>>>> ---
>>>> v2: Add comments that was missing while committing v1.
>>>>
>>>>   controller/binding.c        |  70 ++++++++++++++++++++++++
>>>>   controller/binding.h        |   1 +
>>>>   controller/ovn-controller.c | 105
> +++++++++++++++++++++++++++++++++---
>>>>   tests/ovn-performance.at    |  17 ++++++
>>>>   4 files changed, 186 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>> index 7281b0485..0563d9d1e 100644
>>>> --- a/controller/binding.c
>>>> +++ b/controller/binding.c
>>>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
> ovsrec_interface *iface,
>>>>       return false;
>>>>   }
>>>>
>>>> +static bool
>>>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const
> char *key)
>>>> +{
>>>> +    const char *value_a = smap_get(a, key);
>>>> +    const char *value_b = smap_get(b, key);
>>>> +    if ((value_a && !value_b)
>>>> +        || (!value_a && value_b)
>>>> +        || (value_a && value_b && strcmp(value_a, value_b))) {
>>>> +        return true;
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>> +/* Check if the change in 'iface_rec' is something we are interested
> in from
>>>> + * port binding perspective.  Return true if the change needs to be
> handled,
>>>> + * otherwise return false.
>>>> + *
>>>> + * The 'iface_rec' must be change tracked, i.e. iterator from
>>>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
>>>> +static bool
>>>> +ovs_interface_change_need_handle(const struct ovsrec_interface
> *iface_rec,
>>>> +                                 struct shash
> *iface_table_external_ids_old)
>>>> +{
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_NAME)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_TYPE)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +
>  OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
>>>> +        /* Compare the external_ids that we are interested in with
> the old
>>>> +         * values:
>>>> +         * - iface-id
>>>> +         * - iface-id-ver
>>>> +         * - encap-ip
>>>> +         * For any other changes, such as ovn-installed,
> ovn-installed-ts, etc,
>>>> +         * we don't need to handle. */
>>>> +        struct smap *external_ids_old =
>>>> +            shash_find_data(iface_table_external_ids_old,
> iface_rec->name);
>>>> +        if (!external_ids_old) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "iface-id")) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "iface-id-ver")) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "encap-ip")) {
>>>> +            return true;
>>>> +        }
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>>   /* Returns true if the ovs interface changes were handled
> successfully,
>>>>    * false otherwise.
>>>>    */
>>>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
> binding_ctx_in *b_ctx_in,
>>>>               continue;
>>>>           }
>>>>
>>>> +        if (!ovs_interface_change_need_handle(
>>>> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>           const char *iface_id = smap_get(&iface_rec->external_ids,
> "iface-id");
>>>>           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport :
> 0;
>>>>           if (iface_id && ofport > 0 &&
>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>> index 430a8d9b1..e49e1ebb6 100644
>>>> --- a/controller/binding.h
>>>> +++ b/controller/binding.h
>>>> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>>>>       const struct ovsrec_bridge_table *bridge_table;
>>>>       const struct ovsrec_open_vswitch_table *ovs_table;
>>>>       const struct ovsrec_interface_table *iface_table;
>>>> +    struct shash *iface_table_external_ids_old;
>>>>   };
>>>>
>>>>   /* Locally relevant port bindings, e.g., VIFs that might be bound
> locally,
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index 22b7fa935..649353f7d 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
> *node, void *data)
>>>>       engine_set_node_state(node, EN_UNCHANGED);
>>>>   }
>>>>
>>>> +/* This engine node is to wrap the OVS_interface input and maintain a
> copy of
>>>> + * the old version of data for the column external_ids.
>>>> + *
>>>> + * There are some special considerations of this engine node:
>>>> + * 1. It has a single input OVS_interface, and it transparently
> passes the
>>>> + *    input changes as its own output data to its dependants. So
> there is no
>>>> + *    processing to OVS_interface changes but simply mark the node
> status as
>>>> + *    UPDATED (and so the run() and the change handler is the same).
>>>> + * 2. The iface_table_external_ids_old is computed/updated in the
> member
>>>> + *    clear_tracked_data(), because that is when the last round of
> processing
>>>> + *    has completed but the new IDL data is yet to refresh, so we
> replace the
>>>> + *    old data with the current data. */
>>>> +struct ed_type_ovs_interface_shadow {
>>>> +    struct ovsrec_interface_table *iface_table;
>>>> +    struct shash iface_table_external_ids_old;
>>>> +};
>>>> +
>>>> +static void *
>>>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
>>>> +                             struct engine_arg *arg OVS_UNUSED)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
>>>> +    data->iface_table = NULL;
>>>> +    shash_init(&data->iface_table_external_ids_old);
>>>> +
>>>> +    return data;
>>>> +}
>>>> +
>>>> +static void
>>>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
>>>> +{
>>>> +    struct shash_node *node;
>>>> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
>>>> +        struct smap *ext_ids = node->data;
>>>> +        smap_destroy(ext_ids);
>>>> +    }
>>>> +    shash_destroy_free_data(table_ext_ids);
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_cleanup(void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +
>  iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +
>  iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>>>> +    shash_init(&data->iface_table_external_ids_old);
>>>> +
>>>> +    if (!data->iface_table) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    const struct ovsrec_interface *iface_rec;
>>>> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
>>>> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
>>>> +        smap_clone(external_ids, &iface_rec->external_ids);
>>>> +        shash_add(&data->iface_table_external_ids_old,
> iface_rec->name,
>>>> +                  external_ids);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +    struct ovsrec_interface_table *iface_table =
>>>> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>>>> +            engine_get_input("OVS_interface", node));
>>>> +    data->iface_table = iface_table;
>>>> +    engine_set_node_state(node, EN_UPDATED);
>>>> +}
>>>> +
>>>> +static bool
>>>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
>>>> +                                           void *data_)
>>>> +{
>>>> +    en_ovs_interface_shadow_run(node, data_);
>>>> +    return true;
>>>> +}
>>>> +
>>>>   struct ed_type_runtime_data {
>>>>       /* Contains "struct local_datapath" nodes. */
>>>>       struct hmap local_datapaths;
>>>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>>>>           (struct ovsrec_port_table *)EN_OVSDB_GET(
>>>>               engine_get_input("OVS_port", node));
>>>>
>>>> -    struct ovsrec_interface_table *iface_table =
>>>> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>>>> -            engine_get_input("OVS_interface", node));
>>>> +    struct ed_type_ovs_interface_shadow *iface_shadow =
>>>> +        engine_get_input_data("ovs_interface_shadow", node);
>>>>
>>>>       struct ovsrec_qos_table *qos_table =
>>>>           (struct ovsrec_qos_table *)EN_OVSDB_GET(
>>>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>>>>       b_ctx_in->sbrec_port_binding_by_datapath =
> sbrec_port_binding_by_datapath;
>>>>       b_ctx_in->sbrec_port_binding_by_name =
> sbrec_port_binding_by_name;
>>>>       b_ctx_in->port_table = port_table;
>>>> -    b_ctx_in->iface_table = iface_table;
>>>> +    b_ctx_in->iface_table = iface_shadow->iface_table;
>>>> +    b_ctx_in->iface_table_external_ids_old =
>>>> +        &iface_shadow->iface_table_external_ids_old;
>>>>       b_ctx_in->qos_table = qos_table;
>>>>       b_ctx_in->port_binding_table = pb_table;
>>>>       b_ctx_in->br_int = br_int;
>>>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node,
> void *data)
>>>>   }
>>>>
>>>>   static bool
>>>> -runtime_data_ovs_interface_handler(struct engine_node *node, void
> *data)
>>>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> void *data)
>>>>   {
>>>>       struct ed_type_runtime_data *rt_data = data;
>>>>       struct binding_ctx_in b_ctx_in;
>>>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>>>>
>>>>       /* Define inc-proc-engine nodes. */
>>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>>>> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>> +                                      "ovs_interface_shadow");
>>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>>>>       ENGINE_NODE(non_vif_data, "non_vif_data");
>>>>       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
>>>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>>>>       engine_add_input(&en_ct_zones, &en_runtime_data,
>>>>                        ct_zones_runtime_data_handler);
>>>>
>>>> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>> +                     ovs_interface_shadow_ovs_interface_handler);
>>>> +
>>>>       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> NULL);
>>>>
>>>>       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>>>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>>>>        */
>>>>       engine_add_input(&en_runtime_data, &en_ovs_port,
>>>>                        engine_noop_handler);
>>>> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
>>>> -                     runtime_data_ovs_interface_handler);
>>>> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
>>>> +                     runtime_data_ovs_interface_shadow_handler);
>>>>
>>>>       engine_add_input(&en_flow_output, &en_lflow_output,
>>>>                        flow_output_lflow_output_handler);
>>>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>>>> index bc133f93b..9affca498 100644
>>>> --- a/tests/ovn-performance.at
>>>> +++ b/tests/ovn-performance.at
>>>> @@ -366,6 +366,23 @@ for i in 1 2; do
>>>>            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>>>            ovn-nbctl --wait=hv sync]
>>>>       )
>>>> +
>>>> +    # Delete and recreate $lp to make it unbind and rebind multiple
> times, and
>>>> +    # ensure no recompute is triggered.
>>>> +    for k in $(seq 10); do
>>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>>>> +            [hv$i hv$j], [lflow_run],
>>>> +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=xxxx &&
>>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false'
> &&
>>>> +             ovn-nbctl --wait=hv sync]
>>>> +        )
>>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>>>> +            [hv$i hv$j], [lflow_run],
>>>> +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=$lp &&
>>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>>> +             ovn-nbctl --wait=hv sync]
>>>> +        )
>>>> +    done
>>>>   done
>>>>
>>>>   for i in 1 2; do
>>>>
>>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to