On 5/12/22 10:58, Dumitru Ceara wrote:
> On 5/3/22 21:37, Han Zhou wrote:
>> During the process of claiming a VIF port on a chassis, there are
>> multiple SB and local OVSDB updates by ovn-controller, which may trigger
>> ovn-controller recompute, depending on the timing between the completion
>> of SB and local OVSDB updates:
>>
>> After port claiming, ovn-controller sends an update to the local OVSDB
>> to clean the Interface:external_ids:ovn-installed, and also sends an
>> update to the SB DB to update the Port_Binding:chassis.  If the local
>> OVSDB update notification comes back before the SB update completes,
>> ovn-controller's SB IDL is still "in transaction" and read-only, but the
>> local Interface update triggers the I-P engine runtime_data node to
>> handle the OVS_interface change, which considers the VIF to be claimed
>> (again) potentially, which requires SB updates while it is not possible
>> at the moment, and so the I-P handler returns failure and the engine
>> aborts, which triggers recompute at the next I-P engine iteration when
>> SB DB is writable.
>>
>> If we are lucky at the above step that the SB update completes before
>> the local OVSDB update is received, ovn-controller would handle the
>> change incrementally (and figures out that the SB Port_Binding chassis
>> is already the current one thus wouldn't reclaim the VIF). But the next
>> step after it completes flow computing for the newly bound VIF it sends
>> another update to the local OVSDB to set the
>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
>> same time it also sends an update to the SB DB to set the
>> Port_Binding:up. These two parallel updates would lead to the same
>> situation as the above step and potentially cause ovn-controll
>> recompute.
>>
>> The recompute is very likely to be triggered in production because the
>> SB DB is remote (on central nodes) and has heavier load, which requires
>> longer time to complete the updates than the local OVSDB server.
>>
>> The problem is that the runtime_data's OVS_interface change handler
>> couldn't handle the update of the Interface:external_ids:ovn-installed
>> incrementally, because ovn-controller assumes it could require an SB
>> update that is not possible at the moment. However, the assumption is
>> not true. In fact it shouldn't even care about the change because this
>> external_id:ovn-installed is written by ovn-controller itself and it is
>> write-only (no need to read it as an input for any further processing).
>>
>> So, the idea to fix the problem is to ignore such changes that happen to
>> the "write-only" fields only. OVSDB change tracking provides APIs to
>> check if a specific column is updated, but the difficulty here is that
>> the column external_ids is a k-v map which contains not only the data
>> that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
>> but also the data ovn-controller cares about, such as iface-id,
>> iface-id-ver and encap-ip. To solve this problem, we need to know what's
>> changed exactly inside the external_ids map.
>>
>> This patch solves it by introducing a new I-P engine node
>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
>> copy of the older version of the column external_ids, and replace the
>> input OVS_interface of the runtime_data node. In the change handler of the
>> runtime_data node we evaluates if the change needs to be handled before
>> calling consider_iface_claim(), so in the above scenarios the changes
>> don't need to be handled and won't trigger recompute any more. A test
>> case is also updated to capture this (which is very likely, if not 100%,
>> to fail without the fix).
>>
>> Signed-off-by: Han Zhou <hz...@ovn.org>
> 
> Hi Han,
> 
> Thanks for the fix!
> 
> I would add:
> Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows 
> are installed.")
> 
> I have another tiny comment below.  I'm currently rerunning the
> ovn-kuberenetes tests with your patch applied [0] because the run done
> by ovsrobot failed to bring up a cluster [1] and I want to make sure
> we're not breaking anything.
> 

The CI job passed so, with the minor comments addressed:

Acked-by: Dumitru Ceara <dce...@redhat.com>

> Thanks,
> Dumitru
> 
> [0] https://github.com/dceara/ovn/actions/runs/2312210800
> [1] 
> https://github.com/ovsrobot/ovn/runs/6280999584?check_suite_focus=true#step:9:725
> 
>> ---
>> v2: Add comments that was missing while committing v1.
>>
>>  controller/binding.c        |  70 ++++++++++++++++++++++++
>>  controller/binding.h        |   1 +
>>  controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
>>  tests/ovn-performance.at    |  17 ++++++
>>  4 files changed, 186 insertions(+), 7 deletions(-)
>>
>> diff --git a/controller/binding.c b/controller/binding.c
>> index 7281b0485..0563d9d1e 100644
>> --- a/controller/binding.c
>> +++ b/controller/binding.c
>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct ovsrec_interface 
>> *iface,
>>      return false;
>>  }
>>  
>> +static bool
>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const char 
>> *key)
>> +{
>> +    const char *value_a = smap_get(a, key);
>> +    const char *value_b = smap_get(b, key);
>> +    if ((value_a && !value_b)
>> +        || (!value_a && value_b)
>> +        || (value_a && value_b && strcmp(value_a, value_b))) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +/* Check if the change in 'iface_rec' is something we are interested in from
>> + * port binding perspective.  Return true if the change needs to be handled,
>> + * otherwise return false.
>> + *
>> + * The 'iface_rec' must be change tracked, i.e. iterator from
>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
>> +static bool
>> +ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec,
>> +                                 struct shash *iface_table_external_ids_old)
>> +{
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_NAME)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_TYPE)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
>> +        /* Compare the external_ids that we are interested in with the old
>> +         * values:
>> +         * - iface-id
>> +         * - iface-id-ver
>> +         * - encap-ip
>> +         * For any other changes, such as ovn-installed, ovn-installed-ts, 
>> etc,
>> +         * we don't need to handle. */
>> +        struct smap *external_ids_old =
>> +            shash_find_data(iface_table_external_ids_old, iface_rec->name);
>> +        if (!external_ids_old) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "iface-id")) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "iface-id-ver")) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "encap-ip")) {
>> +            return true;
>> +        }
>> +    }
>> +    return false;
>> +}
>> +
>>  /* Returns true if the ovs interface changes were handled successfully,
>>   * false otherwise.
>>   */
>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct 
>> binding_ctx_in *b_ctx_in,
>>              continue;
>>          }
>>  
>> +        if (!ovs_interface_change_need_handle(
>> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
>> +            continue;
>> +        }
>> +
>>          const char *iface_id = smap_get(&iface_rec->external_ids, 
>> "iface-id");
>>          int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
>>          if (iface_id && ofport > 0 &&
>> diff --git a/controller/binding.h b/controller/binding.h
>> index 430a8d9b1..e49e1ebb6 100644
>> --- a/controller/binding.h
>> +++ b/controller/binding.h
>> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>>      const struct ovsrec_bridge_table *bridge_table;
>>      const struct ovsrec_open_vswitch_table *ovs_table;
>>      const struct ovsrec_interface_table *iface_table;
>> +    struct shash *iface_table_external_ids_old;
>>  };
>>  
>>  /* Locally relevant port bindings, e.g., VIFs that might be bound locally,
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 22b7fa935..649353f7d 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node *node, 
>> void *data)
>>      engine_set_node_state(node, EN_UNCHANGED);
>>  }
>>  
>> +/* This engine node is to wrap the OVS_interface input and maintain a copy 
>> of
>> + * the old version of data for the column external_ids.
>> + *
>> + * There are some special considerations of this engine node:
>> + * 1. It has a single input OVS_interface, and it transparently passes the
>> + *    input changes as its own output data to its dependants. So there is no
>> + *    processing to OVS_interface changes but simply mark the node status as
>> + *    UPDATED (and so the run() and the change handler is the same).
>> + * 2. The iface_table_external_ids_old is computed/updated in the member
>> + *    clear_tracked_data(), because that is when the last round of 
>> processing
>> + *    has completed but the new IDL data is yet to refresh, so we replace 
>> the
>> + *    old data with the current data. */
>> +struct ed_type_ovs_interface_shadow {
>> +    struct ovsrec_interface_table *iface_table;
>> +    struct shash iface_table_external_ids_old;
>> +};
>> +
>> +static void *
>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
>> +                             struct engine_arg *arg OVS_UNUSED)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
>> +    data->iface_table = NULL;
>> +    shash_init(&data->iface_table_external_ids_old);
>> +
>> +    return data;
>> +}
>> +
>> +static void
>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
>> +{
>> +    struct shash_node *node;
>> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
>> +        struct smap *ext_ids = node->data;
>> +        smap_destroy(ext_ids);
> 
> Nit: no need for SHASH_FOR_EACH_SAFE as far as I can tell.
> 
> 
>> +    }
>> +    shash_destroy_free_data(table_ext_ids);
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_cleanup(void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    
>> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    
>> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> +    shash_init(&data->iface_table_external_ids_old);
>> +
>> +    if (!data->iface_table) {
>> +        return;
>> +    }
>> +
>> +    const struct ovsrec_interface *iface_rec;
>> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
>> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
>> +        smap_clone(external_ids, &iface_rec->external_ids);
>> +        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
>> +                  external_ids);
>> +    }
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    struct ovsrec_interface_table *iface_table =
>> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> +            engine_get_input("OVS_interface", node));
>> +    data->iface_table = iface_table;
>> +    engine_set_node_state(node, EN_UPDATED);
>> +}
>> +
>> +static bool
>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
>> +                                           void *data_)
>> +{
>> +    en_ovs_interface_shadow_run(node, data_);
>> +    return true;
>> +}
>> +
>>  struct ed_type_runtime_data {
>>      /* Contains "struct local_datapath" nodes. */
>>      struct hmap local_datapaths;
>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>>          (struct ovsrec_port_table *)EN_OVSDB_GET(
>>              engine_get_input("OVS_port", node));
>>  
>> -    struct ovsrec_interface_table *iface_table =
>> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> -            engine_get_input("OVS_interface", node));
>> +    struct ed_type_ovs_interface_shadow *iface_shadow =
>> +        engine_get_input_data("ovs_interface_shadow", node);
>>  
>>      struct ovsrec_qos_table *qos_table =
>>          (struct ovsrec_qos_table *)EN_OVSDB_GET(
>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>>      b_ctx_in->sbrec_port_binding_by_datapath = 
>> sbrec_port_binding_by_datapath;
>>      b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
>>      b_ctx_in->port_table = port_table;
>> -    b_ctx_in->iface_table = iface_table;
>> +    b_ctx_in->iface_table = iface_shadow->iface_table;
>> +    b_ctx_in->iface_table_external_ids_old =
>> +        &iface_shadow->iface_table_external_ids_old;
>>      b_ctx_in->qos_table = qos_table;
>>      b_ctx_in->port_binding_table = pb_table;
>>      b_ctx_in->br_int = br_int;
>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void 
>> *data)
>>  }
>>  
>>  static bool
>> -runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void 
>> *data)
>>  {
>>      struct ed_type_runtime_data *rt_data = data;
>>      struct binding_ctx_in b_ctx_in;
>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>>  
>>      /* Define inc-proc-engine nodes. */
>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>> +                                      "ovs_interface_shadow");
>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>>      ENGINE_NODE(non_vif_data, "non_vif_data");
>>      ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>>      engine_add_input(&en_ct_zones, &en_runtime_data,
>>                       ct_zones_runtime_data_handler);
>>  
>> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>> +                     ovs_interface_shadow_ovs_interface_handler);
>> +
>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>>  
>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>>       */
>>      engine_add_input(&en_runtime_data, &en_ovs_port,
>>                       engine_noop_handler);
>> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
>> -                     runtime_data_ovs_interface_handler);
>> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
>> +                     runtime_data_ovs_interface_shadow_handler);
>>  
>>      engine_add_input(&en_flow_output, &en_lflow_output,
>>                       flow_output_lflow_output_handler);
>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>> index bc133f93b..9affca498 100644
>> --- a/tests/ovn-performance.at
>> +++ b/tests/ovn-performance.at
>> @@ -366,6 +366,23 @@ for i in 1 2; do
>>           ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>           ovn-nbctl --wait=hv sync]
>>      )
>> +
>> +    # Delete and recreate $lp to make it unbind and rebind multiple times, 
>> and
>> +    # ensure no recompute is triggered.
>> +    for k in $(seq 10); do
>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> +            [hv$i hv$j], [lflow_run],
>> +            [as hv$i ovs-vsctl set Interface $vif 
>> external-ids:iface-id=xxxx &&
>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
>> +             ovn-nbctl --wait=hv sync]
>> +        )
>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> +            [hv$i hv$j], [lflow_run],
>> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp 
>> &&
>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>> +             ovn-nbctl --wait=hv sync]
>> +        )
>> +    done
>>  done
>>  
>>  for i in 1 2; do
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to