On 4/21/26 4:44 PM, Jacob Tanenbaum wrote:
> When ovn-monitor-all is set to false ovn-controller sets ovn-installed
> on OVS interfaces too early. ovn-controller needs to wait for the
> response from the southbound database with the updates to the newly
> monitored fields only then can it install flows and label the OVS
> interface as installed.
> 
> Reported-at: https://redhat.atlassian.net/browse/FDP-2887
> Signed-off-by: Jacob Tanenbaum <[email protected]>
> 

Hi Jacob,

Thanks for the new revision!

> ---
> v7->v8
> * removed printing in the system testcases that shouldn't be there and
>   caused failure.
> 
> v6->v7
> * added an sset that holds the uuid's of datapaths that are waiting for
>   sb updates
> * added back the the state OIF_WAITING_SB_COND to the if_mgr state
>   machine.
> * no longer hold if the datapath is updated in the local_datapaths
>   struct as that is generated each transaction and cannot be relied
>   upon.
> * removed some leftover logic from previous patch versions
> 
> v5->v6
> * simplified the logic as requested, Ales saw that we did not need to
>   save the seqno if we checked before the engine run.
> * removing the extra state from the state machine
> * removing some leftover logic that was seen.
> 
> v4->v5
> * corrected a sanitizer error: used bool update_seqno without
>   initializing
> 
> v3->v4
> * Added state OIF_WAITING_SB_COND to the state machine that manages
>   the adding of interfaces. This state waits until the Southbound has
>   updated the ovn-controller of relevent information about ports related
>   to it
> * Addressed several nits
> 
> v2->v3
> * adding the ld->monitor_updated required the additiona of checking for
>   monitor_all in update_sb_monitors. I didn't account for being able to
>   toggle on monitor_all
> 
> v1->v2
> * if_status_mgr_run() will run everytime the conditional seqno is
>   changed so it should be safe to only skip when the expected_seqno and
>   seqno returned from ovn are strictly not equal, that way we do not
>   have to deal with overflow in the seqno. Additionally add a boolean to
>   the local_datapath in the event that the seqno wraps around at the
>   same time the datapath would go back into the state OIF_INSTALL_FLOWS.
> * remove setting the state to itself for OIF_INSTALL_FLOWS in
>   if_status_mgr_update()
> * added assert(pb) in if_status_mgr_run()
> * removed a manual loop looking for the local_datapath and replaced with
>   get_local_datapath() in if_status_mgr_run
> * remove a few nit spelling errors in the test case
> 
> diff --git a/controller/if-status.c b/controller/if-status.c
> index ee9337e63..34875d23e 100644
> --- a/controller/if-status.c
> +++ b/controller/if-status.c
> @@ -18,6 +18,7 @@
>  #include "binding.h"
>  #include "if-status.h"
>  #include "lib/ofctrl-seqno.h"
> +#include "local_data.h"
>  #include "ovsport.h"
>  #include "simap.h"
>  
> @@ -58,6 +59,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>  enum if_state {
>      OIF_CLAIMED,          /* Newly claimed interface. pb->chassis update not
>                               yet initiated. */
> +    OIF_WAITING_SB_COND,  /* Waiting for the Southbound database to update
> +                           * ovn-controller for a given datapath. We should
> +                           * only be waiting in this state when monitor_all
> +                           * is false AND it is the first time that we see
> +                           * a specific datapath. */
>      OIF_INSTALL_FLOWS,    /* Claimed interface with pb->chassis update sent 
> to
>                             * SB (but update notification not confirmed, so 
> the
>                             * update may be resent in any of the following
> @@ -87,6 +93,7 @@ enum if_state {
>  
>  static const char *if_state_names[] = {
>      [OIF_CLAIMED]          = "CLAIMED",
> +    [OIF_WAITING_SB_COND]  = "WAITING_SB_COND",
>      [OIF_INSTALL_FLOWS]    = "INSTALL_FLOWS",
>      [OIF_REM_OLD_OVN_INST] = "REM_OLD_OVN_INST",
>      [OIF_MARK_UP]          = "MARK_UP",
> @@ -114,7 +121,18 @@ static const char *if_state_names[] = {
>   * | |                 |  +--+                                           | | 
> |
>   * | |                 |                                                 | | 
> |
>   * | |                 | mgr_update(when sb is rw i.e. pb->chassis)      | | 
> |
> - * | |                 |            has been updated                     | | 
> |
> + * | |                 V            has been updated                     | | 
> |
> + * | |   +----------------------+                                        | | 
> |
> + * | |   |                      |                                        | | 
> |
> + * | |   |    WAITING_SB_COND   |                                        | | 
> |
> + * | |   |                      |                                        | | 
> |
> + * | |   |                      |                                        | | 
> |
> + * | |   +----------------------+                                        | | 
> |
> + * | |                 |                                                 | | 
> |
> + * | |                 |                                                 | | 
> |
> + * | |                 |   mgr_update(when sb_cond_seqno == expected)    | | 
> |
> + * | |                 |   - request seqno                               | | 
> |
> + * | |                 |                                                 | | 
> |
>   * | | release_iface   | - request seqno                                 | | 
> |
>   * | |                 |                                                 | | 
> |
>   * | |                 V                                                 | | 
> |
> @@ -335,6 +353,7 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>  
>      switch (iface->state) {
>      case OIF_CLAIMED:
> +    case OIF_WAITING_SB_COND:
>      case OIF_INSTALL_FLOWS:
>      case OIF_REM_OLD_OVN_INST:
>      case OIF_MARK_UP:
> @@ -383,6 +402,7 @@ if_status_mgr_release_iface(struct if_status_mgr *mgr, 
> const char *iface_id)
>  
>      switch (iface->state) {
>      case OIF_CLAIMED:
> +    case OIF_WAITING_SB_COND:
>      case OIF_INSTALL_FLOWS:
>          /* Not yet fully installed interfaces:
>           * pb->chassis still need to be deleted.
> @@ -424,6 +444,7 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, 
> const char *iface_id,
>  
>      switch (iface->state) {
>      case OIF_CLAIMED:
> +    case OIF_WAITING_SB_COND:
>      case OIF_INSTALL_FLOWS:
>          /* Not yet fully installed interfaces:
>           * pb->chassis still need to be deleted.
> @@ -500,6 +521,8 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>                       const struct sbrec_chassis *chassis_rec,
>                       const struct ovsrec_interface_table *iface_table,
>                       const struct sbrec_port_binding_table *pb_table,
> +                     const struct hmap *local_datapaths,
> +                     const struct sset *waiting_sb_update,

I was going to suggest a more descriptive name here but (see comment in
ovn-controller.c below) given that I think we should be storing this set
in runtime_data I wonder now if we should just have a flag in 'struct
local_datapath' that tells us if the datapath is fully processed or if
we're "waiting_sb_update".

Then we wouldn't need this new sset/uuidset.

>                       bool ovs_readonly,
>                       bool sb_readonly)
>  {
> @@ -622,9 +645,7 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>               * in if_status_handle_claims or if_status_mgr_claim_iface
>               */
>              if (iface->is_vif) {
> -                ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> -                iface->install_seqno = mgr->iface_seqno + 1;
> -                new_ifaces = true;
> +                ovs_iface_set_state(mgr, iface, OIF_WAITING_SB_COND);
>              } else {
>                  ovs_iface_set_state(mgr, iface, OIF_MARK_UP);
>              }
> @@ -639,6 +660,32 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>          }
>      }
>  
> +    if (!sb_readonly) {
> +        HMAPX_FOR_EACH_SAFE (node,
> +                             &mgr->ifaces_per_state[OIF_WAITING_SB_COND]) {
> +            struct ovs_iface *iface = node->data;
> +            if (local_datapaths) {
> +                const struct sbrec_port_binding *pb =
> +                    sbrec_port_binding_table_get_for_uuid(pb_table,
> +                                                          &iface->pb_uuid);
> +                ovs_assert(pb);

Is there a chance that the port binding is deleted from the SB database
while the interface is still in OIF_WAITING_SB_COND state.  All other
port binding lookups in this file check for NULL gracefully (e.g., "if
(pb && ...)").  Should this use a NULL check instead?


> +                struct local_datapath *ld =
> +                    get_local_datapath(local_datapaths,
> +                                       pb->datapath->tunnel_key);
> +                if (!ld) {
> +                    continue;
> +                }
> +                char *uuid = uuid_to_string(&ld->datapath->header_.uuid);
> +                if (!sset_contains(waiting_sb_update, uuid)) {
> +                    ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> +                    iface->install_seqno = mgr->iface_seqno + 1;
> +                    new_ifaces = true;
> +                }
> +                free(uuid);
> +            }
> +        }
> +    }
> +

In if_status_mgr_update_bindings() we do local_binding_set_down() for
all ifaces that are in state OIF_INSTALL_FLOWS.  But now that we have
the new OIF_WAITING_SB_COND state (happening before transitioning to
OIF_INSTALL_FLOWS, potentially in different ovn-controller runs) should
we also call local_binding_set_down() for those too?

>      if (!sb_readonly) {
>          HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_UPDATE_PORT]) {
>              struct ovs_iface *iface = node->data;
> diff --git a/controller/if-status.h b/controller/if-status.h
> index d15ca3008..67018d113 100644
> --- a/controller/if-status.h
> +++ b/controller/if-status.h
> @@ -43,6 +43,8 @@ void if_status_mgr_update(struct if_status_mgr *, struct 
> local_binding_data *,
>                            const struct sbrec_chassis *chassis,
>                            const struct ovsrec_interface_table *iface_table,
>                            const struct sbrec_port_binding_table *pb_table,
> +                          const struct hmap *local_datapaths,
> +                          const struct sset *waiting_sb_update,
>                            bool ovs_readonly,
>                            bool sb_readonly);
>  void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data 
> *,
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index da43051ed..a25892056 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1511,6 +1511,7 @@ en_runtime_data_clear_tracked_data(void *data_)
>  }
>  
>  static void *
> +

Nit: empty line not needed.

>  en_runtime_data_init(struct engine_node *node OVS_UNUSED,
>                       struct engine_arg *arg OVS_UNUSED)
>  {
> @@ -7596,6 +7597,8 @@ main(int argc, char *argv[])
>      int ovs_txn_status = 1;
>      bool sb_monitor_all = false;
>      struct tracked_acl_ids *tracked_acl_ids = NULL;
> +    struct sset waiting_sb_update;
> +    sset_init(&waiting_sb_update);

Nit: missing sset_destroy.

>      while (!exit_args.exiting) {
>          ovsrcu_quiesce_end();
>  
> @@ -7842,6 +7845,7 @@ main(int argc, char *argv[])
>  
>                      bool recompute_allowed = (ovnsb_idl_txn &&
>                                                !ofctrl_has_backlog());
> +

Nit: unrelated new line.

>                      engine_run(recompute_allowed);
>                      tracked_acl_ids = engine_get_data(&en_acl_id);
>  
> @@ -7951,6 +7955,9 @@ main(int argc, char *argv[])
>                                     
> sbrec_mirror_table_get(ovnsb_idl_loop.idl),
>                                     br_int,
>                                     &runtime_data->lbinding_data.bindings);
> +                        if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno) {
> +                            sset_clear(&waiting_sb_update);
> +                        }
>                          /* Updating monitor conditions if runtime data or
>                           * logical datapath goups changed. */
>                          if (engine_node_changed(&en_runtime_data)
> @@ -7973,6 +7980,23 @@ main(int argc, char *argv[])
>                                   * a continuous reason for monitor updates. 
> */
>                                  daemon_started_recently_countdown();
>                              }
> +
> +                            if (!sb_monitor_all && runtime_data) {
> +                                struct hmap *tracked_dp_bindings =
> +                                    &runtime_data->tracked_dp_bindings;
> +                                struct tracked_datapath *tdp;
> +                                HMAP_FOR_EACH_SAFE (tdp,
> +                                                    node,
> +                                                    tracked_dp_bindings) {
> +                                    char *uuid =
> +                                        
> uuid_to_string(&tdp->dp->header_.uuid);
> +                                    if (tdp->tracked_type ==
> +                                        TRACKED_RESOURCE_NEW) {
> +                                        sset_add(&waiting_sb_update, uuid);

This should be an uuidset instead of a sset.

> +                                    }
> +                                    free(uuid);

It feels odd that we're doing this here.  Shouldn't the uuidset be part
of the runtime_data state (struct ed_type_runtime_data), updated in
add_local_datapath() -> tracked_datapath_add(.., TRACKED_RESOURCE_NEW)?

> +                                }
> +                            }
>                          }
>                          /* If there is no new expected seqno we have finished
>                           * loading all needed data from southbound. We then
> @@ -8017,6 +8041,10 @@ main(int argc, char *argv[])
>                                                      ovs_idl_loop.idl),
>                                           sbrec_port_binding_table_get(
>                                                      ovnsb_idl_loop.idl),
> +                                         runtime_data ?
> +                                               &runtime_data->local_datapaths
> +                                               : NULL,
> +                                         &waiting_sb_update,

With my comment above addressed, we'd pass
&runtime_data->waiting_sb_update here instead.

>                                           !ovs_idl_txn,
>                                           !ovnsb_idl_txn);
>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index c98de9bc4..9dc7555ba 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -3944,3 +3944,69 @@ OVN_CLEANUP([hv1], [hv2
>  /already has encap ip.*cannot duplicate on/d])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn-installed])
> +ovn_start
> +
> +net_add n1
> +sim_add hv1
> +
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +ovn-appctl vlog/set dbg
> +ovs-vsctl add-port br-int vif1 -- \
> +    set Interface vif1 external-ids:iface-id=lsp1
> +
> +check ovn-nbctl ls-add ls1
> +sleep_controller hv1
> +check ovn-nbctl --wait=sb lsp-add ls1 lsp1 -- \
> +                          lsp-set-addresses lsp1 "f0:00:00:00:00:01 10.0.0.1"
> +
> +sleep_sb
> +wake_up_controller hv1
> +
> +# Wait for pflow for lsp1

Nit: Comments should be sentences and end with dot.

> +OVS_WAIT_UNTIL([
> +    ofport=$(as hv1 ovs-vsctl --bare --columns ofport find Interface 
> name=vif1)
> +    echo "vif1 port=$ofport"
> +    test -n "$ofport" && test 1 -le $(as hv1 ovs-ofctl dump-flows br-int | 
> grep -c in_port=$ofport)
> +])
> +
> +# If ovn-installed in ovs, all flows should be installed.
> +# In that case, there should be at least one flow with lsp1 address.
> +OVS_WAIT_UNTIL([
> +    ovn_installed=$(as hv1 ovs-vsctl get Interface vif1 
> external_ids:ovn-installed)
> +    echo $ovn_installed

Should the echo be removed?

> +    flow_count=$(as hv1 ovs-ofctl dump-flows br-int | grep -Fc "10.0.0.1")
> +    # for the monitor-all=true case the flow gets installed because 
> ovn-controller is monitoring all
> +    # tables in OVN_SOUTHBOUND.
> +    if test -n "$ovn_installed"; then
> +        as hv1 ovs-ofctl dump-flows br-int > output

This too, was it for debugging purposes?

> +        test $flow_count -ge 1
> +    else
> +        true

Did you mean "false" here?  Otherwise, if ovn-installed is not present,
we break out of the OVS_WAIT_UNTIL() immediately, or am I missing something?

> +    fi
> +])
> +
> +wake_up_sb
> +# After the southbound db has woken up and can send the update to the
> +# ovn-controller not monitoring all tables in the southbound db it
> +# should be able to install the interface.
> +OVS_WAIT_UNTIL([
> +    ovn_installed=$(as hv1 ovs-vsctl get Interface vif1 
> external_ids:ovn-installed)
> +    flow_count=$(as hv1 ovs-ofctl dump-flows br-int | grep -Fc "10.0.0.1")
> +    echo "installed=$ovn_installed, count=$flow_count"
> +    if test -n "$ovn_installed"; then
> +        as hv1 ovs-ofctl dump-flows br-int > output
> +        test $flow_count -ge 1
> +    else
> +        false
> +    fi
> +])
> +wait_for_ports_up
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +])

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to