On 4/21/26 4:44 PM, Jacob Tanenbaum wrote:
> When ovn-monitor-all is set to false ovn-controller sets ovn-installed
> on OVS interfaces too early. ovn-controller needs to wait for the
> response from the southbound database with the updates to the newly
> monitored fields only then can it install flows and label the OVS
> interface as installed.
>
> Reported-at: https://redhat.atlassian.net/browse/FDP-2887
> Signed-off-by: Jacob Tanenbaum <[email protected]>
>
Hi Jacob,
Thanks for the new revision!
> ---
> v7->v8
> * removed printing in the system testcases that shouldn't be there and
> caused failure.
>
> v6->v7
> * added an sset that holds the uuid's of datapaths that are waiting for
> sb updates
> * added back the the state OIF_WAITING_SB_COND to the if_mgr state
> machine.
> * no longer hold if the datapath is updated in the local_datapaths
> struct as that is generated each transaction and cannot be relied
> upon.
> * removed some leftover logic from previous patch versions
>
> v5->v6
> * simplified the logic as requested, Ales saw that we did not need to
> save the seqno if we checked before the engine run.
> * removing the extra state from the state machine
> * removing some leftover logic that was seen.
>
> v4->v5
> * corrected a sanitizer error: used bool update_seqno without
> initializing
>
> v3->v4
> * Added state OIF_WAITING_SB_COND to the state machine that manages
> the adding of interfaces. This state waits until the Southbound has
> updated the ovn-controller of relevent information about ports related
> to it
> * Addressed several nits
>
> v2->v3
> * adding the ld->monitor_updated required the additiona of checking for
> monitor_all in update_sb_monitors. I didn't account for being able to
> toggle on monitor_all
>
> v1->v2
> * if_status_mgr_run() will run everytime the conditional seqno is
> changed so it should be safe to only skip when the expected_seqno and
> seqno returned from ovn are strictly not equal, that way we do not
> have to deal with overflow in the seqno. Additionally add a boolean to
> the local_datapath in the event that the seqno wraps around at the
> same time the datapath would go back into the state OIF_INSTALL_FLOWS.
> * remove setting the state to itself for OIF_INSTALL_FLOWS in
> if_status_mgr_update()
> * added assert(pb) in if_status_mgr_run()
> * removed a manual loop looking for the local_datapath and replaced with
> get_local_datapath() in if_status_mgr_run
> * remove a few nit spelling errors in the test case
>
> diff --git a/controller/if-status.c b/controller/if-status.c
> index ee9337e63..34875d23e 100644
> --- a/controller/if-status.c
> +++ b/controller/if-status.c
> @@ -18,6 +18,7 @@
> #include "binding.h"
> #include "if-status.h"
> #include "lib/ofctrl-seqno.h"
> +#include "local_data.h"
> #include "ovsport.h"
> #include "simap.h"
>
> @@ -58,6 +59,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> enum if_state {
> OIF_CLAIMED, /* Newly claimed interface. pb->chassis update not
> yet initiated. */
> + OIF_WAITING_SB_COND, /* Waiting for the Southbound database to update
> + * ovn-controller for a given datapath. We should
> + * only be waiting in this state when monitor_all
> + * is false AND it is the first time that we see
> + * a specific datapath. */
> OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis update sent
> to
> * SB (but update notification not confirmed, so
> the
> * update may be resent in any of the following
> @@ -87,6 +93,7 @@ enum if_state {
>
> static const char *if_state_names[] = {
> [OIF_CLAIMED] = "CLAIMED",
> + [OIF_WAITING_SB_COND] = "WAITING_SB_COND",
> [OIF_INSTALL_FLOWS] = "INSTALL_FLOWS",
> [OIF_REM_OLD_OVN_INST] = "REM_OLD_OVN_INST",
> [OIF_MARK_UP] = "MARK_UP",
> @@ -114,7 +121,18 @@ static const char *if_state_names[] = {
> * | | | +--+ | |
> |
> * | | | | |
> |
> * | | | mgr_update(when sb is rw i.e. pb->chassis) | |
> |
> - * | | | has been updated | |
> |
> + * | | V has been updated | |
> |
> + * | | +----------------------+ | |
> |
> + * | | | | | |
> |
> + * | | | WAITING_SB_COND | | |
> |
> + * | | | | | |
> |
> + * | | | | | |
> |
> + * | | +----------------------+ | |
> |
> + * | | | | |
> |
> + * | | | | |
> |
> + * | | | mgr_update(when sb_cond_seqno == expected) | |
> |
> + * | | | - request seqno | |
> |
> + * | | | | |
> |
> * | | release_iface | - request seqno | |
> |
> * | | | | |
> |
> * | | V | |
> |
> @@ -335,6 +353,7 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>
> switch (iface->state) {
> case OIF_CLAIMED:
> + case OIF_WAITING_SB_COND:
> case OIF_INSTALL_FLOWS:
> case OIF_REM_OLD_OVN_INST:
> case OIF_MARK_UP:
> @@ -383,6 +402,7 @@ if_status_mgr_release_iface(struct if_status_mgr *mgr,
> const char *iface_id)
>
> switch (iface->state) {
> case OIF_CLAIMED:
> + case OIF_WAITING_SB_COND:
> case OIF_INSTALL_FLOWS:
> /* Not yet fully installed interfaces:
> * pb->chassis still need to be deleted.
> @@ -424,6 +444,7 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr,
> const char *iface_id,
>
> switch (iface->state) {
> case OIF_CLAIMED:
> + case OIF_WAITING_SB_COND:
> case OIF_INSTALL_FLOWS:
> /* Not yet fully installed interfaces:
> * pb->chassis still need to be deleted.
> @@ -500,6 +521,8 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> const struct sbrec_chassis *chassis_rec,
> const struct ovsrec_interface_table *iface_table,
> const struct sbrec_port_binding_table *pb_table,
> + const struct hmap *local_datapaths,
> + const struct sset *waiting_sb_update,
I was going to suggest a more descriptive name here but (see comment in
ovn-controller.c below) given that I think we should be storing this set
in runtime_data I wonder now if we should just have a flag in 'struct
local_datapath' that tells us if the datapath is fully processed or if
we're "waiting_sb_update".
Then we wouldn't need this new sset/uuidset.
> bool ovs_readonly,
> bool sb_readonly)
> {
> @@ -622,9 +645,7 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> * in if_status_handle_claims or if_status_mgr_claim_iface
> */
> if (iface->is_vif) {
> - ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> - iface->install_seqno = mgr->iface_seqno + 1;
> - new_ifaces = true;
> + ovs_iface_set_state(mgr, iface, OIF_WAITING_SB_COND);
> } else {
> ovs_iface_set_state(mgr, iface, OIF_MARK_UP);
> }
> @@ -639,6 +660,32 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> }
> }
>
> + if (!sb_readonly) {
> + HMAPX_FOR_EACH_SAFE (node,
> + &mgr->ifaces_per_state[OIF_WAITING_SB_COND]) {
> + struct ovs_iface *iface = node->data;
> + if (local_datapaths) {
> + const struct sbrec_port_binding *pb =
> + sbrec_port_binding_table_get_for_uuid(pb_table,
> + &iface->pb_uuid);
> + ovs_assert(pb);
Is there a chance that the port binding is deleted from the SB database
while the interface is still in OIF_WAITING_SB_COND state. All other
port binding lookups in this file check for NULL gracefully (e.g., "if
(pb && ...)"). Should this use a NULL check instead?
> + struct local_datapath *ld =
> + get_local_datapath(local_datapaths,
> + pb->datapath->tunnel_key);
> + if (!ld) {
> + continue;
> + }
> + char *uuid = uuid_to_string(&ld->datapath->header_.uuid);
> + if (!sset_contains(waiting_sb_update, uuid)) {
> + ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> + iface->install_seqno = mgr->iface_seqno + 1;
> + new_ifaces = true;
> + }
> + free(uuid);
> + }
> + }
> + }
> +
In if_status_mgr_update_bindings() we do local_binding_set_down() for
all ifaces that are in state OIF_INSTALL_FLOWS. But now that we have
the new OIF_WAITING_SB_COND state (happening before transitioning to
OIF_INSTALL_FLOWS, potentially in different ovn-controller runs) should
we also call local_binding_set_down() for those too?
> if (!sb_readonly) {
> HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_UPDATE_PORT]) {
> struct ovs_iface *iface = node->data;
> diff --git a/controller/if-status.h b/controller/if-status.h
> index d15ca3008..67018d113 100644
> --- a/controller/if-status.h
> +++ b/controller/if-status.h
> @@ -43,6 +43,8 @@ void if_status_mgr_update(struct if_status_mgr *, struct
> local_binding_data *,
> const struct sbrec_chassis *chassis,
> const struct ovsrec_interface_table *iface_table,
> const struct sbrec_port_binding_table *pb_table,
> + const struct hmap *local_datapaths,
> + const struct sset *waiting_sb_update,
> bool ovs_readonly,
> bool sb_readonly);
> void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data
> *,
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index da43051ed..a25892056 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1511,6 +1511,7 @@ en_runtime_data_clear_tracked_data(void *data_)
> }
>
> static void *
> +
Nit: empty line not needed.
> en_runtime_data_init(struct engine_node *node OVS_UNUSED,
> struct engine_arg *arg OVS_UNUSED)
> {
> @@ -7596,6 +7597,8 @@ main(int argc, char *argv[])
> int ovs_txn_status = 1;
> bool sb_monitor_all = false;
> struct tracked_acl_ids *tracked_acl_ids = NULL;
> + struct sset waiting_sb_update;
> + sset_init(&waiting_sb_update);
Nit: missing sset_destroy.
> while (!exit_args.exiting) {
> ovsrcu_quiesce_end();
>
> @@ -7842,6 +7845,7 @@ main(int argc, char *argv[])
>
> bool recompute_allowed = (ovnsb_idl_txn &&
> !ofctrl_has_backlog());
> +
Nit: unrelated new line.
> engine_run(recompute_allowed);
> tracked_acl_ids = engine_get_data(&en_acl_id);
>
> @@ -7951,6 +7955,9 @@ main(int argc, char *argv[])
>
> sbrec_mirror_table_get(ovnsb_idl_loop.idl),
> br_int,
> &runtime_data->lbinding_data.bindings);
> + if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno) {
> + sset_clear(&waiting_sb_update);
> + }
> /* Updating monitor conditions if runtime data or
> * logical datapath goups changed. */
> if (engine_node_changed(&en_runtime_data)
> @@ -7973,6 +7980,23 @@ main(int argc, char *argv[])
> * a continuous reason for monitor updates.
> */
> daemon_started_recently_countdown();
> }
> +
> + if (!sb_monitor_all && runtime_data) {
> + struct hmap *tracked_dp_bindings =
> + &runtime_data->tracked_dp_bindings;
> + struct tracked_datapath *tdp;
> + HMAP_FOR_EACH_SAFE (tdp,
> + node,
> + tracked_dp_bindings) {
> + char *uuid =
> +
> uuid_to_string(&tdp->dp->header_.uuid);
> + if (tdp->tracked_type ==
> + TRACKED_RESOURCE_NEW) {
> + sset_add(&waiting_sb_update, uuid);
This should be an uuidset instead of a sset.
> + }
> + free(uuid);
It feels odd that we're doing this here. Shouldn't the uuidset be part
of the runtime_data state (struct ed_type_runtime_data), updated in
add_local_datapath() -> tracked_datapath_add(.., TRACKED_RESOURCE_NEW)?
> + }
> + }
> }
> /* If there is no new expected seqno we have finished
> * loading all needed data from southbound. We then
> @@ -8017,6 +8041,10 @@ main(int argc, char *argv[])
> ovs_idl_loop.idl),
> sbrec_port_binding_table_get(
> ovnsb_idl_loop.idl),
> + runtime_data ?
> + &runtime_data->local_datapaths
> + : NULL,
> + &waiting_sb_update,
With my comment above addressed, we'd pass
&runtime_data->waiting_sb_update here instead.
> !ovs_idl_txn,
> !ovnsb_idl_txn);
> stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index c98de9bc4..9dc7555ba 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -3944,3 +3944,69 @@ OVN_CLEANUP([hv1], [hv2
> /already has encap ip.*cannot duplicate on/d])
> AT_CLEANUP
> ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn-installed])
> +ovn_start
> +
> +net_add n1
> +sim_add hv1
> +
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +ovn-appctl vlog/set dbg
> +ovs-vsctl add-port br-int vif1 -- \
> + set Interface vif1 external-ids:iface-id=lsp1
> +
> +check ovn-nbctl ls-add ls1
> +sleep_controller hv1
> +check ovn-nbctl --wait=sb lsp-add ls1 lsp1 -- \
> + lsp-set-addresses lsp1 "f0:00:00:00:00:01 10.0.0.1"
> +
> +sleep_sb
> +wake_up_controller hv1
> +
> +# Wait for pflow for lsp1
Nit: Comments should be sentences and end with dot.
> +OVS_WAIT_UNTIL([
> + ofport=$(as hv1 ovs-vsctl --bare --columns ofport find Interface
> name=vif1)
> + echo "vif1 port=$ofport"
> + test -n "$ofport" && test 1 -le $(as hv1 ovs-ofctl dump-flows br-int |
> grep -c in_port=$ofport)
> +])
> +
> +# If ovn-installed in ovs, all flows should be installed.
> +# In that case, there should be at least one flow with lsp1 address.
> +OVS_WAIT_UNTIL([
> + ovn_installed=$(as hv1 ovs-vsctl get Interface vif1
> external_ids:ovn-installed)
> + echo $ovn_installed
Should the echo be removed?
> + flow_count=$(as hv1 ovs-ofctl dump-flows br-int | grep -Fc "10.0.0.1")
> + # for the monitor-all=true case the flow gets installed because
> ovn-controller is monitoring all
> + # tables in OVN_SOUTHBOUND.
> + if test -n "$ovn_installed"; then
> + as hv1 ovs-ofctl dump-flows br-int > output
This too, was it for debugging purposes?
> + test $flow_count -ge 1
> + else
> + true
Did you mean "false" here? Otherwise, if ovn-installed is not present,
we break out of the OVS_WAIT_UNTIL() immediately, or am I missing something?
> + fi
> +])
> +
> +wake_up_sb
> +# After the southbound db has woken up and can send the update to the
> +# ovn-controller not monitoring all tables in the southbound db it
> +# should be able to install the interface.
> +OVS_WAIT_UNTIL([
> + ovn_installed=$(as hv1 ovs-vsctl get Interface vif1
> external_ids:ovn-installed)
> + flow_count=$(as hv1 ovs-ofctl dump-flows br-int | grep -Fc "10.0.0.1")
> + echo "installed=$ovn_installed, count=$flow_count"
> + if test -n "$ovn_installed"; then
> + as hv1 ovs-ofctl dump-flows br-int > output
> + test $flow_count -ge 1
> + else
> + false
> + fi
> +])
> +wait_for_ports_up
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +])
Regards,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev