On 3/25/26 4:31 PM, Dumitru Ceara wrote:
> On 3/12/26 8:35 PM, Mark Michelson via dev wrote:
>> Hi Xavier,
>>
>
> Hi Xavier, Mark,
>
>> It took a bit for me to look through all of this, but I believe this
>> looks good to me. Thanks for adding the notes about the offline
>> discussions, as I had not recalled all the details about those
>> discussions between then and now.
>>
>> Acked-by: Mark Michelson <[email protected]>
>>
>
> I think this change makes sense too. I do have some very small comments
> below but I might be able to address them myself and just squash that
> into this version of the patch before applying it to main.
>
> Please let me know what you think.
>
Hmm, this actually failed in CI:
https://github.com/dceara/ovn/actions/runs/23547684046/job/68553636598

Xavier, would you happen to have some time to look into it by any chance?

Thanks,
Dumitru

>> On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <[email protected]> wrote:
>>>
>>> If a server unexpectedly reboots, OVS, when restarted, sets BFD UP
>>> on BFD-enabled Geneve tunnels. However, if OVN takes time to
>>> restart, an HA gateway chassis would attract traffic while being
>>> unable to handle it (as it has no flows yet), resulting in traffic
>>> loss.
>>>
>>> This is fixed by reusing the OVS flow-restore-wait option. While it
>>> is set, OVS waits (prevents upcalls, ignores BFD, ...) until it is
>>> cleared. Once OVS receives the notification that flow-restore-wait
>>> is false, it resumes handling upcalls, BFD, etc., and ignores any
>>> further change to flow-restore-wait.
>>>
>>> Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
>>> it sets it to false, waits for the ack from OVS and then sets it
>>> back to true. If the server reboots, OVS will see flow-restore-wait
>>> set to true.
>>>
>>> OVN also sets external_ids:ovn-managed-flow-restore-wait whenever it
>>> sets flow-restore-wait. When present, this key indicates that OVN
>>> has set flow-restore-wait at some point.
>>>
>>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves
>>> the flows, stops ovs-vswitchd, sets flow-restore-wait to true,
>>> restarts ovs-vswitchd, restores the flows and finally removes
>>> flow-restore-wait. So OVS will wait either for "ovs-ctl restart" to
>>> remove flow-restore-wait or for OVN to set flow-restore-wait to
>>> false.
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-3075
>>> Signed-off-by: Xavier Simonart <[email protected]>
>>>
>>> ---
>>> -v2 : - Updated based on Mark's feedback (commit message, comments).
>>>       - Avoid setting flow-restore-wait for computes.
>>>       - Add external_ids->ovn-managed-flow-restore-wait.
>>> - Updated test: add test for compute update + nits (variable name >>> changes) >>> --- >>> controller/bfd.c | 5 +- >>> controller/bfd.h | 4 +- >>> controller/ovn-controller.8.xml | 11 + >>> controller/ovn-controller.c | 171 +++++++++- >>> tests/multinode-macros.at | 22 ++ >>> tests/multinode.at | 546 ++++++++++++++++++++++---------- >>> 6 files changed, 584 insertions(+), 175 deletions(-) >>> >>> diff --git a/controller/bfd.c b/controller/bfd.c >>> index 3b0c3f6da..56bfa4936 100644 >>> --- a/controller/bfd.c >>> +++ b/controller/bfd.c >>> @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct >>> ovsrec_bridge *br_int, >>> * >>> * If 'our_chassis' is C5 then this function returns empty bfd set. >>> */ >>> -void >>> +bool >>> bfd_calculate_chassis( >>> const struct sbrec_chassis *our_chassis, >>> const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, >>> struct sset *bfd_chassis) >>> { >>> const struct sbrec_ha_chassis_group *ha_chassis_grp; >>> + bool chassis_is_ha_gw = false; >>> SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, >>> ha_chassis_grp_table) { >>> bool is_ha_chassis = false; >>> @@ -143,6 +144,7 @@ bfd_calculate_chassis( >>> sset_add(&grp_chassis, ha_ch->chassis->name); >>> if (our_chassis == ha_ch->chassis) { >>> is_ha_chassis = true; >>> + chassis_is_ha_gw = true; >>> bfd_setup_required = true; >>> } >>> } >>> @@ -178,6 +180,7 @@ bfd_calculate_chassis( >>> } >>> sset_destroy(&grp_chassis); >>> } >>> + return chassis_is_ha_gw; >>> } >>> >>> void >>> diff --git a/controller/bfd.h b/controller/bfd.h >>> index f8fece5a5..3e3384891 100644 >>> --- a/controller/bfd.h >>> +++ b/controller/bfd.h >>> @@ -16,6 +16,8 @@ >>> #ifndef OVN_BFD_H >>> #define OVN_BFD_H 1 >>> >>> +#include <stdbool.h> >>> + >>> struct hmap; >>> struct ovsdb_idl; >>> struct ovsdb_idl_index; >>> @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, >>> const struct sbrec_sb_global_table *, >>> const struct ovsrec_open_vswitch_table *); >>> >>> 
-void bfd_calculate_chassis( >>> +bool bfd_calculate_chassis( >>> const struct sbrec_chassis *, >>> const struct sbrec_ha_chassis_group_table *, >>> struct sset *); >>> diff --git a/controller/ovn-controller.8.xml >>> b/controller/ovn-controller.8.xml >>> index 57e7cf5dd..33281a4d6 100644 >>> --- a/controller/ovn-controller.8.xml >>> +++ b/controller/ovn-controller.8.xml >>> @@ -531,6 +531,17 @@ >>> 65535. >>> </dd> >>> >>> + <dt> >>> + <code>external_ids:ovn-managed-flow-restore-wait</code> in the >>> + <code>Open_vSwitch</code> table >>> + </dt> >>> + <dd> >>> + When set to true, this key indicates that >>> <code>ovn-controller</code> >>> + has set the <code>other_config:flow-restore-wait</code> option. >>> + The key is set when <code>ovn-controller</code> enables >>> + flow-restore-wait and removed when it clears it. >>> + </dd> >>> + >>> <dt> >>> <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> >>> table >>> </dt> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>> index 5815f1b92..0611b1767 100644 >>> --- a/controller/ovn-controller.c >>> +++ b/controller/ovn-controller.c >>> @@ -211,6 +211,150 @@ static char *get_file_system_id(void) >>> free(filename); >>> return ret; >>> } >>> + >>> +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false >>> + * When set to true, also sets ovn-managed-flow-restore-wait to true to >>> + * indicate ownership */ >>> +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, >>> + const struct ovsrec_open_vswitch *cfg, >>> + const struct smap *other_config, >>> + const char *val, bool ovn_managed) > > Nit: "static void" should be on a different line. > >>> +{ >>> + struct smap new_config; >>> + smap_clone(&new_config, other_config); >>> + smap_replace(&new_config, "flow-restore-wait", val); >>> + ovsrec_open_vswitch_set_other_config(cfg, &new_config); >>> + if (!strcmp(val, "true")) { > > I'd prefer we use an actual boolean here as 'val'. 
>
> We'd only have to change:
>
> smap_replace(&new_config, "flow-restore-wait", val ? "true" : "false");
>
>>> +        ovsrec_open_vswitch_update_external_ids_setkey(
>>> +            cfg, "ovn-managed-flow-restore-wait", "true");
>>> +    } else if (ovn_managed) {
>>> +        ovsrec_open_vswitch_update_external_ids_delkey(
>>> +            cfg, "ovn-managed-flow-restore-wait");
>>> +    }
>>> +    ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_,
>>> +                            &ovsrec_open_vswitch_col_next_cfg, true);
>>> +    smap_destroy(&new_config);
>>> +}
>>> +
>>> +static void
>>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
>>> +                         const struct ovsrec_open_vswitch *cfg,
>>> +                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
>>> +                         int ovs_txn_status, bool is_ha_gw)
>>> +{
>>> +    enum flow_restore_wait_state {
>>> +        FRW_INIT,              /* Initial state */
>>> +        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
>>> +        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
>>> +        FRW_DONE               /* Everything completed */
>>> +    };
>>> +
>>> +    static int64_t frw_next_cfg;
>>> +    static enum flow_restore_wait_state frw_state;
>>> +    static bool ofctrl_was_connected = false;
>>> +
>>> +    bool ofctrl_connected = ofctrl_is_connected();
>>> +
>>> +    if (!ovs_idl_txn || !cfg) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* If OVS is stopped/started, make sure flow-restore-wait is toggled */
>
> Nit: comments should be sentences and end with a '.'.
>
>>> +    if (ofctrl_connected && !ofctrl_was_connected) {
>>> +        frw_state = FRW_INIT;
>>> +    }
>>> +    ofctrl_was_connected = ofctrl_connected;
>>> +
>>> +    if (!ofctrl_connected) {
>>> +        return;
>>> +    }
>>> +
>>> +    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false);
>>> +    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
>>> +            "ovn-managed-flow-restore-wait", false);
>
> Nit: indentation.
>
>>> +
>>> +    if (frw && !ovn_managed_once) {
>>> +        /* frw has been set by ovs-ctl. Do not touch. */
>>> +        return;
>>> +    }
>>> +
>>> +    if (!is_ha_gw) {
>>> +        if (frw) {
>>> +            /* frw has once been set by OVN. We are now not an HA chassis
>>> +             * anymore, unset it. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  "false", ovn_managed_once);
>>> +        }
>>> +        /* else we are not an HA chassis and frw is false. Ignore it. */
>>> +        return;
>>> +    }
>>> +
>>> +    switch (frw_state) {
>>> +    case FRW_INIT:
>>> +        if (ofctrl_cur_cfg > 0) {
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  "false", ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            VLOG_INFO("Setting flow-restore-wait=false "
>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>> +        }
>>> +        break;
>>> +
>>> +    case FRW_WAIT_TXN_COMPLETE:
>>> +        /* if (ovs_idl_txn != NULL), the transaction completed.
>>> +         * When the transaction completed, it either failed
>>> +         * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */
>>> +        if (ovs_txn_status == 0) {
>>> +            /* Previous transaction failed. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  "false", ovn_managed_once);
>>> +            break;
>>> +        }
>>> +        /* txn succeeded, get next_cfg */
>>> +        frw_next_cfg = ovs_next_cfg;
>>> +        frw_state = FRW_TXN_SUCCESS;
>>> +        /* fall through */
>>> +
>>> +    case FRW_TXN_SUCCESS:
>>> +        if (ovs_next_cfg < frw_next_cfg) {
>>> +            /* DB was reset, next_cfg went backwards */
>>> +            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
>>> +                      "resetting state",
>>> +                      frw_next_cfg, ovs_next_cfg);
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  "false", ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            break;
>>> +        }
>>> +
>>> +        if (!frw) {
>>> +            if (cfg->cur_cfg >= frw_next_cfg) {
>>> +                set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                      "true", ovn_managed_once);
>>> +                frw_state = FRW_DONE;
>>> +                VLOG_INFO("Setting flow-restore-wait=true");
>>> +            }
>>> +        } else {
>>> +            /* The transaction to false succeeded but frw is true.
>>> +             * So, another task already set it to true */
>>> +            frw_state = FRW_DONE;
>>> +            VLOG_INFO("flow-restore-wait was already true");
>>> +        }
>>> +        break;
>>> +    case FRW_DONE:
>>> +        if (!frw) {
>>> +            /* frw has been removed (e.g. by ovs-ctl restart) or is false
>>> +             * (e.g. txn failed.) */
>
> Nit: extra whitespace at the beginning of the line.
>
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  "false", ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            VLOG_INFO("OVS frw cleared, restarting flow-restore-wait sequence "
>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>> +        }
>>> +        break;
>>> +    }
>>> +}
>>> +
>>>  /* Only set monitor conditions on tables that are available in the
>>>   * server schema.
>>> */ >>> @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) >>> >>> struct ed_type_bfd_chassis { >>> struct sset bfd_chassis; >>> + bool is_ha_gw; >>> }; >>> >>> static void * >>> @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void >>> *data OVS_UNUSED) >>> = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); >>> >>> sset_clear(&bfd_chassis->bfd_chassis); >>> - bfd_calculate_chassis(chassis, ha_chassis_grp_table, >>> - &bfd_chassis->bfd_chassis); >>> + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, >>> + ha_chassis_grp_table, >>> + >>> &bfd_chassis->bfd_chassis); >>> return EN_UPDATED; >>> } >>> >>> @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) >>> struct unixctl_server *unixctl; >>> struct ovn_exit_args exit_args = {0}; >>> struct br_int_remote br_int_remote = {0}; >>> + static uint64_t next_cfg = 0; >>> int retval; >>> >>> /* Read from system-id-override file once on startup. */ >>> @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) >>> >>> /* Main loop. */ >>> int ovnsb_txn_status = 1; >>> + int ovs_txn_status = 1; >>> bool sb_monitor_all = false; >>> struct tracked_acl_ids *tracked_acl_ids = NULL; >>> while (!exit_args.exiting) { >>> @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) >>> pinctrl_update_swconn(br_int_remote.target, >>> br_int_remote.probe_interval); >>> >>> + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { >>> + /* txn was in progress and is now completed */ >>> + next_cfg = cfg->next_cfg; >>> + } >>> + >>> /* Enable ACL matching for double tagged traffic. 
*/ >>> if (ovs_idl_txn && cfg) { >>> int vlan_limit = smap_get_int( >>> @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) >>> stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >>> time_msec()); >>> ofctrl_seqno_run(ofctrl_get_cur_cfg()); >>> + if (ovs_idl_txn && bfd_chassis_data) { >>> + manage_flow_restore_wait(ovs_idl_txn, cfg, >>> + ofctrl_get_cur_cfg(), >>> + next_cfg, ovs_txn_status, >>> + >>> bfd_chassis_data->is_ha_gw); >>> + } >>> + >>> stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >>> time_msec()); >>> stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, >>> @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) >>> OVS_NOT_REACHED(); >>> } >>> >>> - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> if (!ovs_txn_status) { >>> /* The transaction failed. */ >>> vif_plug_clear_deleted( >>> @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) >>> &vif_plug_deleted_iface_ids); >>> vif_plug_finish_changed( >>> &vif_plug_changed_iface_ids); >>> + if (cfg) { >>> + next_cfg = cfg->next_cfg; >>> + } >>> } else if (ovs_txn_status == -1) { >>> /* The commit is still in progress */ >>> } else { >>> @@ -8085,7 +8248,7 @@ loop_done: >>> } >>> >>> ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); >>> - int ovs_txn_status = >>> ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> if (!ovs_txn_status) { >>> /* The transaction failed. 
*/ >>> vif_plug_clear_deleted( >>> diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at >>> index 4a74d5127..646ee2d79 100644 >>> --- a/tests/multinode-macros.at >>> +++ b/tests/multinode-macros.at >>> @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], >>> ] >>> ) >>> >>> +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ > > I think we normally put the _ at the end, i.e., M_START_TCPDUMPS_RECURSIVE_ > >>> + m4_if(m4_eval($# > 3), [1], [dnl >>> + names="$names $3" >>> + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr" >>> + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & >>> + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && >>> podman exec $1 killall tcpdump" >> cleanup >>> + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) >>> + ]) >>> + ] >>> +) >>> + >>> +# Start Multiple tcpdump. Useful to speed up when many tcpdump >>> +# must be started as waiting for "listening" takes usually 1 second. >>> +m4_define([M_START_TCPDUMPS], >>> + [ >>> + names="" >>> + _M_START_TCPDUMPS_RECURSIVE($@) >>> + for name in $names; do >>> + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) >>> + done >>> + ] >>> +) >>> >>> # M_FORMAT_CT([ip-addr]) >>> # >>> diff --git a/tests/multinode.at b/tests/multinode.at >>> index 6b9614126..24d7ca27c 100644 >>> --- a/tests/multinode.at >>> +++ b/tests/multinode.at >>> @@ -2986,42 +2986,42 @@ AT_CLEANUP >>> >>> AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) >>> # Network topology >>> -# >>> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >>> -# │ >>> │ >>> -# │ ┌───────────────────┐ ┌───────────────────┐ >>> ┌───────────────────┐ ┌───────────────────┐ │ >>> -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ ovn-gw-2 >>> │ │ ovn-chassis-2 │ │ >>> -# │ └─────────┬─────────┘ └───────────────────┘ >>> └───────────────────┘ └───────────────────┘ │ >>> -# │ ┌─────────┴─────────┐ >>> │ >>> -# │ │ inside1 │ >>> │ >>> -# │ │ 
192.168.1.1/24 │ >>> │ >>> -# │ └─────────┬─────────┘ >>> │ >>> -# │ ┌─────────┴─────────┐ >>> │ >>> -# │ │ inside │ >>> │ >>> -# │ └─────────┬─────────┘ >>> │ >>> -# │ ┌─────────┴─────────┐ >>> │ >>> -# │ │ 192.168.1.254 │ >>> │ >>> -# │ │ R1 │ >>> │ >>> -# │ │ 192.168.0.254 │ >>> │ >>> -# │ └─────────┬─────────┘ >>> │ >>> -# │ └------eth1---------------┬--------eth1-----------┐ >>> │ >>> -# │ ┌──────────┴────────┐ >>> ┌─────────┴─────────┐ │ >>> -# │ │ 192.168.1.254 │ │ >>> 192.168.1.254 │ │ >>> -# │ │ R1 │ │ R1 >>> │ │ >>> -# │ │ 192.168.0.254 │ │ >>> 192.168.0.254 │ │ >>> -# │ └─────────┬─────────┘ >>> └─────────┬─────────┘ │ >>> -# │ │ │ >>> ┌───────────────────┐ │ >>> -# │ ┌─────────┴─────────┐ >>> ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >>> -# │ │ outside │ │ >>> outside │ │ ext1 │ │ >>> -# │ └─────────┬─────────┘ >>> └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ >>> ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> -# │ │ ln-outside │ │ >>> ln-outside │ │ ln-ext1 │ │ >>> -# │ └─────────┬─────────┘ >>> └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ >>> ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> -# │ │ br-ex │ │ br-ex >>> │ │ br-ex │ │ >>> -# │ └─────────┬─────────┘ >>> └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ >>> └---------eth2-----------┴-------eth2-------------┘ │ >>> -# │ >>> │ >>> -# >>> └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >>> +# >>> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >>> +# │ >>> │ >>> +# │ ┌───────────────────┐ ┌───────────────────┐ >>> ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ >>> +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ ovn-gw-1 >>> │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ >>> └───────────────────┘ └───────────────────┘ 
└───────────────────┘ │ >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ >>> │ >>> +# │ │ inside1 │ │ inside2 │ >>> │ >>> +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ >>> │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ >>> │ >>> +# │ ┌─┴────────────────────────┴─┐ >>> │ >>> +# │ │ inside │ >>> │ >>> +# │ └──────────────┬─────────────┘ >>> │ >>> +# │ ┌─────────┴─────────┐ >>> │ >>> +# │ │ 192.168.1.254 │ >>> │ >>> +# │ │ R1 │ >>> │ >>> +# │ │ 192.168.0.254 │ >>> │ >>> +# │ └─────────┬─────────┘ >>> │ >>> +# │ >>> └------eth1---------------------------┬--------eth1-----------┐ >>> │ >>> +# │ >>> ┌──────────┴────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ >>> 192.168.1.254 │ │ 192.168.1.254 │ │ >>> +# │ │ R1 >>> │ │ R1 │ │ >>> +# │ │ >>> 192.168.0.254 │ │ 192.168.0.254 │ │ >>> +# │ >>> └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ │ >>> │ ┌───────────────────┐ │ >>> +# │ >>> ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >>> +# │ │ outside >>> │ │ outside │ │ ext1 │ │ >>> +# │ >>> └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ >>> ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ ln-outside >>> │ │ ln-outside │ │ ln-ext1 │ │ >>> +# │ >>> └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ >>> ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ br-ex >>> │ │ br-ex │ │ br-ex │ │ >>> +# │ >>> └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ >>> └---------eth2-----------┴-------eth2-------------┘ │ >>> +# │ >>> │ >>> +# >>> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >>> >>> # The goal of this test is the check that GARP are properly generated by >>> higest priority traffic when >>> # BFD goes down, and back up, and this whether the BFD event is due either >>> to some bfd packet lost >>> @@ -3030,6 +3030,12 @@ 
AT_SETUP([HA: Check for missing garp on leader when >>> BFD goes back up]) >>> # So gw3 should in this test neither send garp or receive packets. >>> # >>> # Enable vconn so we can check the GARP from a log perspective. >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" >>> m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg >>> m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg >>> m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg >>> @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit >>> m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit >>> m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit >>> >>> +# Decrease revalidation time on ovs switch simulating ToR. >>> +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . >>> other_config:max-revalidator=500" > > Isn't it better to just remove the key on_exit, just in case the default > ever changes from 500 to something else? I.e.: > > on_exit "OVS_RUNDIR= ovs-vsctl remove Open_vSwitch . other_config > max-revalidator" > >>> +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=100 >>> + >>> check_fake_multinode_setup >>> >>> # Delete the multinode NB and OVS resources before starting the test. 
>>> cleanup_multinode_resources >>> >>> ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk >>> '{print $2}'| cut -d '/' -f1) >>> +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk >>> '{print $2}'| cut -d '/' -f1) >>> ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print >>> $2}'| cut -d '/' -f1) >>> ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print >>> $2}'| cut -d '/' -f1) >>> ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print >>> $2}'| cut -d '/' -f1) >>> @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep >>> "inet " | awk '{print $2}'| cut >>> from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw2) >>> from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw3) >>> from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_ch1) >>> +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_ch2) >>> from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw1) >>> from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw3) >>> from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_ch1) >>> +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_ch2) >>> from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw1) >>> from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw2) >>> +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find >>> interface options:remote_ip=$ip_gw1) >>> +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare 
--columns=name find >>> interface options:remote_ip=$ip_gw2) >>> >>> m_as ovn-chassis-1 ip link del hv1-vif1-p >>> -m_as ovn-chassis-2 ip link del ext1-p >>> +m_as ovn-chassis-2 ip link del hv2-vif1-p >>> +m_as ovn-chassis-3 ip link del ext1-p >>> >>> OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) >>> +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) >>> >>> +# Use "aggressive" bfd parameters >>> +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 >>> +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 >>> check multinode_nbctl ls-add inside >>> check multinode_nbctl ls-add outside >>> check multinode_nbctl ls-add ext >>> check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses inside1 >>> "f0:00:c0:a8:01:01 192.168.1.1" >>> +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses inside2 >>> "f0:00:c0:a8:01:02 192.168.1.2" >>> check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 >>> "00:00:c0:a8:00:01 192.168.0.1" >>> >>> multinode_nbctl create Logical_Router name=R1 >>> @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . external_ids >>> garp-max-timeout-sec >>> >>> m_as ovn-chassis-1 ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-chassis-2 ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=public:br-ex >>> +m_as ovn-chassis-3 ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-1 ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-2 ovs-vsctl set open . >>> external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-3 ovs-vsctl set open . 
>>> external-ids:ovn-bridge-mappings=public:br-ex >>> >>> m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 >>> f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a >>> -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 >>> 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >>> +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 >>> f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a >>> +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 >>> 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >>> >>> # There should be one ha_chassis_group with the name "R1_outside" >>> m_check_row_count HA_Chassis_Group 1 name=R1_outside >>> @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >>> wait_bfd_enabled ovn-chassis-1 $chassis >>> done >>> >>> +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >>> + echo "checking ovn-chassis-2 -> $chassis" >>> + wait_bfd_enabled ovn-chassis-2 $chassis >>> +done >>> + >>> # Make sure there is no nft table left. Do not use nft directly as might >>> not be installed in container. 
>>> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >>> nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && >>> nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep >>> ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip >>> ovn-test" >>> +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter >>> --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter >>> --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" >>> >>> -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do >>> +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 >>> $from_gw1_to_ch2; do >>> wait_bfd_up ovn-gw-1 $chassis >>> done >>> -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do >>> +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 >>> $from_gw2_to_ch2; do >>> wait_bfd_up ovn-gw-2 $chassis >>> done >>> for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >>> wait_bfd_up ovn-chassis-1 $chassis >>> done >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >>> + wait_bfd_up ovn-chassis-2 $chassis >>> +done >>> >>> m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside >>> chassis=$gw1_chassis >>> check multinode_nbctl --wait=hv sync >>> >>> start_tcpdump() { >>> echo "$(date +%H:%M:%S.%03N) Starting tcpdump" >>> - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) >>> - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) >>> + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], >>> + [ovn-chassis-2], 
[-neei hv2-vif1-p], [ch2], >>> + [ovn-chassis-3], [-neei eth2], [ch3], >>> + [ovn-gw-1], [-neei eth2], [gw1], >>> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], >>> + [ovn-gw-2], [-neei eth2], [gw2], >>> + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], >>> + [ovn-gw-3], [-neei eth2], [gw3], >>> + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], >>> + [ovn-gw-1], [-neei eth1], [gw1_eth1], >>> + [ovn-gw-2], [-neei eth1], [gw2_eth1], >>> + [ovn-chassis-1], [-neei eth1], [ch1_eth1], >>> + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) >>> } >>> >>> stop_tcpdump() { >>> echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" >>> - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' tcpdump >>> + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 >>> ovn-chassis-3' tcpdump >>> } >>> >>> -# Send packets from chassis2 (ext1) to chassis1 >>> +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 >>> send_background_packets() { >>> echo "$(date +%H:%M:%S.%03N) Sending packets in Background" >>> start_tcpdump >>> - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 192.168.1.1], >>> [ping.pid]) >>> + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 192.168.1.1], >>> [ping.pid]) >>> } >>> >>> stop_sending_background_packets() { >>> echo "$(date +%H:%M:%S.%03N) Stopping Background process" >>> m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ >>> m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && >>> killall ping >>> - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ >>> + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ >>> m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && >>> killall ping >>> stop_tcpdump >>> } >>> @@ -3216,8 +3253,8 @@ check_for_new_garps() { >>> expecting_garp=$2 >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > >>> Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 >>> tell 192.168.0.254, length 28") >>> >>> - if [ "$expecting_garp" == 
"true" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv >>> - Starting with $n_new_garps]) >>> + if [[ "$expecting_garp" == "true" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - >>> Starting with $n_new_garps" >>> OVS_WAIT_UNTIL([ >>> n_garps=$n_new_garps >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c >>> "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request >>> who-has 192.168.0.254 tell 192.168.0.254, length 28") >>> @@ -3225,7 +3262,7 @@ check_for_new_garps() { >>> test "$n_garps" -ne "$n_new_garps" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) >>> + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" >>> # Waiting a few seconds to get a chance to see unexpected garps. >>> sleep 3 >>> n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > >>> Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 >>> tell 192.168.0.254, length 28") >>> @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { >>> n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, >>> ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo >>> request") >>> n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, >>> ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo >>> reply") >>> >>> - if [ "$expecting_pkts" == "true" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts >>> through ${hv}]) >>> + if [[ "$expecting_pkts" == "true" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts >>> through ${hv}" >>> echo "Starting with $n_new_echo_req requests and $n_new_echo_rep >>> replies so far on ${hv}." 
>>> OVS_WAIT_UNTIL([ >>> n_echo_req=$n_new_echo_req >>> @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { >>> test "$n_echo_req" -ne "$n_new_echo_req" && test "$n_echo_rep" >>> -ne "$n_new_echo_rep" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) >>> + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" >>> # Waiting a few seconds to get a chance to see unexpected pkts. >>> sleep 3 >>> n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, >>> ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo >>> request") >>> @@ -3271,22 +3308,44 @@ dump_statistics() { >>> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) >>> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) >>> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) >>> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) >>> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) >>> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) >>> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) >>> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) >>> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) >>> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) >>> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) >>> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >>> - echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req >>> gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep >>> gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep" >>> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 >>> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) >>> ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) >>> ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 >>> + echo "$((ch3_req - ch3_rep))" >>> } >>> >>> -check_migration_between_gw1_and_gw2() { >>> - action=$1 >>> - send_background_packets >>> +add_port() { >>> + bridge=$1 >>> + interface=$2 >>> + address=$3 >>> + echo "Adding $bridge 
$interface $address" >>> + >>> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >>> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid >>> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name >>> find interface \ >>> + external_ids:container_id=ovn-gw-1 >>> external_ids:container_iface="$interface") >>> + port="${port:0:13}" >>> + ip link add "${port}_l" type veth peer name "${port}_c" >>> + ip link set "${port}_l" up >>> + ip link set "${port}_c" netns $pid >>> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" >>> + ip netns exec $pid ip link set "$interface" up >>> + if [[ -n "$address" ]]; then >>> + ip netns exec $pid ip addr add "$address" dev "$interface" >>> + fi > > I might be wrong but I think nobody cleans up any of these ports created > in the ovn-gw-1 container. Do we need some on_exit() calls here? > >>> +} >>> >>> +prepare() { >>> + send_background_packets >>> # We make sure gw1 is leader since enough time that it generated all >>> its garps. >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) >>> + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 >>> n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > >>> Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 >>> tell 192.168.0.254, length 28") >>> OVS_WAIT_UNTIL([ >>> n_garps=$n_new_garps >>> @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { >>> check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" >>> "false" >>> check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" >>> "false" >>> >>> + # All packets should go through gw1, and none through gw2 or gw3. 
>>> + check_packets "true" "false" "false" "true" >>> flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface >>> $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface >>> $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> +} >>> >>> - if [ test "$action" == "stop_bfd" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 >>> to $ip_gw2)]) >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >>> - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test >>> INPUT { type filter hook input priority 0; policy accept; }' >>> - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 >>> (0xec8), Session state Up, Init, Down. >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT >>> ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0xc0 counter drop' >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT >>> ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0x80 counter drop' >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT >>> ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0x40 counter drop' >>> - >>> - # We do not check that packets go through gw2 as BFD between >>> chassis-2 and gw1 is still up >>> - fi >>> - >>> - if [ test "$action" == "kill_gw2" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) >>> - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl >>> status || >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl >>> start --system-id=ovn-gw-2' >>> - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl >>> status_controller || >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl >>> start_controller ${CONTROLLER_SSL_ARGS}' >>> - >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat >>> 
/run/ovn/ovn-controller.pid) >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat >>> /run/openvswitch/ovs-vswitchd.pid) >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat >>> /run/openvswitch/ovsdb-server.pid) >>> - # Also delete datapath (flows) >>> - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >>> - fi >>> - >>> - if [ test "$action" == "kill_gw1" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) >>> - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl >>> status || >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl >>> start --system-id=ovn-gw-1' >>> - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl >>> status_controller || >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl >>> start_controller ${CONTROLLER_SSL_ARGS}' >>> - >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat >>> /run/ovn/ovn-controller.pid) >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat >>> /run/openvswitch/ovs-vswitchd.pid) >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat >>> /run/openvswitch/ovsdb-server.pid) >>> - # Also delete datapath (flows) >>> - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >>> - fi >>> +check_loss_after_flap() >>> +{ >>> + dead=$1 >>> + max_expected_loss=$2 >>> >>> - if [ test "$action" == "kill_gw2" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 >>> and gw2 to increase]) >>> + if [[ "$dead" == "gw2" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 >>> and gw2 to increase" >>> OVS_WAIT_UNTIL([ >>> new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac >>> $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> echo "Comparing $new_flap_count versus $flap_count_gw_1" >>> test "$new_flap_count" -gt "$((flap_count_gw_1))" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 >>> and gw1 to increase]) >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 >>> and gw1 to increase" >>>
OVS_WAIT_UNTIL([ >>> new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac >>> $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> echo "Comparing $new_flap_count versus $flap_count_gw_2" >>> test "$new_flap_count" -gt "$((flap_count_gw_2))" >>> ]) >>> - >>> fi >>> - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!]) >>> >>> + echo "$(date +%H:%M:%S.%03N) Flapped!" >>> # Wait a few more second for the fight. >>> + sleep 4 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" >>> + lost=$(dump_statistics) >>> + echo "===> $lost packet lost while handling migration" >>> + AT_CHECK([test "$lost" -le "$max_expected_loss"]) >>> +} >>> + >>> +final_check() >>> +{ >>> + action=$1 >>> + lost=$2 >>> + max_expected_loss_after_restoration=$3 >>> + >>> + # Wait a little more to get packets while network is restored >>> sleep 2 >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) >>> - dump_statistics >>> - >>> - if [ test "$action" == "stop_bfd" ]; then >>> - # gw1 still alive and gw2 tried to claim => gw1 should restart >>> generating garps. 
>>> - check_for_new_garps gw1 "true" >>> - check_for_new_garps gw2 "false" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 >>> "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) >>> - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >>> - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> - fi >>> + echo "$(date +%H:%M:%S.%03N) Statistics after network restored (after >>> $action)" >>> + new_lost=$(dump_statistics) >>> + echo "===> $((new_lost - lost)) packets lost during network >>> restoration" >>> + AT_CHECK([test "$((new_lost - lost))" -le >>> "$max_expected_loss_after_restoration"]) >>> + stop_sending_background_packets >>> +} >>> >>> - if [ test "$action" == "kill_gw2" ]; then >>> - # gw1 still alive, but gw2 did not try to claim => gw1 should not >>> generate new garps. 
>>> - check_for_new_garps gw1 "false" >>> - check_for_new_garps gw2 "false" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 >>> "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-2 >>> - >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> - fi >>> +check_garps() >>> +{ >>> + check_for_new_garps gw1 "$1" >>> + check_for_new_garps gw2 "$2" >>> + check_for_new_garps gw3 "$3" >>> +} >>> >>> - if [ test "$action" == "kill_gw1" ]; then >>> - # gw1 died => gw2 should generate garps. 
>>> - check_for_new_garps gw1 "false" >>> - check_for_new_garps gw2 "true" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "true" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe >>> "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 >>> "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-1 >>> - >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> - fi >>> +check_packets() >>> +{ >>> + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$1" >>> + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$2" >>> + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$3" >>> + check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "$4" >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_bfd_stop() >>> +{ >>> + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) >>> + max_expected_max_expected_loss1=$1 >>> + max_expected_max_expected_loss2=$2 >>> + prepare >>> + >>> + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to >>> $ip_gw2)" >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >>> + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { >>> type filter hook input priority 0; policy accept; }' >>> + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 >>> (0xec8), Session state Up, Init, Down. 
>>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip >>> daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0xc0 counter drop' >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip >>> daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0x80 counter drop' >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip >>> daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 >>> @th,472,8 == 0x40 counter drop' >>> + >>> + check_loss_after_flap "gw1" $max_expected_max_expected_loss1 >>> + >>> + # gw1 still alive and gw2 tried to claim => gw1 should restart >>> generating garps. >>> + check_garps "true" "false" "false" >>> + check_packets "true" "false" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" >>> + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >>> + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> >>> # The network is now restored => packets should go through gw1 and >>> reach chassis-1. 
>>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) >>> - dump_statistics >>> - stop_sending_background_packets >>> + check_packets "true" "false" "false" "true" >>> + final_check "bfd_stop" $lost $max_expected_max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_kill_gw2() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 >>> ovn-controller & vswitchd]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-2' >>> + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl >>> status_controller || >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat >>> /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat >>> /run/openvswitch/ovsdb-server.pid) >>> + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >>> + >>> + check_loss_after_flap "gw2" $max_expected_loss1 >>> + >>> + # gw1 still alive, but gw2 did not try to claim => gw1 should not >>> generate new garps. 
>>> + check_garps "false" "false" "false" >>> + check_packets "true" "false" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd" >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-2 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller" >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and >>> reach chassis-1. >>> + check_packets "true" "false" "false" "true" >>> + final_check "kill_gw2" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_update_ovs() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting gw1 >>> ovs-vswitchd ("update")]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart >>> --system-id=ovn-gw-1 >>> + >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # The network is now restored => packets should go through gw1 and >>> reach chassis-1.
>>> + check_packets "true" "false" "false" "true" >>> + final_check "ovs_update" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_kill_gw1() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and >>> ovs-vswitchd]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-1' >>> + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl >>> status_controller || >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat >>> /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat >>> /run/openvswitch/ovsdb-server.pid) >>> + # Also delete datapath (flows) >>> + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >>> + >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # gw1 died => gw2 should generate garps. >>> + check_garps "false" "true" "false" >>> + check_packets "false" "true" "false" "true" >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after killing >>> gw1" >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-1 >>> + >>> + # Wait some long time before restarting ovn-controller >>> + sleep 10 >>> + >>> + # gw2 should still be handling packets as OVN not restarted on gw1 >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after >>> killing gw1" >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and >>> reach chassis-1. 
>>> + check_packets "true" "false" "false" "true" >>> + final_check "kill_gw1" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_reboot_gw1() { >>> + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk >>> '{print $3}' | cut -d/ -f1) >>> + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print >>> $3}' | cut -d/ -f2) >>> + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with >>> $ip_gw1_eth1/$cidr]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + podman stop -t 0 ovn-gw-1 >>> + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) >>> + >>> + add_port br-ovn eth1 $ip_gw1_eth1/$cidr >>> + add_port br-ovn-ext eth2 >>> + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], [-neei >>> eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # gw1 died => gw2 should generate garps. >>> + check_garps "false" "true" "false" >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after >>> rebooting gw1" >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-gw-1 >>> + >>> + # Wait some long time before restarting ovn-controller >>> + sleep 10 >>> + >>> + # gw2 should still be handling packets as OVN not restarted on gw1 >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after >>> rebooting gw1" >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and >>> reach chassis-1. 
>>> + check_packets "true" "false" "false" "true" >>> + final_check "kill_gw1" $lost $max_expected_loss2 >>> +} >>> + >>> +check_compute_restart() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 ovn-controller >>> and ovs-vswitchd]) >>> + max_expected_loss=$1 >>> + prepare >>> + >>> + # Kill ovn-chassis-1 >>> + echo "$(date +%H:%M:%S.%03N) Killing chassis-1" >>> + on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl >>> status || >>> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl >>> start --system-id=ovn-chassis-1' >>> + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl >>> status_controller || >>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl >>> start_controller ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat >>> /run/ovn/ovn-controller.pid) >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat >>> /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat >>> /run/openvswitch/ovsdb-server.pid) >>> + >>> + # Now restart chassis-1 >>> + flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac $from_gw2_to_ch1 >>> bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-vswitchd." >>> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start >>> --system-id=ovn-chassis-1 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 and >>> chassis-1 to increase" >>> + OVS_WAIT_UNTIL([ >>> + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac >>> $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> + echo "Comparing $new_flap_count versus $flap_count" >>> + test "$new_flap_count" -gt "$((flap_count))" >>> + ]) >>> + >>> + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-controller." 
>>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller >>> ${CONTROLLER_SSL_ARGS} >>> + >>> + # Wait a long time to catch losses >>> + sleep 5 >>> + final_check "compute" 0 $max_expected_loss >>> } >>> >>> start_tcpdump >>> -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to >>> ext1]) >>> +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1" >>> M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 >>> 192.168.0.1 | FORMAT_PING], >>> [0], [dnl >>> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >>> @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping >>> -c3 -q -i 0.1 192.168.0.1 | F >>> stop_tcpdump >>> >>> # It should have gone through gw1 and not gw2 >>> -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2]) >>> +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" >>> AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl >>> ]) >>> >>> @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, ethertype >>> IPv4 (0x0800), length 98: 192.1 >>> 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length 98: >>> 192.168.0.1 > 192.168.1.1: ICMP echo reply, >>> ]) >>> >>> -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. >>> -check_migration_between_gw1_and_gw2 "stop_bfd" >>> +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should not >>> lose packets. >>> +check_migration_between_gw1_and_gw2_bfd_stop 1 1 >>> >>> # We simulate death of gw2. It should not have any effect. >>> -check_migration_between_gw1_and_gw2 "kill_gw2" >>> +check_migration_between_gw1_and_gw2_kill_gw2 1 1 >>> + >>> +# We simulate ovs update on gw1. When ovs is stopped, flows should still >>> be handled by Kernel datapath. >>> +# When OVS is restarted, BFD should go down immediately, and gw2 will >>> start handling packets. 
>>> +# There will be packet losses as gw2 will usually see BFD from gw1 up (and >>> hence release port) before gw1 sees >>> BFD up (and claim port). >>> +check_migration_between_gw1_and_gw2_update_ovs 20 1 >>> + >>> +# We simulate restart of both OVS & OVN gw1. gw2 should take over. >>> +check_migration_between_gw1_and_gw2_kill_gw1 40 20 >>> >>> # We simulate death of gw1. gw2 should take over. >>> -check_migration_between_gw1_and_gw2 "kill_gw1" >>> +check_migration_between_gw1_and_gw2_reboot_gw1 40 20 >>> + >>> +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as we >>> wait for bfd up before starting >>> +ovn-controller. >>> +check_compute_restart 30 >>> >>> AT_CLEANUP >>> -]) >>> >>> AT_SETUP([ovn multinode bgp L2 EVPN]) >>> check_fake_multinode_setup >>> -- >>> 2.47.1 >>> >> > > Regards, > Dumitru > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
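An aside for readers of the test: all the flap-count checks above rely on the same extraction, i.e. `ovs-vsctl get interface <port> bfd_status` prints an OVSDB-style map and a `sed` substitution pulls the numeric `flap_count` out of it. A minimal standalone sketch of that extraction follows; the `bfd_status` sample string is assumed, not captured from a real run, and note that in the .at file the character class is written `[[0-9]]` because m4 quoting expands it to `[0-9]`.

```shell
# Assumed sample of what `ovs-vsctl get interface <port> bfd_status`
# prints; a real map may carry additional keys.
bfd_status='{diagnostic="No Diagnostic", flap_count="3", forwarding="true", remote_state=up, state=up}'

# Same idea as the test: strip everything around the digits that
# follow 'flap_count="'.
flap_count=$(echo "$bfd_status" | sed 's/.*flap_count="\([0-9]*\)".*/\1/')
echo "$flap_count"
```

The test then detects a BFD flap by comparing successive samples with `test "$new_flap_count" -gt "$flap_count"`.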
