On 3/27/26 2:31 PM, Xavier Simonart wrote:
> Hi Dumitru

Hi Xavier,
> Thanks again for looking at this.
>
> On Fri, Mar 27, 2026 at 12:38 PM Dumitru Ceara <[email protected]> wrote:
>
>> On 3/26/26 5:16 PM, Xavier Simonart wrote:
>>> If a server unexpectedly reboots, OVS, when restarted, sets BFD
>>> UP on bfd-enabled geneve tunnels.
>>> However, if it takes time to restart OVN, an HA gw chassis
>>> would attract the traffic while being unable to handle it
>>> (as it has no flows), resulting in traffic loss.
>>>
>>> This is fixed by re-using the OVS flow-restore-wait mechanism.
>>> While it is set, OVS waits (prevents upcalls, ignores BFD, ...) until
>>> it is reset.
>>> Once OVS receives the notification of flow-restore-wait being false,
>>> it restarts handling upcalls, BFD, ... and ignores any new change to
>>> flow-restore-wait.
>>>
>>> Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
>>> it sets it to false, waits for the ack from OVS and then sets it back
>>> to true. If the server reboots, OVS will see flow-restore-wait being
>>> true.
>>>
>>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
>>> flow-restore-wait. When set, it indicates that OVN once set
>>> flow-restore-wait.
>>>
>>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the
>>> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts
>>> "ovs-vswitchd", restores the flows and finally removes
>>> "flow-restore-wait".
>>> So OVS will wait either for "ovs-ctl restart" to remove
>>> "flow-restore-wait" or for OVN to set "flow-restore-wait" to false.
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-3075
>>> Signed-off-by: Xavier Simonart <[email protected]>
>>>
>>> ---
>>> -v2 : - Updated based on Mark's feedback (commit message, comments).
>>>       - Avoid setting flow-restore-wait for computes.
>>>       - Add external_ids->ovn-managed-flow-restore-wait.
>>>       - Updated test: add test for compute update + nits (variable
>>>         name changes)
>>> -v3 : - Fix test failing on CI (using wrong host socket)
>>>       - Address nits from Dumitru + few more nits (related to comments).
>>>       - Increased tolerated loss (and add comment) to avoid flakiness.
>>>       - Rebased.
>>> ---
>>
>> Hi Xavier,
>>
>> Thanks for the v3!
>> [...]
>>> +
>>> +static void
>>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
>>> +                         const struct ovsrec_open_vswitch *cfg,
>>> +                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
>>> +                         int ovs_txn_status, bool is_ha_gw)
>>> +{
>>> +    enum flow_restore_wait_state {
>>> +        FRW_INIT,              /* Initial state */
>>> +        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
>>> +        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
>>> +        FRW_DONE               /* Everything completed */
>>> +    };
>>> +
>>> +    static int64_t frw_next_cfg;
>>> +    static enum flow_restore_wait_state frw_state;
>>> +    static bool ofctrl_was_connected = false;
>>> +
>>> +    bool ofctrl_connected = ofctrl_is_connected();
>>> +
>>> +    if (!ovs_idl_txn || !cfg) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* If OVS is stopped/started, make sure flow-restore-wait is
>>> +     * toggled. */
>>> +    if (ofctrl_connected && !ofctrl_was_connected) {
>>> +        frw_state = FRW_INIT;
>>> +    }
>>> +    ofctrl_was_connected = ofctrl_connected;
>>> +
>>> +    if (!ofctrl_connected) {
>>> +        return;
>>> +    }
>>> +
>>> +    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait",
>>> +                             false);
>>> +    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
>>> +                                          "ovn-managed-flow-restore-wait",
>>> +                                          false);
>>> +
>>> +    if (frw && !ovn_managed_once) {
>>> +        /* frw has been set by ovs-ctl. Do not touch. */
>>> +        return;
>>> +    }
>>> +
>>> +    if (!is_ha_gw) {
>>> +        if (frw) {
>>> +            /* frw has once been set by OVN. We are now not an HA chassis
>>> +             * anymore, unset it. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +        }
>>> +        /* else we are not an HA chassis and frw is false. Ignore it. */
>>> +        return;
>>> +    }
>>> +
>>> +    switch (frw_state) {
>>> +    case FRW_INIT:
>>> +        if (ofctrl_cur_cfg > 0) {
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            VLOG_INFO("Setting flow-restore-wait=false "
>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>> +        }
>>> +        break;
>>> +
>>> +    case FRW_WAIT_TXN_COMPLETE:
>>> +        /* if (ovs_idl_txn != NULL), the transaction completed.
>>> +         * When the transaction completed, it either failed
>>> +         * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0). */
>>> +        if (ovs_txn_status == 0) {
>>> +            /* Previous transaction failed. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            break;
>>> +        }
>>> +        /* txn succeeded, get next_cfg */
>>> +        frw_next_cfg = ovs_next_cfg;
>>> +        frw_state = FRW_TXN_SUCCESS;
>>> +        /* fall through */
>>
>> I also asked a helpful assistant :) (gemini-3) to review this code and
>> it complained here with:
>>
>> "I think there is a potential issue here when `ovs_txn_status` is `-1`.
>> In `main()`, `manage_flow_restore_wait` is called only when
>> `ovs_idl_txn` is NOT NULL.
>>
>> If `ovs_txn_status` was `-1` in the previous iteration, it means we were
>> waiting for a transaction.
>>
>> If `ovs_idl_txn` is now NOT NULL, it means `ovsdb_idl_loop_run()`
>> processed the completion of that transaction.
>>
>> However, `ovs_txn_status` still holds `-1` because it's only updated at
>> the end of the loop by `ovsdb_idl_loop_commit_and_wait()`.
>>
>> The current logic treats `-1` as success (since it's not `0`). If the
>> transaction actually failed, we might move to `FRW_TXN_SUCCESS`
>> prematurely.
>>
>> Should we wait for `ovs_txn_status` to be something other than `-1`?
>> Or can we reliably determine success/failure from the IDL state?"
>>
>> This does sound legit to me, ovs_txn_status may still be -1 because when
>> we tried to commit frw=false we couldn't immediately assess whether the
>> transaction would succeed or not (in most cases that's what happens) but
>> then the transaction failed for some reason on the server side.
>>
>> At this point, in the new loop, "ovs_idl_txn" is not NULL because we can
>> try a new transaction (nothing is in flight anymore) but we assume the
>> previous one succeeded.
>>
>> Then we fall through..
>>
>>> +
>>> +    case FRW_TXN_SUCCESS:
>>> +        if (ovs_next_cfg < frw_next_cfg) {
>>> +            /* DB was reset, next_cfg went backwards. */
>>> +            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
>>> +                      "resetting state",
>>> +                      frw_next_cfg, ovs_next_cfg);
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            break;
>>> +        }
>>> +
>>
>> frw is still "true" because we failed to set it to "false" (the
>> transaction failed server side).
>>
>>> +        if (!frw) {
>>> +            if (cfg->cur_cfg >= frw_next_cfg) {
>>> +                set_flow_restore_wait(ovs_idl_txn, cfg,
>>> +                                      &cfg->other_config,
>>> +                                      true, ovn_managed_once);
>>> +                frw_state = FRW_DONE;
>>> +                VLOG_INFO("Setting flow-restore-wait=true");
>>> +            }
>>> +        } else {
>>
>> So we enter here where we assume "another task already set it to true".
>>
>> But it was actually just never changed.
>>
>> Is this analysis correct? Do we need a way to get the ovs_txn_status at
>> the beginning of the current loop iteration?
>>
> I do not think so because of how ovsdb_idl_loop_run works.
>
> Loop 1
> ----------
> - We try to set frw to false.
> - We commit (at the end of the loop, ovsdb_idl_loop_commit_and_wait).
> - We have txn_status = -1 (in progress).
>
> Loop 2
> ----------
> - The txn fails in ovsdb_idl_loop_run.
> - Looking at ovsdb_idl_loop_run:
>   - We call ovsdb_idl_run (which gets the status).
>   - The txn failed, so committing_txn != NULL.
>   - We do not call ovsdb_idl_try_commit_loop_txn (which would set
>     committing_txn to NULL) as the txn failed.
>   - So loop->open_txn is NULL in this loop.
>   - We do not see the txn as completed in this loop.
>
> End of loop 2 (ovsdb_idl_loop_commit_and_wait)
> - txn_status is set to 0.
> - loop->committing_txn is set to NULL.
>
> Loop 3
> ----------
> - We have a txn.
> - We still have status == 0, so in FRW_WAIT_TXN_COMPLETE we handle the
>   failure.
>
> So, it looks OK to me.

I see, thanks for the detailed walk through!

Acked-by: Dumitru Ceara <[email protected]>

I'll wait with merging this until next week in case Mark wants to have a
look at this new version too.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
