On 3/27/26 2:40 PM, Dumitru Ceara wrote:
> On 3/27/26 2:31 PM, Xavier Simonart wrote:
>> Hi Dumitru
>>
>
> Hi Xavier,
>
>> Thanks again for looking at this.
>>
>> On Fri, Mar 27, 2026 at 12:38 PM Dumitru Ceara <[email protected]> wrote:
>>
>>> On 3/26/26 5:16 PM, Xavier Simonart wrote:
>>>> If a server unexpectedly rebooted, OVS, when restarted, sets BFD
>>>> UP on bfd-enabled geneve tunnels.
>>>> However, if it takes time to restart OVN, an HA gw chassis
>>>> would attract the traffic while being unable to handle it
>>>> (as it has no flows), resulting in traffic loss.
>>>>
>>>> This is fixed by re-using the OVS flow-restore-wait mechanism.
>>>> If set, OVS waits (prevents upcalls, ignores BFD, ...) until it is
>>>> reset.
>>>> Once OVS receives the notification of flow-restore-wait being false,
>>>> it resumes handling upcalls, BFD, ... and ignores any new change to
>>>> flow-restore-wait.
>>>>
>>>> Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
>>>> it sets it to false, waits for an ack from OVS and then sets it back
>>>> to true.  If the server reboots, OVS will see flow-restore-wait being
>>>> true.
>>>>
>>>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
>>>> flow-restore-wait.  When set, it records that OVN once set
>>>> flow-restore-wait.
>>>>
>>>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves
>>>> the flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true,
>>>> restarts "ovs-vswitchd", restores the flows and finally removes
>>>> "flow-restore-wait".
>>>> So OVS will wait either for "ovs-ctl restart" to remove
>>>> "flow-restore-wait" or for OVN to set "flow-restore-wait" to false.
>>>>
>>>> Reported-at: https://issues.redhat.com/browse/FDP-3075
>>>> Signed-off-by: Xavier Simonart <[email protected]>
>>>>
>>>> ---
>>>> -v2 : - Updated based on Mark's feedback (commit message, comments).
>>>>       - Avoid setting flow-restore-wait for computes.
>>>>       - Add external_ids->ovn-managed-flow-restore-wait.
>>>>       - Updated test: add test for compute update + nits (variable
>>>>         name changes)
>>>> -v3 : - Fix test failing on CI (using wrong host socket)
>>>>       - Address nits from Dumitru + few more nits (related to comments).
>>>>       - Increased tolerated loss (and add comment) to avoid flakiness.
>>>>       - Rebased.
>>>> ---
>>>
>>> Hi Xavier,
>>>
>>> Thanks for the v3!
>>>
>
> [...]
>
>>>> +
>>>> +static void
>>>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
>>>> +                         const struct ovsrec_open_vswitch *cfg,
>>>> +                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
>>>> +                         int ovs_txn_status, bool is_ha_gw)
>>>> +{
>>>> +    enum flow_restore_wait_state {
>>>> +        FRW_INIT,              /* Initial state */
>>>> +        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
>>>> +        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
>>>> +        FRW_DONE               /* Everything completed */
>>>> +    };
>>>> +
>>>> +    static int64_t frw_next_cfg;
>>>> +    static enum flow_restore_wait_state frw_state;
>>>> +    static bool ofctrl_was_connected = false;
>>>> +
>>>> +    bool ofctrl_connected = ofctrl_is_connected();
>>>> +
>>>> +    if (!ovs_idl_txn || !cfg) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /* If OVS is stopped/started, make sure flow-restore-wait is
>>>> +     * toggled. */
>>>> +    if (ofctrl_connected && !ofctrl_was_connected) {
>>>> +        frw_state = FRW_INIT;
>>>> +    }
>>>> +    ofctrl_was_connected = ofctrl_connected;
>>>> +
>>>> +    if (!ofctrl_connected) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait",
>>>> +                             false);
>>>> +    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
>>>> +                                          "ovn-managed-flow-restore-wait",
>>>> +                                          false);
>>>> +
>>>> +    if (frw && !ovn_managed_once) {
>>>> +        /* frw has been set by ovs-ctl. Do not touch. */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (!is_ha_gw) {
>>>> +        if (frw) {
>>>> +            /* frw has once been set by OVN. We are now not an HA chassis
>>>> +             * anymore, unset it.
>>>> +             */
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +        }
>>>> +        /* else we are not an HA chassis and frw is false. Ignore it. */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    switch (frw_state) {
>>>> +    case FRW_INIT:
>>>> +        if (ofctrl_cur_cfg > 0) {
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>>> +            VLOG_INFO("Setting flow-restore-wait=false "
>>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>>> +        }
>>>> +        break;
>>>> +
>>>> +    case FRW_WAIT_TXN_COMPLETE:
>>>> +        /* if (ovs_idl_txn != NULL), the transaction completed.
>>>> +         * When the transaction completed, it either failed
>>>> +         * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0). */
>>>> +        if (ovs_txn_status == 0) {
>>>> +            /* Previous transaction failed. */
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            break;
>>>> +        }
>>>> +        /* txn succeeded, get next_cfg */
>>>> +        frw_next_cfg = ovs_next_cfg;
>>>> +        frw_state = FRW_TXN_SUCCESS;
>>>> +        /* fall through */
>>>
>>> I also asked a helpful assistant :) (gemini-3) to review this code and
>>> it complained here with:
>>>
>>> "I think there is a potential issue here when `ovs_txn_status` is `-1`.
>>> In `main()`, `manage_flow_restore_wait` is called only when
>>> `ovs_idl_txn` is NOT NULL.
>>>
>>> If `ovs_txn_status` was `-1` in the previous iteration, it means we were
>>> waiting for a transaction.
>>>
>>> If `ovs_idl_txn` is now NOT NULL, it means `ovsdb_idl_loop_run()`
>>> processed the completion of that transaction.
>>>
>>> However, `ovs_txn_status` still holds `-1` because it's only updated at
>>> the end of the loop by `ovsdb_idl_loop_commit_and_wait()`.
>>>
>>> The current logic treats `-1` as success (since it's not `0`). If the
>>> transaction actually failed, we might move to `FRW_TXN_SUCCESS`
>>> prematurely.
>>>
>>> Should we wait for `ovs_txn_status` to be something other than `-1`? Or
>>> can we reliably determine success/failure from the IDL state?"
>>>
>>> This does sound legit to me, ovs_txn_status may still be -1 because when
>>> we tried to commit frw=false we couldn't immediately assess whether the
>>> transaction would succeed or not (in most cases that's what happens) but
>>> then the transaction failed for some reason on the server side.
>>>
>>> At this point, in the new loop, "ovs_idl_txn" is not NULL because we can
>>> try a new transaction (nothing is in flight anymore) but we assume the
>>> previous one succeeded.
>>>
>>> Then we fall through...
>>>
>>>> +
>>>> +    case FRW_TXN_SUCCESS:
>>>> +        if (ovs_next_cfg < frw_next_cfg) {
>>>> +            /* DB was reset, next_cfg went backwards. */
>>>> +            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
>>>> +                      "resetting state",
>>>> +                      frw_next_cfg, ovs_next_cfg);
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>>> +            break;
>>>> +        }
>>>> +
>>>
>>> frw is still "true" because we failed to set it to "false" (the
>>> transaction failed server side).
>>>
>>>> +        if (!frw) {
>>>> +            if (cfg->cur_cfg >= frw_next_cfg) {
>>>> +                set_flow_restore_wait(ovs_idl_txn, cfg,
>>>> +                                      &cfg->other_config,
>>>> +                                      true, ovn_managed_once);
>>>> +                frw_state = FRW_DONE;
>>>> +                VLOG_INFO("Setting flow-restore-wait=true");
>>>> +            }
>>>> +        } else {
>>>
>>> So we enter here where we assume "another task already set it to true".
>>>
>>> But it was actually just never changed.
>>>
>>> Is this analysis correct? Do we need a way to get the ovs_txn_status in
>>> the beginning of the current loop iteration?
>>>
>> I do not think so because of how ovsdb_idl_loop_run works.
>>
>> Loop 1
>> ----------
>> - We try to set frw to false.
>> - We commit (at the end of the loop, ovsdb_idl_loop_commit_and_wait).
>> - We have txn_status = -1 (in progress).
>>
>> Loop 2
>> ----------
>> - The txn fails in ovsdb_idl_loop_run.
>> - Looking at ovsdb_idl_loop_run:
>>   - We call ovsdb_idl_run (which gets the status).
>>   - The txn failed, so committing_txn != NULL.
>>   - We do not call ovsdb_idl_try_commit_loop_txn (which would set
>>     committing_txn to NULL) as the txn failed.
>>   - So loop->open_txn is NULL in this loop.
>> - We do not see the txn as completed in this loop.
>>
>> End of loop 2 (ovsdb_idl_loop_commit_and_wait):
>> - txn_status is set to 0.
>> - loop->committing_txn is set to NULL.
>>
>> Loop 3
>> ----------
>> - We have a txn.
>> - We still have status == 0, so in FRW_WAIT_TXN_COMPLETE we handle the
>>   failure.
>>
>> So, it looks OK to me.
>>
>
> I see, thanks for the detailed walk-through!
>
> Acked-by: Dumitru Ceara <[email protected]>
>
> I'll wait with merging this until next week in case Mark wants to have a
> look at this new version too.
>
I went ahead and applied this patch to the main branch.

Thanks,
Dumitru

> Regards,
> Dumitru
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
