On 3/27/26 2:31 PM, Xavier Simonart wrote:
> Hi Dumitru
> 

Hi Xavier,

> Thanks again for looking at this.
> 
> On Fri, Mar 27, 2026 at 12:38 PM Dumitru Ceara <[email protected]> wrote:
> 
>> On 3/26/26 5:16 PM, Xavier Simonart wrote:
>>> If a server unexpectedly rebooted, OVS, when restarted, sets BFD
>>> UP on bfd-enabled geneve tunnels.
>>> However, if it takes time to restart OVN, an HA gw chassis
>>> would attract the traffic while being unable to handle it
>>> (as no flows), resulting in traffic loss.
>>>
>>> This is fixed by re-using ovs flow-restore-wait.
>>> If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset.
>>> Once OVS receives the notification of flow-restore-wait being false,
>>> it restarts handling upcalls, bfd... and ignores any new change to
>>> flow-restore-wait.
>>>
>>> Hence, on chassis hosting ha gateways, OVN toggles flow-restore-wait:
>>> it set it to false, waits for ack from OVS and then sets it back to true.
>>> If server reboots, OVS will see flow-restore-wait being true.
>>>
>>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
>>> flow-restore-wait. When set, it tells that OVN once set
>> flow-restore-wait.
>>>
>>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the
>>> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts
>>> "ovs-vswitchd", restores the flows and finally removes
>> "flow-restore-wait".
>>> So OVS will wait either for "ovs-ctl restart" to remove
>> "flow-restore-wait"
>>> or for OVN to set "flow-restore-wait" to false.
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-3075
>>> Signed-off-by: Xavier Simonart <[email protected]>
>>>
>>> ---
>>> -v2 : - Updated based on Mark's feedback (commit message, comments).
>>>       - Avoid setting flow-restore-wait for computes.
>>>       - Add external_ids->ovn-managed-flow-restore-wait.
>>>       - Updated test: add test for compute update + nits (variable name
>> changes)
>>> -v3 : - Fix test failing on CI (using wrong host socket)
>>>       - Address nits from Dumitru + few more nits (related to comments).
>>>       - Increased tolerated loss (and add comment) to avoid flakiness.
>>>       - Rebased.
>>> ---
>>
>> Hi Xavier,
>>
>> Thanks for the v3!
>>

[...]

>>> +
>>> +static void
>>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
>>> +                         const struct ovsrec_open_vswitch *cfg,
>>> +                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
>>> +                         int ovs_txn_status, bool is_ha_gw)
>>> +{
>>> +    enum flow_restore_wait_state {
>>> +        FRW_INIT,              /* Initial state */
>>> +        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
>>> +        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
>>> +        FRW_DONE               /* Everything completed */
>>> +    };
>>> +
>>> +    static int64_t frw_next_cfg;
>>> +    static enum flow_restore_wait_state frw_state;
>>> +    static bool ofctrl_was_connected = false;
>>> +
>>> +    bool ofctrl_connected = ofctrl_is_connected();
>>> +
>>> +    if (!ovs_idl_txn || !cfg) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* If OVS is stopped/started, make sure flow-restore-wait is
>> toggled. */
>>> +    if (ofctrl_connected && !ofctrl_was_connected) {
>>> +        frw_state = FRW_INIT;
>>> +    }
>>> +    ofctrl_was_connected = ofctrl_connected;
>>> +
>>> +    if (!ofctrl_connected) {
>>> +        return;
>>> +    }
>>> +
>>> +    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait",
>> false);
>>> +    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
>>> +
>> "ovn-managed-flow-restore-wait",
>>> +                                          false);
>>> +
>>> +    if (frw && !ovn_managed_once) {
>>> +        /* frw has been set by ovs-ctl. Do not touch. */
>>> +        return;
>>> +    }
>>> +
>>> +    if (!is_ha_gw) {
>>> +        if (frw) {
>>> +            /* frw has once been set by OVN. We are now not an HA
>> chassis
>>> +             * anymore, unset it. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +        }
>>> +        /* else we are not an HA chassis and frw is false. Ignore it. */
>>> +        return;
>>> +    }
>>> +
>>> +    switch (frw_state) {
>>> +    case FRW_INIT:
>>> +        if (ofctrl_cur_cfg > 0) {
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            VLOG_INFO("Setting flow-restore-wait=false "
>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>> +        }
>>> +        break;
>>> +
>>> +    case FRW_WAIT_TXN_COMPLETE:
>>> +        /* if (ovs_idl_txn != NULL), the transaction completed.
>>> +         * When the transaction completed, it either failed
>>> +         * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0). */
>>> +        if (ovs_txn_status == 0) {
>>> +            /* Previous transaction failed. */
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            break;
>>> +        }
>>> +        /* txn succeeded, get next_cfg */
>>> +        frw_next_cfg = ovs_next_cfg;
>>> +        frw_state = FRW_TXN_SUCCESS;
>>> +        /* fall through */
>>
>> I also asked a helpful assistant :) (gemini-3) to review this code and
>> it complained here with:
>>
>> "I think there is a potential issue here when `ovs_txn_status` is `-1`.
>> In `main()`, `manage_flow_restore_wait` is called only when
>> `ovs_idl_txn` is NOT NULL.
>>
>> If `ovs_txn_status` was `-1` in the previous iteration, it means we were
>> waiting for a transaction.
>>
>> If `ovs_idl_txn` is now NOT NULL, it means `ovsdb_idl_loop_run()`
>> processed the completion of that transaction.
>>
>> However, `ovs_txn_status` still holds `-1` because it's only updated at
>> the end of the loop by `ovsdb_idl_loop_commit_and_wait()`.
>>
>> The current logic treats `-1` as success (since it's not `0`). If the
>> transaction actually failed, we might move to `FRW_TXN_SUCCESS`
>> prematurely.
>>
>> Should we wait for `ovs_txn_status` to be something other than `-1`? Or
>> can we reliably determine success/failure from the IDL state?"
>> "
>>
>> This does sound legit to me, ovs_txn_status may still be -1 because when
>> we tried to commit frw=false we couldn't immediately asses whether the
>> transaction would succeed or not (in most cases that's what happens) but
>> then the transaction failed for some reason on the server side.
>>
>> At this point, in the new loop, "ovs_idl_txn" is not NULL because we can
>> try a new transaction (nothing is in flight anymore) but we assume the
>> previous one succeeded.
>>
>> Then we fall through..
>>
>>> +
>>> +    case FRW_TXN_SUCCESS:
>>> +        if (ovs_next_cfg < frw_next_cfg) {
>>> +            /* DB was reset, next_cfg went backwards. */
>>> +            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
>>> +                      "resetting state",
>>> +                      frw_next_cfg, ovs_next_cfg);
>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>> +                                  false, ovn_managed_once);
>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>> +            break;
>>> +        }
>>> +
>>
>> frw is still "true" because we failed to set it to "false" (the
>> transaction failed server side).
>>
>>> +        if (!frw) {
>>> +            if (cfg->cur_cfg >= frw_next_cfg) {
>>> +                set_flow_restore_wait(ovs_idl_txn, cfg,
>> &cfg->other_config,
>>> +                                      true, ovn_managed_once);
>>> +                frw_state = FRW_DONE;
>>> +                VLOG_INFO("Setting flow-restore-wait=true");
>>> +            }
>>> +        } else {
>>
>> So we enter here where we assume "another task already set it to true".
>>
>> But it was actually just never changed.
>>
>> Is this analysis correct?  Do we need a way to get the ovs_txn_status in
>> the beginning of the current loop iteration?
>>
> I do not think so because of how  ovsdb_idl_loop_runs.
> Loop 1
> ----------
> - We try to set frw to false
> - We commit (at the end of the loop, ovsdb_idl_loop_commit_and_wait)
> - We have txn_status = -1 (in_progress).
> 
> Loop 2
> ----------
> - The txn fails in ovsdb_idl_loop_run.
> - Looking at ovsdb_idl_loop_run:
>   - We call ovsdb_idl_run (which gets the status)
>   - The txn failed, so commiting_txn != NULL.
>   - We do not ovsdb_idl_try_commit_loop_txn (which would set committing_txn
> to NULL) as txn failed.
>   - So loop->open_txn is NULL in this loop
>   - We do not see the txn as completed in this loop.
> 
> End of the loop 2 (ovsdb_idl_loop_commit_and_wait)
> - txn_status is set to 0.
> - loop->committing_txn is set to NULL
> 
> Loop 3
> ----------
> - We have a txn
> - We still have status == 0, so in FRW_WAIT_TXN_COMPLETE we handle the
> failure.
> 
> So, it looks OK to me
> 

I see, thanks for the detailed walk through!

Acked-by: Dumitru Ceara <[email protected]>

I'll wait with merging this until next week in case Mark wants to have a
look at this new version too.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to