On 3/27/26 2:40 PM, Dumitru Ceara wrote:
> On 3/27/26 2:31 PM, Xavier Simonart wrote:
>> Hi Dumitru
>>
> 
> Hi Xavier,
> 
>> Thanks again for looking at this.
>>
>> On Fri, Mar 27, 2026 at 12:38 PM Dumitru Ceara <[email protected]> wrote:
>>
>>> On 3/26/26 5:16 PM, Xavier Simonart wrote:
>>>> If a server unexpectedly reboots, OVS, once restarted, sets BFD
>>>> state UP on bfd-enabled Geneve tunnels.
>>>> However, if OVN takes time to restart, an HA gw chassis
>>>> would attract the traffic while being unable to handle it
>>>> (as it has no flows), resulting in traffic loss.
>>>>
>>>> This is fixed by re-using the OVS flow-restore-wait setting.
>>>> If set, OVS waits (prevents upcalls, ignores BFD, ...) until it is reset.
>>>> Once OVS receives the notification of flow-restore-wait being false,
>>>> it restarts handling upcalls, BFD, ... and ignores any new change to
>>>> flow-restore-wait.
>>>>
>>>> Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
>>>> it sets it to false, waits for the ack from OVS and then sets it back
>>>> to true. If the server reboots, OVS will see flow-restore-wait being true.
>>>>
>>>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
>>>> flow-restore-wait. When set, it indicates that OVN has, at some point,
>>>> set flow-restore-wait.
>>>>
>>>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the
>>>> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts
>>>> "ovs-vswitchd", restores the flows and finally removes
>>>> "flow-restore-wait".
>>>> So OVS will wait either for "ovs-ctl restart" to remove
>>>> "flow-restore-wait" or for OVN to set "flow-restore-wait" to false.
>>>>
>>>> Reported-at: https://issues.redhat.com/browse/FDP-3075
>>>> Signed-off-by: Xavier Simonart <[email protected]>
>>>>
>>>> ---
>>>> -v2 : - Updated based on Mark's feedback (commit message, comments).
>>>>       - Avoid setting flow-restore-wait for computes.
>>>>       - Add external_ids->ovn-managed-flow-restore-wait.
>>>>       - Updated test: add test for compute update + nits (variable name
>>>>         changes).
>>>> -v3 : - Fix test failing on CI (using wrong host socket)
>>>>       - Address nits from Dumitru + few more nits (related to comments).
>>>>       - Increased tolerated loss (and add comment) to avoid flakiness.
>>>>       - Rebased.
>>>> ---
>>>
>>> Hi Xavier,
>>>
>>> Thanks for the v3!
>>>
> 
> [...]
> 
>>>> +
>>>> +static void
>>>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
>>>> +                         const struct ovsrec_open_vswitch *cfg,
>>>> +                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
>>>> +                         int ovs_txn_status, bool is_ha_gw)
>>>> +{
>>>> +    enum flow_restore_wait_state {
>>>> +        FRW_INIT,              /* Initial state */
>>>> +        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
>>>> +        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
>>>> +        FRW_DONE               /* Everything completed */
>>>> +    };
>>>> +
>>>> +    static int64_t frw_next_cfg;
>>>> +    static enum flow_restore_wait_state frw_state;
>>>> +    static bool ofctrl_was_connected = false;
>>>> +
>>>> +    bool ofctrl_connected = ofctrl_is_connected();
>>>> +
>>>> +    if (!ovs_idl_txn || !cfg) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /* If OVS is stopped/started, make sure flow-restore-wait is toggled. */
>>>> +    if (ofctrl_connected && !ofctrl_was_connected) {
>>>> +        frw_state = FRW_INIT;
>>>> +    }
>>>> +    ofctrl_was_connected = ofctrl_connected;
>>>> +
>>>> +    if (!ofctrl_connected) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false);
>>>> +    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
>>>> +                                          "ovn-managed-flow-restore-wait",
>>>> +                                          false);
>>>> +
>>>> +    if (frw && !ovn_managed_once) {
>>>> +        /* frw has been set by ovs-ctl. Do not touch. */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (!is_ha_gw) {
>>>> +        if (frw) {
>>>> +            /* frw was previously set by OVN and we are no longer an HA
>>>> +             * chassis; unset it. */
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +        }
>>>> +        /* else we are not an HA chassis and frw is false. Ignore it. */
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    switch (frw_state) {
>>>> +    case FRW_INIT:
>>>> +        if (ofctrl_cur_cfg > 0) {
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>>> +            VLOG_INFO("Setting flow-restore-wait=false "
>>>> +                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
>>>> +        }
>>>> +        break;
>>>> +
>>>> +    case FRW_WAIT_TXN_COMPLETE:
>>>> +        /* If ovs_idl_txn != NULL, the previous transaction completed:
>>>> +         * it either failed (ovs_txn_status == 0) or succeeded
>>>> +         * (ovs_txn_status != 0). */
>>>> +        if (ovs_txn_status == 0) {
>>>> +            /* Previous transaction failed. */
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            break;
>>>> +        }
>>>> +        /* txn succeeded, get next_cfg */
>>>> +        frw_next_cfg = ovs_next_cfg;
>>>> +        frw_state = FRW_TXN_SUCCESS;
>>>> +        /* fall through */
>>>
>>> I also asked a helpful assistant :) (gemini-3) to review this code and
>>> it complained here with:
>>>
>>> "I think there is a potential issue here when `ovs_txn_status` is `-1`.
>>> In `main()`, `manage_flow_restore_wait` is called only when
>>> `ovs_idl_txn` is NOT NULL.
>>>
>>> If `ovs_txn_status` was `-1` in the previous iteration, it means we were
>>> waiting for a transaction.
>>>
>>> If `ovs_idl_txn` is now NOT NULL, it means `ovsdb_idl_loop_run()`
>>> processed the completion of that transaction.
>>>
>>> However, `ovs_txn_status` still holds `-1` because it's only updated at
>>> the end of the loop by `ovsdb_idl_loop_commit_and_wait()`.
>>>
>>> The current logic treats `-1` as success (since it's not `0`). If the
>>> transaction actually failed, we might move to `FRW_TXN_SUCCESS`
>>> prematurely.
>>>
>>> Should we wait for `ovs_txn_status` to be something other than `-1`? Or
>>> can we reliably determine success/failure from the IDL state?"
>>>
>>> This does sound legit to me: ovs_txn_status may still be -1 because,
>>> when we tried to commit frw=false, we couldn't immediately assess
>>> whether the transaction would succeed or not (in most cases that's what
>>> happens), but then the transaction failed for some reason on the server
>>> side.
>>>
>>> At this point, in the new loop, "ovs_idl_txn" is not NULL because we can
>>> try a new transaction (nothing is in flight anymore) but we assume the
>>> previous one succeeded.
>>>
>>> Then we fall through...
>>>
>>>> +
>>>> +    case FRW_TXN_SUCCESS:
>>>> +        if (ovs_next_cfg < frw_next_cfg) {
>>>> +            /* DB was reset, next_cfg went backwards. */
>>>> +            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
>>>> +                      "resetting state",
>>>> +                      frw_next_cfg, ovs_next_cfg);
>>>> +            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                  false, ovn_managed_once);
>>>> +            frw_state = FRW_WAIT_TXN_COMPLETE;
>>>> +            break;
>>>> +        }
>>>> +
>>>
>>> frw is still "true" because we failed to set it to "false" (the
>>> transaction failed server side).
>>>
>>>> +        if (!frw) {
>>>> +            if (cfg->cur_cfg >= frw_next_cfg) {
>>>> +                set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
>>>> +                                      true, ovn_managed_once);
>>>> +                frw_state = FRW_DONE;
>>>> +                VLOG_INFO("Setting flow-restore-wait=true");
>>>> +            }
>>>> +        } else {
>>>
>>> So we enter here where we assume "another task already set it to true".
>>>
>>> But it was actually just never changed.
>>>
>>> Is this analysis correct?  Do we need a way to get the ovs_txn_status in
>>> the beginning of the current loop iteration?
>>>
>> I do not think so, because of how ovsdb_idl_loop_run works:
>> Loop 1
>> ----------
>> - We try to set frw to false
>> - We commit (at the end of the loop, ovsdb_idl_loop_commit_and_wait)
>> - We have txn_status = -1 (in_progress).
>>
>> Loop 2
>> ----------
>> - The txn fails in ovsdb_idl_loop_run.
>> - Looking at ovsdb_idl_loop_run:
>>   - We call ovsdb_idl_run (which gets the status).
>>   - The txn failed, so committing_txn != NULL.
>>   - We do not call ovsdb_idl_try_commit_loop_txn (which would set
>>     committing_txn to NULL) as the txn failed.
>>   - So loop->open_txn is NULL in this loop.
>>   - We do not see the txn as completed in this loop.
>>
>> End of loop 2 (ovsdb_idl_loop_commit_and_wait):
>> - txn_status is set to 0.
>> - loop->committing_txn is set to NULL.
>>
>> Loop 3
>> ----------
>> - We have a txn
>> - We still have status == 0, so in FRW_WAIT_TXN_COMPLETE we handle the
>> failure.
>>
>> So, it looks OK to me.
>>
> 
> I see, thanks for the detailed walk through!
> 
> Acked-by: Dumitru Ceara <[email protected]>
> 
> I'll wait with merging this until next week in case Mark wants to have a
> look at this new version too.
> 

I went ahead and applied this patch to the main branch.

Thanks,
Dumitru

> Regards,
> Dumitru
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
