Hi Mark and Ilya,

Thanks for the reply and the feedback.

On Wed, Mar 4, 2026 at 9:30 PM Ilya Maximets <[email protected]> wrote:

> On 3/3/26 10:21 AM, Xavier Simonart via dev wrote:
> > If a server unexpectedly rebooted, OVS, when restarted, sets BFD
> > UP on bfd-enabled geneve tunnels.
> > However, if it takes time to restart OVN, an HA gw chassis
> > would attract the traffic while being unable to handle it
> > (as no flows), resulting in traffic loss.
> >
> > This is fixed by re-using ovs flow-restore-wait.
> > If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset.
> > Once OVS receives the notification of flow-restore-wait being false,
> > it restarts handling upcalls, bfd... and ignores any new change to
> > flow-restore-wait.
> >
> > Hence OVN toggles flow-restore-wait: set it to false, waits for ack
> > from OVS and then sets it back to true.
> > If server reboots, OVS will see flow-restore-wait being true.
> >
> > "ovs-ctl restart" also uses flow-restore-wait.
> > So OVS will wait either "ovs-ctl restart" or OVN sets flow-restore-wait
> > to false.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-3075
> > Signed-off-by: Xavier Simonart <[email protected]>
> > ---
> >  controller/ovn-controller.c | 133 +++++++++-
> >  tests/multinode-macros.at   |  22 ++
> >  tests/multinode.at          | 504 ++++++++++++++++++++++++------------
> >  3 files changed, 488 insertions(+), 171 deletions(-)
>
> Hi, Xavier and Mark.
>
> I haven't looked at the code, but I have a concern about this change.
>
> If we go this way, then ovn-controller will not be able to become a primary
> controller and will be stuck using the service connection (br-int.mgmt) as
> it does today.  This is because primary controller is not allowed while
> flow-restore-wait is set.  So, on node reboot OVS will get stuck with no
> flows as ovn-ctl will not clear the flag and OVS will not allow
> ovn-controller to connect.
> And upgrading from a version of ovn-controller that sets
> flow-restore-wait and uses a service connection to a version that is a
> primary controller will likely be problematic.
>
> ovn-controller abusing the service connection has a few issues that came
> up multiple times in the past:
>
> 1. Service connection is not configurable.  E.g. it's not possible to set
>    inactivity probe interval.  It's always the default 60 seconds.  It
>    means that full recompute can never be longer than 120 seconds in
>    ovn-controller.  Otherwise it gets into an infinite reconnect +
>    recompute loop.  This is a common issue that people run into at high
>    scale.
>
> 2. There is a race between ovn-controller and ovs-ctl for configuring
>    OpenFlow.  When ovs-ctl restarts OVS:
>    - It saves the flows and other configuration.
>    - It sets flow-restore-wait.
>    - Restarts the OVS daemon.
>    - Configures all the OpenFlow stuff back via the service connection.
>    - Removes the flow-restore-wait.
>    However, since ovn-controller is not a primary controller, it also
>    connects to OVS via the service connection and starts programming flows
>    and tunnel metadata and other things messing up the config and also
>    breaking ovs-ctl that would bail if one of the restoration commands
>    fails.  This issue actually happened in OpenStack in the past.
>
> 3. There is no good way to get statistics for the OpenFlow connection,
>    because statistics for the service connection is not exposed through
>    the database.
>
> What we could do instead is to make ovn-controller a primary controller
> first and extend the flow-restore-wait mechanism in OVS to have a mode
> where primary controllers are allowed (switch it from boolean to an enum
> true/false/controller), so it can be used by ovn-controller without
> messing with the standard ovs-ctl.
>
> We discussed the approach off-list some time ago.  Why did you choose not
> to go with it?
>

Summarizing off-line discussions:

You proposed:
- New flow-restore-wait "controller" mode in OVS.
- OVN becoming the primary controller, and using the "controller"
  flow-restore-wait.
This would be considered a feature, so we would not be able to provide
a release fixing the issue for quite some time.

So we could do:
- First, fix the issue in this FDP with this patch: an OVN-only bug fix.
- Then, create and implement a JIRA ticket for ovn-controller becoming
  a primary controller, together with OVS supporting a new "controller"
  option for flow-restore-wait.

Fixing both together would be "easier" (no complex code handling for the
upgrade), but it would delay the release of the fix.

One complexity about the upgrade is that, after OVN is upgraded, when
ovn-controller sees `flow-restore-wait` set to true, it does not know
whether ovs-ctl set it (in which case ovn-controller simply does nothing:
it neither installs flows nor overwrites the option), or whether the
previous version of ovn-controller set it (in which case it must overwrite
the option with "controller"). So we need extra code to handle that.
I'll open a JIRA ticket describing the work needed in both ovs & ovn.

Unrelated to this, I think that this patch has another potential issue and
requires a new revision.
When ovs is restarted, BFD goes down (admin_down). For HA gateways,
this means another lower priority gateway will take over and handle the
traffic.
This patch delays BFD from coming up after an unexpected server reboot
(until OVN is restarted), which prevents traffic blackholing.
It will also delay BFD from going up after an expected ovs-vswitchd
restart. This also looks okay, as we have another gateway handling the
traffic.

However, on computes, it also delays BFD going up.
After an unexpected server reboot, this would not change much: no flows
exist (neither kernel flows nor OpenFlow flows are installed by ovs-ctl).
But after an expected ovs-vswitchd restart, this would unnecessarily delay
BFD from coming up.
BFD went down, probably for a second or two, when ovs restarted.
Kernel flows are still active, and the OpenFlow flows were restored by
ovs-ctl. So, as soon as BFD comes up, the "compute" runs properly. There
is no reason to keep BFD down and prevent the compute chassis from
handling traffic.

We could set the flow-restore-wait option only for chassis acting as HA
gateway. Hence compute-only chassis would not see any new delay.

Thanks
Xavier

> Best regards, Ilya Maximets.
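
P.S. The idea of setting flow-restore-wait only on HA gateway chassis could be sketched roughly as below. This is an illustrative Python model only, not the ovn-controller C code and not part of the patch: the `Chassis` type, the `is_ha_gateway` field, and both function names are hypothetical, and the step strings merely name the toggle sequence described in the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chassis:
    """Hypothetical view of a chassis; not an actual OVN data structure."""
    name: str
    is_ha_gateway: bool  # assumption: the controller can tell it hosts HA gateway ports


def should_manage_flow_restore_wait(chassis: Chassis) -> bool:
    """Only HA gateway chassis would keep BFD held down until flows exist."""
    return chassis.is_ha_gateway


def toggle_steps(chassis: Chassis) -> List[str]:
    """Ordered steps taken once ovn-controller has installed its flows."""
    if not should_manage_flow_restore_wait(chassis):
        # Compute-only chassis: leave the option alone, so no extra BFD delay.
        return []
    return [
        "set flow-restore-wait=false",   # let OVS resume upcalls and BFD
        "wait for ack from OVS",
        "set flow-restore-wait=true",    # re-arm for the next unexpected reboot
    ]


if __name__ == "__main__":
    for chassis in (Chassis("gw-1", True), Chassis("compute-1", False)):
        print(chassis.name, toggle_steps(chassis))
```

With this split, a compute-only chassis never touches the option, so an expected ovs-vswitchd restart there behaves as it does today.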
