On 9/5/25 5:48 PM, Han Zhou wrote:
> On Fri, Sep 5, 2025 at 2:50 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 9/5/25 7:23 AM, Han Zhou wrote:
>>> On Thu, Sep 4, 2025 at 10:22 PM Han Zhou <[email protected]> wrote:
>>>>
>>>> On Thu, Sep 4, 2025 at 4:15 PM Dumitru Ceara <[email protected]> wrote:
>>>>>
>>>>> On 9/4/25 8:51 PM, Han Zhou wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm adding the rest of the ovn-kubernetes maintainers to the list of
>>>>> recipients of this email too.
>>>>>
>>>>>> There is an issue raised in ovn-k8s community [0] due to a regression
>>>>>> introduced by OVN commit [1]. The commit breaks the MEG (Multiple
>>>>>> External Gateways) feature. The feature requires that src-route is
>>>>>> preferred over dst-route.
>>>>>>
>>>>>> It was considered risky to change the default behavior with commit [1],
>>>>>> and so it had been hanging around for several months asking for
>>>>>> feedback. Unfortunately this didn't get attention until now.
>>>>>>
>>>>>> The old behavior of OVN (before commit [1]): longest prefix length route
>>>>>> is preferred, regardless of src or dst. If the prefix length is the
>>>>>> same, prefer dst over src route. This was essentially wrong because src
>>>>>> and dst IP are different fields and it is not reasonable to compare the
>>>>>> prefix length between them. This old behavior leads to unreasonable
>>>>>> behavior for ovn-k8s central mode cluster router east-west traffic when
>>>>>> there are different node-subnet prefix lengths across nodes. The MEG
>>>>>> feature of ovn-k8s happened to work because the src routes added for
>>>>>> that feature were all /32. What the feature really requires is to prefer
>>>>>> src routes over dst routes. After commit [1], the unpredictable
>>>>>> east-west traffic routing problem in the central mode cluster router is
>>>>>> resolved, but the MEG feature is broken.
>>>>>>
>>>>>> Now the problem is, ovn-k8s' central mode cluster router requires dst
>>>>>> routes over src routes, while the MEG feature requires src routes over
>>>>>> dst routes. For ovn-k8s IC mode, the cluster router src routes are not
>>>>>> required any more, so they can be removed, and the central mode is going
>>>>>> to be deprecated anyway. So ovn-k8s would prefer src-over-dst.
>>>>>>
>>>>>> At the same time, we are releasing OVN 25.09 tomorrow. Based on the
>>>>>> above information, we have below options:
>>>>>>
>>>>>> Option 1: revert the commit [1] before the 25.09 release. This is the
>>>>>> easiest, but IMHO it is not the right thing to do since we will go back
>>>>>> to the *wrong* behavior and continue encouraging the bad design of CMS.
>>>>>>
>>>>>
>>>>> My vote goes to Option 1 for now. The only reason I acked [1]
>>>>> originally was because there was _no_ objection from ovn-kubernetes
>>>>> maintainers and we were operating under the impression that _no_
>>>>> ovn-kubernetes features get broken by the change in behavior (see
>>>>> original discussion
>>>>> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/#3420641).
>>>>>
>>>>> That turned out to be a wrong assumption. In my opinion, we cannot
>>>>> accept a behavior change that knowingly breaks users. So we should
>>>>> revert [1].
>>>>>
>>>>>> Option 2: keep the current behavior of OVN as default behavior in 25.09
>>>>>> release. We can add an option to let users change the behavior so that
>>>>>> src route is preferred over dst route. In particular for ovn-k8s, it can
>>>>>> be configured to src-over-dst so that the MEG feature can be fixed, but
>>>>>> it should only be used for IC mode (and at the same time remove the src
>>>>>> routes for IC mode).
>>>>>> For central mode probably there is no user for the
>>>>>> MEG feature and central mode will be deprecated, so we assume it is not
>>>>>> a problem to keep the dst-over-src behavior. This option can be added
>>>>>> after the 25.09 release as a bug fix (backport to 25.09).
>>>>>>
>>>>>
>>>>> However, I don't think I'd oppose backporting a new feature like this.
>>>>> But it would have to be an opt-in feature, not an opt-out as suggested here.
>>>>>
>>>>> That is, I think we should revert [1] and add an opt-in knob or option
>>>>> to change behavior. We can backport this knob/option to 25.09.z.
>>>>>
>>>>> We already broke ovn-kubernetes [0] why would we risk breaking other
>>>>> CMSs too?
>>>>>
>>>>>> Option 3: like option 2, making the behavior configurable, but default
>>>>>> to src-over-dst. The problems of this option are:
>>>>>> - it would immediately break ovn-k8s central mode east-west traffic
>>>>>> because the central mode relies on src-routes having lower priority.
>>>>>> - we'd better make this change before the release, which is a little
>>>>>> risky and may delay the release. Otherwise, we would end up with
>>>>>> changing the default behavior again after the release.
>>>>>>
>>>>>> I personally prefer option 2.
>>>>>> Thanks folks for the discussion. Opinions are welcome!
>>>>>>
>>>>>> Best,
>>>>>> Han
>>>>>>
>>>>>
>>>>> Regards,
>>>>> Dumitru
>>>>>
>>>>
>>>> Thanks Dumitru for the feedback. I sent a patch to revert the commit [1].
>>>> If there are no other opinions, we may merge it and backport to 25.09
>>>> before the release.
>>>
>>
>> Hi Han,
>>
>> Ales and Surya shared their opinions [0] [1]. I went ahead and merged
>> the revert. Would you have time to work on making the routing behavior
>> configurable (default as it is on 25.03 and opt-in)? As discussed we
>> can backport that to branch-25.09 after the release.
>>
>
> Thanks for reviewing and merging it! Yes I can work on the configurable
> behavior.
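
For readers following the thread, the three route-selection orders under
discussion can be modelled with a small Python sketch. This is only a toy
model of the behavior as described in the quoted text above, not OVN's
implementation (which encodes preference as logical flow priorities). The
names Policy, Route and select_route, the policy labels, and the example
addresses and nexthop names are all hypothetical and chosen for illustration;
they are not OVN option names.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


class Policy(Enum):
    # Hypothetical labels for the three orderings discussed in the thread.
    LEGACY = "legacy"              # longest prefix wins; dst breaks ties
    DST_OVER_SRC = "dst-over-src"  # any matching dst route beats src routes
    SRC_OVER_DST = "src-over-dst"  # any matching src route beats dst routes


@dataclass(frozen=True)
class Route:
    prefix: str   # e.g. "10.244.1.0/24"
    match: str    # "src-ip" or "dst-ip": which packet field the prefix matches
    nexthop: str


def select_route(routes: Sequence[Route], src: str, dst: str,
                 policy: Policy = Policy.LEGACY) -> Optional[Route]:
    """Return the preferred matching route, or None if nothing matches."""
    candidates = []
    for route in routes:
        net = ip_network(route.prefix)
        addr = ip_address(src if route.match == "src-ip" else dst)
        if addr in net:
            candidates.append((route, net.prefixlen))
    if not candidates:
        return None

    def rank(item):
        route, prefixlen = item
        is_dst = route.match == "dst-ip"
        if policy is Policy.LEGACY:
            # Prefix lengths of src and dst routes are compared directly.
            return (prefixlen, is_dst)
        if policy is Policy.DST_OVER_SRC:
            return (is_dst, prefixlen)
        return (not is_dst, prefixlen)

    return max(candidates, key=rank)[0]


# A /32 src route (as the thread says the MEG routes were) next to a dst route.
routes_host = [
    Route("10.244.1.5/32", "src-ip", "ext-gw"),
    Route("203.0.113.0/28", "dst-ip", "cluster-gw"),
]
# The same pair, but with a wider src prefix.
routes_subnet = [
    Route("10.244.1.0/24", "src-ip", "ext-gw"),
    Route("203.0.113.0/28", "dst-ip", "cluster-gw"),
]

if __name__ == "__main__":
    for name, routes in (("/32 src", routes_host), ("/24 src", routes_subnet)):
        for policy in Policy:
            chosen = select_route(routes, "10.244.1.5", "203.0.113.7", policy)
            print(f"{name:8} {policy.value:13} -> {chosen.nexthop}")
```

In this model the /32 src route wins under the legacy order only because 32
is the longest possible prefix, which matches the observation that MEG
"happened to work"; with a /24 src route the legacy order picks the dst route
instead, and only an explicit src-over-dst order keeps selecting the src
route.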
Hi Han,

> Checking all the discussions so far from you, Ilya and Tim, it seems
> keeping current behavior as default and adding an opt-in option for both
> src-over-dst and dst-over-src (and keep the router policy stage untouched)
> is the most practical approach. Any objections?
>

I also think that's a good approach and that can also be backported to
25.09. So +1 from me.

But in the long term I think we should consider implementing the
suggestion Ilya had:

https://mail.openvswitch.org/pipermail/ovs-dev/2025-September/426021.html

I think it would align OVN with other well-established implementations,
potentially avoiding confusion and making it easier to use. As a set of
separate features of course. :)

Regards,
Dumitru

> Best,
> Han
>
>> Regards,
>> Dumitru
>>
>> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2025-September/426008.html
>> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2025-September/426012.html
>>
>>> Sorry, forgot the link to the patch:
>>> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
>>>
>>>>
>>>> Best,
>>>> Han
>>>>
>>>>>> [0] https://cloud-native.slack.com/archives/C08452HR8V6/p1756994862428589
>>>>>> [1] https://github.com/ovn-org/ovn/commit/27cc274e66acd9e0ed13525f9ea2597804107348

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
