Hey folks,

Thank you for the ping, and sorry for the inconvenience. We will try to prioritize that CI flake at our community meeting to unblock you, and will post the assignee and any progress on the linked issue.

Kind regards,
Nadia

________________________________
From: Dumitru Ceara <[email protected]>
Sent: Wednesday, September 24, 2025 9:16
To: Mark Michelson <[email protected]>
Cc: Nadia Pinaeva <[email protected]>; Tim Rozet <[email protected]>; Girish Moodalbail <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Ales Musil <[email protected]>
Subject: Re: [PATCH ovn] ci: ovn-kubernetes: Enable "External Gateway" ovn-kubernetes CI lanes.

External email: Use caution opening links or attachments

On 9/23/25 9:03 PM, Mark Michelson wrote:
> Hi Dumitru,
>

Hi Mark,

> I don't see anything wrong with the OVN changes.
>
> However, it's not great that we had to re-run the checks three times
> in order to get a pass. Each of the three failed runs was triggered by
> the external gateway tests added in this change. The github issue you
> linked is ~15 months old and the only comments on it are from people
> who have also observed the flakiness in their PRs/test runs. This
> doesn't give confidence that fixing the flakes is a priority for the
> ovn-k project.
>
> The goal of this change is to catch regressions in OVN code. But if
> the tests are flaky, we can't rely on a test failure to be meaningful.
> We could be seeing a regression, or more likely, it could just be the

That's exactly why I was hoping we'd get help from the ovn-kubernetes
community to address some of those flakes (please see my previous emails
in this thread). That's in the interest of both communities:
- ovn-kubernetes can have better, more stable, tests
- ovn can catch regressions early

> test being flaky. When people get used to the idea that the CI may be
> flaky, they stop paying attention to the results. We therefore would
> be just as likely to merge another regression into OVN, thinking it
> was just the testsuite being flaky. My conclusion is that the idea
> behind the change is good, but we should not merge it until the tests
> are stabilized.
>

I agree, we should wait until ovn-kubernetes CI (or at least the subset
we're using) is stable.

I'll use this occasion to ping ovn-kubernetes maintainers (in cc) again
to ask for help in investigating these flaky tests:

https://github.com/ovn-kubernetes/ovn-kubernetes/issues/4432

Our failures were:

Summarizing 1 Failure:
  [FAIL] External Gateway With Admin Policy Based External Route CRs BFD e2e non-vxlan external gateway through a dynamic hop Should validate TCP/UDP connectivity to an external gateway's loopback address via a pod with a dynamic hop [It] TCP ipv6 [Feature:ExternalGateway]
  /home/runner/work/ovn/ovn/src/github.com/ovn-kubernetes/ovn-kubernetes/test/e2e/external_gateways.go:2290

Summarizing 1 Failure:
  [FAIL] External Gateway With Admin Policy Based External Route CRs BFD e2e non-vxlan external gateway through a dynamic hop Should validate TCP/UDP connectivity to an external gateway's loopback address via a pod with a dynamic hop [It] TCP ipv4 [Feature:ExternalGateway]
  /home/runner/work/ovn/ovn/src/github.com/ovn-kubernetes/ovn-kubernetes/test/e2e/external_gateways.go:2290

Summarizing 1 Failure:
  [FAIL] External Gateway With Admin Policy Based External Route CRs BFD e2e non-vxlan external gateway through a dynamic hop Should validate TCP/UDP connectivity to an external gateway's loopback address via a pod with a dynamic hop [It] UDP ipv6 [Feature:ExternalGateway]
  /home/runner/work/ovn/ovn/src/github.com/ovn-kubernetes/ovn-kubernetes/test/e2e/external_gateways.go:2306

Regards,
Dumitru

> On Wed, Sep 17, 2025 at 10:10 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 9/8/25 10:52 AM, Dumitru Ceara wrote:
>>> On 9/8/25 7:58 AM, Ales Musil wrote:
>>>> On Fri, Sep 5, 2025 at 1:55 PM Dumitru Ceara <[email protected]> wrote:
>>>>
>>>>> Since [0] some feature tests have been moved out of the default
>>>>> "control-plane" CI lane in ovn-kubernetes. In order to run those, they
>>>>> have to be explicitly triggered. When that change happened the OVN CI
>>>>> lanes were not updated so we lost test coverage for those features.
>>>>>
>>>>> This lead to regressions like [1] creeping in.
>>>>>
>>>>> This commit re-enables the tests.
>>>>>
>>>>> [0] https://github.com/ovn-kubernetes/ovn-kubernetes/commit/cf116ea
>>>>> [1]
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-September/426000.html
>>>>>
>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>> ---
>>
>>>>> 2.50.1
>>>>>
>>>>>
>>>> Recheck-request: github-robot-_ovn-kubernetes
>>>>
>>>
>>> Hi,
>>>
>>> Unfortunately the failure is known to happen every now and then:
>>> https://github.com/ovn-kubernetes/ovn-kubernetes/issues/4432
>>>
>>> It would be great to get some ovn-kubernetes help on this.
>>>
>>
>> Bump.
>>
>> Also, just to get more CI signal:
>>
>> Recheck-request: github-robot-_ovn-kubernetes
>>
>> Regards,
>> Dumitru
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
