Hi,

from my perspective the patch works for all cases. My test environment runs several k8s clusters and I haven't noticed any etcd failures so far.
Best regards
Michael

From: Lazuardi Nasution <mrxlazuar...@gmail.com>
Sent: Tuesday, April 4, 2023 09:41
To: Plato, Michael <michael.pl...@tu-berlin.de>
Cc: ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] ovs-vswitchd crashes several times a day

Hi Michael,

Does your patch also work for unreachable traffic within the same subnet? In my case, crashes happen when too many unreachable replies arrive, even from the same subnet. For example, when one of the etcd instances is down, there is a huge number of reconnection attempts and then unreachable replies from the destination VM where the down etcd instance lives.

Best regards.

On Tue, Apr 4, 2023 at 1:06 PM Plato, Michael <michael.pl...@tu-berlin.de> wrote:

Hi,

I have some news on this topic. Unfortunately I could not find the root cause, but I managed to implement a workaround (see patch in the attachment). The basic idea is to mark the NAT flows as invalid if there is no longer an associated connection. From my point of view it is a race condition, and it can be triggered by many short-lived connections. With the patch we no longer have any crashes. I can't say whether it has any negative effects, as I'm not an expert, but so far I haven't found any problems. Without this patch we had hundreds of crashes a day :/

Best regards
Michael

From: Lazuardi Nasution <mrxlazuar...@gmail.com>
Sent: Monday, April 3, 2023 13:50
To: ovs-discuss@openvswitch.org
Cc: Plato, Michael <michael.pl...@tu-berlin.de>
Subject: Re: [ovs-discuss] ovs-vswitchd crashes several times a day

Hi,

Is this related to the following glibc bug? I'm not sure, because when I check the glibc source of the installed version (2.35), the proposed patch has already been applied.

https://sourceware.org/bugzilla/show_bug.cgi?id=12889

I can confirm that this problem only happens if I use stateful ACLs, which are related to conntrack.
The race occurs when massive numbers of unreachable replies are received. For example, I run etcd on VMs and one etcd node has been disabled, which causes massive connection attempts and unreachable replies.

Best regards.

On Mon, Mar 20, 2023, 10:58 PM Lazuardi Nasution <mrxlazuar...@gmail.com> wrote:

Hi Michael,

Have you found a solution for this case? I see the same weird problem, without any information about which conntrack entries are causing the issue. I'm using OVS 3.0.1 with DPDK 21.11.2 on Ubuntu 22.04. By the way, the problem disappears after I remove some Kubernetes cluster VMs and some DB cluster VMs.

Best regards.

Date: Thu, 29 Sep 2022 07:56:32 +0000
From: "Plato, Michael" <michael.pl...@tu-berlin.de>
To: "ovs-discuss@openvswitch.org" <ovs-discuss@openvswitch.org>
Subject: [ovs-discuss] ovs-vswitchd crashes several times a day
Message-ID: <8e53d3d0674049e69b2b7f3c4b0b8...@tu-berlin.de>
Content-Type: text/plain; charset="us-ascii"

Hi,

we are about to roll out our new OpenStack infrastructure based on Yoga, and during our testing we observed that the openvswitch-switch systemd unit restarts several times a day, causing network interruptions for all VMs on the compute node in question. After some research we found that ovs-vswitchd crashes with the following assertion failure:

2022-09-29T06:51:05.195Z|00003|util(pmd-c01/id:8)|EMER|../lib/conntrack.c:1095: assertion conn->conn_type == CT_CONN_TYPE_DEFAULT failed in conn_update_state()

To get more information about the connection that leads to this assertion failure, I added some debug code to conntrack.c. We have seen that we can trigger this issue by trying to connect from a VM to a destination which is unreachable.
For example:

curl https://www.google.de:444

Shortly after that we get an assertion, and the debug code says:

conn_type=1 (may be CT_CONN_TYPE_UN_NAT)?
src ip 172.217.16.67  dst ip 141.23.xx.xx
rev src ip 141.23.xx.xx  rev dst ip 172.217.16.67
src/dst ports 444/46212  rev src/dst ports 46212/444
zone/rev zone 2/2  nw_proto/rev nw_proto 6/6

ovs-appctl dpctl/dump-conntrack | grep "444"
tcp,orig=(src=141.23.xx.xx,dst=172.217.16.67,sport=46212,dport=444),reply=(src=172.217.16.67,dst=141.23.xx.xx,sport=444,dport=46212),zone=2,protoinfo=(state=SYN_SENT)

Versions:

ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.17.2
DB Schema 8.3.0

ovn-controller --version
ovn-controller 22.03.0
Open vSwitch Library 2.17.0
OpenFlow versions 0x6:0x6
SB DB Schema 20.21.0

DPDK 21.11.2

We are now unsure whether this is a misconfiguration or whether we have hit a bug. Thanks for any feedback.

Michael
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss