Hi Alin,

it can be reproduced by adding a NAT on the adapter that the switch is bound to (the New-NetNat cmdlet on Windows) and then running OVN on top of it. It is the same issue we discussed 6 years ago; here is a gist that reproduces it with the Cloudbase driver:
https://gist.github.com/fw2568/b5fcbddebb83b2a7ce428a71704d5675#file-ovn_test_nat-ps1-L12

ovs-vsctl show:

    Bridge br-nat
        Port br-nat
            Interface br-nat
                type: internal
        Port patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int
            Interface patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int
                type: patch
                options: {peer=patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat}
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
            Interface patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
                type: patch
                options: {peer=patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int}

ovn-nbctl show:

    .\ovn-nbctl.exe show
    switch 9fcfdbb0-64cb-470e-a6a5-3241b716a2ec (52bad498-df5c-43c1-b690-39e83b0e2bee)
        port SR-52bad498-df5c-43c1-b690-39e83b0e2bee-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e
            type: router
            router-port: RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-52bad498-df5c-43c1-b690-39e83b0e2bee
    switch f56b8d25-a2a7-4bd2-a9c1-09953b8b62ee (externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default)
        port SR-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e
            type: router
            router-port: RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default
        port SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
            type: localnet
            addresses: ["unknown"]
    router 27b4ccd2-e97d-4437-81df-30f9478c1948 (project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e)
        port RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default
            mac: "d2:ab:32:33:98:70"
            ipv6-lla: "fe80::d0ab:32ff:fe33:9870"
            networks: ["10.250.248.10/22"]
        port RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-52bad498-df5c-43c1-b690-39e83b0e2bee
            mac: "d2:ab:7b:78:1f:89"
            ipv6-lla: "fe80::d0ab:7bff:fe78:1f89"
            networks: ["10.0.0.1/20"]
        nat 7109d3b8-ac32-4f11-8da9-bb289af5e2d6
            external ip: "10.250.248.10"
            logical ip: "10.0.0.0/20"
            type: "snat"

get-netnat:

    Name                             : eryph_default_default
    ExternalIPInterfaceAddressPrefix :
    InternalIPInterfaceAddressPrefix : 10.250.248.0/22
    IcmpQueryTimeout                 : 30
    TcpEstablishedConnectionTimeout  : 1800
    TcpTransientConnectionTimeout    : 120
    TcpFilteringBehavior             : AddressDependentFiltering
    UdpFilteringBehavior             : AddressDependentFiltering
    UdpIdleSessionTimeout            : 120
    UdpInboundRefresh                : False

ipconfig:

    Ethernet adapter br-nat:

       Connection-specific DNS Suffix  . :
       IPv4 Address. . . . . . . . . . . : 10.250.248.1
       Subnet Mask . . . . . . . . . . . : 255.255.252.0
       Default Gateway . . . . . . . . . :

Unfortunately I was not able to reproduce it without OVN, which makes things a bit complicated, as the Cloudbase driver is completely outdated and we now build our own OVN/OVS. So the easiest method is to use eryph with our driver for the setup and then swap in your own build. I have prepared a VM in eryph (we call them catlets) that configures itself as a Hyper-V host with eryph.

Please follow these steps to build the VM:

1. Get eryph (https://www.eryph.io/docs#installation). I will send you a private message with an invitation code that you need for the installation.

2. Run in PowerShell as admin:

   iwr https://gist.githubusercontent.com/fw2568/78b421b416b5c8d087761b41117957ea/raw/7c21366f858c17841d654c3d7eb8de25bfce94e8/ovsdriver-test.yaml | New-Catlet | Start-Catlet -Force

   You will be asked for email and invitation code, same as for the initial installation. (Check the gist if you are unsure; it is just the YAML configuration for the catlet.)

Configuring the VM will take some time (about 10 minutes on my host).
You can log in to the VM with admin / InitialPassw0rd.

On the VM's desktop is a script .\driver-switch.ps1 that switches between:

- our current signed release driver (OVS 3.3.90)
- an unsigned experimental driver with OVS 3.5.0
- an unsigned experimental driver with OVS 3.5.0 without the BSOD patch

Without the patch, the host will die within seconds. If it does not, reboot and try again, as the driver is not always reloaded without a reboot.

To test with your own driver, you can replace the driver and OVS executables in "C:\Program Files\eryph\zero\ovspackage-3.5.0-exp-unpatched.zip" or modify the driver switch script.

Eryph detects the vswitch extension by a unique extension name, so you will have to add this patch to your own build:
https://gist.github.com/fw2568/556108265945fd4e9d235691e9109206

Eryph runs the OVS executables from C:\programdata\eryph\ovs\run_XX\ (XX is generated) and uses C:\programdata\OpenVswitch for sockets, logs and DBs.

________________________________________
From: Frank Wagner <[email protected]>
Sent: Wednesday, 5 March 2025 00:42
To: Alin Serdean <[email protected]>
Cc: Mike Pattrick <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.

I had to fix this in our fork for eryph (https://www.eryph.io). By default, eryph uses OVN with NAT to the host on an internal switch as its network provider. I am not sure whether this is related, though, as I fixed it some time ago. I will check whether I can build an environment in which we can reproduce the BSOD.

Sent from Outlook for Android

________________________________________
From: Alin Serdean <[email protected]>
Sent: Wednesday, March 5, 2025 12:31:27 AM
To: Frank Wagner <[email protected]>
Cc: Mike Pattrick <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.

My 2 cents on this one: this looks like a symptom if it's not always reproducible. There could be another underlying issue that is corrupting the memory and triggering the aforementioned symptom.

Can you please add more details on how this can be reproduced (which type of traffic, topology, etc.)?

On Tue, Mar 4, 2025 at 10:24 PM Frank Wagner <[email protected]> wrote:

At the very least, it CAN happen that no OVS key attribute is found in the keyAttrs array. I don't know exactly what causes this; it is difficult to reproduce because it always ends in a BSOD on the host. But maybe Alin can explain it better.

I only found this by going through the kernel debug log. What I have seen in this case is that the packet is simply not sent, but without OVS attributes it never had a chance of being sent anyway.

And yes, all ASSERTs in the kernel datapath have no effect in the release build because they are compiled out. I'm not a kernel dev expert, but I would say that relying only on an assert instead of explicitly checking for NULL is always a risk. So I would say removing the assert makes sense.

________________________________________
From: Mike Pattrick <[email protected]>
Sent: Tuesday, 4 March 2025 15:31
To: Frank Wagner <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.

On Sat, Mar 1, 2025 at 7:35 AM Frank Wagner <[email protected]> wrote:
>
> It can happen that ovs key attributes are not in keyAttrs of port.
> In this case the call of NlAttrGetU32 will cause a BSOD in Release builds.
>
> Signed-off-by: Frank Wagner <[email protected]>
>
> ---
>  datapath-windows/ovsext/User.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
> index c4563b28b..b52124abf 100644
> --- a/datapath-windows/ovsext/User.c
> +++ b/datapath-windows/ovsext/User.c
> @@ -407,7 +407,9 @@ _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
>      execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
>
>      ASSERT(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> -    execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> +    if (keyAttrs[OVS_KEY_ATTR_IN_PORT]) {
> +        execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> +    }

Hello Frank,

Is this expected behaviour? If so, then the assert above should probably be removed. What is the InPort expected to be set to when this attribute is missing? What would cause InPort to be NULL in this case?

Cheers,
M

>      execute->keyAttrs = keyAttrs;
>
>      if (nlAttrs[OVS_PACKET_ATTR_MRU]) {
> --
> 2.48.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

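For reference, the point discussed above (a debug-only assert gives no protection in a release build, so the missing attribute has to be checked explicitly) can be sketched in plain user-mode C. This is only an illustration under assumptions: attr_t, packet_exec_t, attr_get_u32, map_packet_exec and the EXEC_* codes are hypothetical stand-ins, not the real ovsext types or functions. Unlike the posted patch, which skips the assignment and continues, this sketch rejects the request when the in-port attribute is absent; which behaviour is right for the driver is exactly the open question raised in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the datapath types (not the ovsext definitions). */
enum { KEY_ATTR_IN_PORT = 0, KEY_ATTR_MAX = 1 };

typedef struct {
    uint16_t len;    /* total attribute length */
    uint16_t type;   /* attribute type */
    uint32_t value;  /* 32-bit payload */
} attr_t;

typedef struct {
    uint32_t in_port;
    int has_in_port;  /* 1 only if in_port was actually provided */
} packet_exec_t;

#define EXEC_OK 0
#define EXEC_MISSING_IN_PORT (-1)

/* Reads the payload. Like the getter in the patch, it dereferences its
 * argument, so passing NULL would crash. */
static uint32_t attr_get_u32(const attr_t *attr)
{
    return attr->value;
}

/* Maps parsed key attributes into an execute request.
 * The assert only documents a caller contract; it disappears when NDEBUG
 * is defined, so input that can really be absent gets a runtime check. */
int map_packet_exec(const attr_t *const *key_attrs, packet_exec_t *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof *out);

    if (key_attrs[KEY_ATTR_IN_PORT] == NULL) {
        /* Assumption: rejecting is acceptable here; the patch instead
         * leaves inPort unset and carries on. */
        return EXEC_MISSING_IN_PORT;
    }

    out->in_port = attr_get_u32(key_attrs[KEY_ATTR_IN_PORT]);
    out->has_in_port = 1;
    return EXEC_OK;
}
```

A caller would check the return value before using in_port. Returning an error makes the missing attribute visible to whoever sent the request, whereas silently skipping the assignment leaves the field at whatever value it had before.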