Hi Alin,

It can be reproduced by adding a NAT on the adapter that the switch is bound to 
(the New-NetNat command in Windows) and then using OVN on top of it. It is the 
same issue we discussed 6 years ago; here is a gist to reproduce it with the 
Cloudbase driver:

https://gist.github.com/fw2568/b5fcbddebb83b2a7ce428a71704d5675#file-ovn_test_nat-ps1-L12

ovs-vsctl show:
    Bridge br-nat
        Port br-nat
            Interface br-nat
                type: internal
        Port patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int
            Interface patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int
                type: patch
                options: {peer=patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat}
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
            Interface patch-br-int-to-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
                type: patch
                options: {peer=patch-SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat-to-br-int}

ovn-nbctl show:

.\ovn-nbctl.exe show
switch 9fcfdbb0-64cb-470e-a6a5-3241b716a2ec (52bad498-df5c-43c1-b690-39e83b0e2bee)
    port SR-52bad498-df5c-43c1-b690-39e83b0e2bee-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e
        type: router
        router-port: RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-52bad498-df5c-43c1-b690-39e83b0e2bee
switch f56b8d25-a2a7-4bd2-a9c1-09953b8b62ee (externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default)
    port SR-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e
        type: router
        router-port: RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default
    port SN-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default-default-br-nat
        type: localnet
        addresses: ["unknown"]
router 27b4ccd2-e97d-4437-81df-30f9478c1948 (project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e)
    port RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-externalNet-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-default
        mac: "d2:ab:32:33:98:70"
        ipv6-lla: "fe80::d0ab:32ff:fe33:9870"
        networks: ["10.250.248.10/22"]
    port RS-project-4b4a3fcf-b5ed-4a9a-ab6e-03852752095e-52bad498-df5c-43c1-b690-39e83b0e2bee
        mac: "d2:ab:7b:78:1f:89"
        ipv6-lla: "fe80::d0ab:7bff:fe78:1f89"
        networks: ["10.0.0.1/20"]
    nat 7109d3b8-ac32-4f11-8da9-bb289af5e2d6
        external ip: "10.250.248.10"
        logical ip: "10.0.0.0/20"
        type: "snat"

get-netnat

Name                             : eryph_default_default
ExternalIPInterfaceAddressPrefix :
InternalIPInterfaceAddressPrefix : 10.250.248.0/22
IcmpQueryTimeout                 : 30
TcpEstablishedConnectionTimeout  : 1800
TcpTransientConnectionTimeout    : 120
TcpFilteringBehavior             : AddressDependentFiltering
UdpFilteringBehavior             : AddressDependentFiltering
UdpIdleSessionTimeout            : 120
UdpInboundRefresh                : False

ipconfig

Ethernet adapter br-nat:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 10.250.248.1
   Subnet Mask . . . . . . . . . . . : 255.255.252.0
   Default Gateway . . . . . . . . . :

Unfortunately I was not able to reproduce it without OVN, which makes it a bit 
complicated as the Cloudbase driver is completely outdated and we are now 
building our own OVN/OVS. 

So the easiest method is to use eryph with our driver for setup and then use 
your own build:

I have prepared a VM in eryph (we call them catlets in eryph) that configures 
itself as a Hyper-V host with eryph. 

Please follow these steps to build the VM:

1. Get eryph (https://www.eryph.io/docs#installation).
I will send you a private message with an invitation code that you need for 
the installation. 

2. Run in PowerShell as admin: 
iwr https://gist.githubusercontent.com/fw2568/78b421b416b5c8d087761b41117957ea/raw/7c21366f858c17841d654c3d7eb8de25bfce94e8/ovsdriver-test.yaml | New-Catlet | Start-Catlet -Force 

You will be asked for your email and the invitation code, the same as for the 
initial installation. 

(Check the gist if you are unsure; it is just the configuration for the catlet 
in YAML.)

Configuring the VM will take some time (about 10 minutes on my host). 
You can log in to the VM with admin / InitialPassw0rd

On the VM's desktop is a script .\driver-switch.ps1 that switches between 
- our current signed release driver (OVS 3.3.90)
- an unsigned experimental driver with OVS 3.5.0
- an unsigned experimental driver with OVS 3.5.0 without the BSOD patch.

Without the patch, the host will die within seconds. If not, reboot and try 
again, as the driver is not always reloaded without a reboot. 

To test with your own driver, you can replace the driver and OVS executables in 
"C:\Program Files\eryph\zero\ovspackage-3.5.0-exp-unpatched.zip" or modify the 
driver switch script. 
Eryph detects the vswitch extension by a unique extension name, so you will 
have to apply this patch to your own build: 
https://gist.github.com/fw2568/556108265945fd4e9d235691e9109206

Eryph runs the OVS executables from C:\programdata\eryph\ovs\run_XX\ (XX is 
generated) and uses C:\programdata\OpenVswitch for sockets, logs and databases.

________________________________________
From: Frank Wagner <[email protected]>
Sent: Wednesday, March 5, 2025 00:42
To: Alin Serdean <[email protected]>
Cc: Mike Pattrick <[email protected]>; [email protected] 
<[email protected]>; [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.
 
I had to fix this in our fork for eryph - https://www.eryph.io. 
Eryph uses OVN with NAT to the host on an internal switch as its default 
network provider. I am not sure if this is related, as I fixed it some time ago. 
I will check if I can build an environment where we can reproduce the BSOD.

Sent from Outlook for Android
________________________________________
From: Alin Serdean <[email protected]>
Sent: Wednesday, March 5, 2025 12:31:27 AM
To: Frank Wagner <[email protected]>
Cc: Mike Pattrick <[email protected]>; [email protected] 
<[email protected]>; [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.
 
My 2 cents on this one: this looks like a symptom if it's not always 
reproducible. There could be another underlying issue that is corrupting the 
memory and triggering the aforementioned symptom.

Can you please add more details on how this can be reproduced (which type of 
traffic, topology etc)?

On Tue, Mar 4, 2025 at 10:24 PM Frank Wagner <[email protected]> wrote:
At the very least, it CAN happen that no OVS key attribute is found in the 
keyAttrs array. I don't know exactly what causes this; it is difficult to 
reproduce because it always ends in a BSOD on the host. But maybe Alin can 
explain it better. 

I only found this by going through the kernel debug log. What I have seen in 
this case is that the packet is just not sent but without OVS attributes it 
never had a chance to be sent. 

And yes, all ASSERTs in the kernel datapath are useless in the release build 
because they are compiled out of it. I'm not a kernel dev expert, but I would 
say that relying only on an assert instead of explicitly checking for NULL is 
always a risk. So removing the assert would make sense, I would say. 
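
To illustrate the point, here is a minimal user-space sketch, not the driver 
code. FakeAttr, DBG_ASSERT, map_in_port and NO_PORT are made-up names for 
illustration, and the sentinel value is only an assumption (the patch simply 
leaves inPort unset when the attribute is missing). It shows that a debug-only 
assert disappears when DEBUG_BUILD is not defined, so only the explicit NULL 
check prevents the dereference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real ovsext types. */
typedef struct {
    uint32_t value;
} FakeAttr;

enum { KEY_ATTR_IN_PORT = 3, KEY_ATTR_MAX = 8 };

/* Assumed sentinel for "no input port"; the patch leaves inPort unset. */
#define NO_PORT UINT32_MAX

/* Mimics a debug-only ASSERT: expands to nothing unless DEBUG_BUILD is set. */
#ifdef DEBUG_BUILD
#define DBG_ASSERT(cond) assert(cond)
#else
#define DBG_ASSERT(cond) ((void)0)
#endif

/* Returns 0 and stores the port on success, -1 if the attribute is missing. */
static int
map_in_port(const FakeAttr *keyAttrs[], uint32_t *inPort)
{
    DBG_ASSERT(keyAttrs[KEY_ATTR_IN_PORT] != NULL);

    /* The explicit check is what protects a release build. */
    if (keyAttrs[KEY_ATTR_IN_PORT] == NULL) {
        *inPort = NO_PORT;
        return -1;
    }

    *inPort = keyAttrs[KEY_ATTR_IN_PORT]->value;
    return 0;
}
```

Built without DEBUG_BUILD, calling map_in_port with an array that has no 
KEY_ATTR_IN_PORT entry returns -1 instead of dereferencing NULL. 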
________________________________________
From: Mike Pattrick <[email protected]>
Sent: Tuesday, March 4, 2025 15:31
To: Frank Wagner <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [ovs-dev] [PATCH] windows: Fixed BSOD in kernel driver.
 
On Sat, Mar 1, 2025 at 7:35 AM Frank Wagner <[email protected]> wrote:
>
> It can happen that ovs key attributes are not in keyAttrs of port.
> In this case the call of NlAttrGetU32 will cause a BSOD in Release builds.
>
> Signed-off-by: Frank Wagner <[email protected]>
>
> ---
>  datapath-windows/ovsext/User.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
> index c4563b28b..b52124abf 100644
> --- a/datapath-windows/ovsext/User.c
> +++ b/datapath-windows/ovsext/User.c
> @@ -407,7 +407,9 @@ _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
>      execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
>
>      ASSERT(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> -    execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> +    if (keyAttrs[OVS_KEY_ATTR_IN_PORT]) {
> +        execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
> +    }

Hello Frank,

Is this expected behaviour? If so then the assert above should
probably be removed. What is the InPort expected to be set to when
this attribute is missing?

What would cause InPort to be NULL in this case?

Cheers,
M

>      execute->keyAttrs = keyAttrs;
>
>      if (nlAttrs[OVS_PACKET_ATTR_MRU]) {
> --
> 2.48.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>