From: Wilson Peng <pweis...@vmware.com>

v2-v3 change: Remove the unneeded sanity check and just correct 3 failure
for function ObReferenceObjectByHandle, zoneInfo allocated failed and OvsNatInit
is failed on OvsInitConntrack.

While deploying Tanzu Kubernetes(Antrea based solution) in Broadcom customer,
Sometimes it is found that the kernel thread OvsConntrackEntryCleaner is not
Started after the Windows node is rebooted on unexpected condition.  It could
Be also observed a similar issue in local Antrea setup via 
Clean-AntreaNetwork.ps1
Which will Remove-VMSwitch and re-create it on Windows node.

After checking the local conntrack dump, OVS doesn’t remove the connection
Entries even though the time is overdue, we could find the connection entries
Created several hours ago in the dump, within a state (TIME_WAIT) that was
Supposed to be deleted earlier. At that time, the count of the existing entries
In the OVS conntrack zone is far from the up limit, the actual number is 19k.
Then we tried to flush the conntrack with CMD "ovs-dpctl.exe flush-conntrack"
And all the conntrack entries could be removed.

In this patch, it is correcting 3 possible wrong failure processing.The 1st is 
for
Failed call ObReferenceObjectByHandle and the 2nd one is when zoneInfo is 
allocated
Failed it should also stop the cleanup thread firstly. The 3rd one failure is 
the
Possible wrong return value when OvsNatInit is failed to call on 
OvsInitConntrack.

Antrea team does help do the regression test with build including the patch
And it could PASS the testing. And it is not find the Connectract not timeout
Essue with same reproducing condition.

It is good to backport the fix to main and backported until 2.17.

Signed-off-by: Wilson Peng <pweis...@vmware.com>
---
 datapath-windows/ovsext/Conntrack-nat.c |  2 +-
 datapath-windows/ovsext/Conntrack.c     | 24 ++++++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index 497354ec8..a222f6aea 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -108,6 +108,7 @@ NTSTATUS OvsNatInit()
         OVS_CT_POOL_TAG);
     if (ovsUnNatTable == NULL) {
         OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
+        ovsNatTable = NULL;
         return STATUS_INSUFFICIENT_RESOURCES;
     }
 
@@ -157,7 +158,6 @@ VOID OvsNatFlush(UINT16 zone)
 VOID OvsNatCleanup()
 {
     if (ovsNatTable == NULL) {
-       NdisFreeSpinLock(&ovsCtNatLock);
        return;
     }
 
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 39ba5cc10..156861d6c 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -94,16 +94,28 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
     if (status != STATUS_SUCCESS) {
         goto freeBucketLock;
     }
-
-    ObReferenceObjectByHandle(threadHandle, SYNCHRONIZE, NULL, KernelMode,
-                              &ctThreadCtx.threadObject, NULL);
+    ctThreadCtx.exit = 0;
+    status = ObReferenceObjectByHandle(threadHandle, SYNCHRONIZE, NULL, 
KernelMode,
+                                       &ctThreadCtx.threadObject, NULL);
     ZwClose(threadHandle);
     threadHandle = NULL;
 
+    if (!NT_SUCCESS(status)) {
+        ctThreadCtx.exit = 1;
+        KeSetEvent(&ctThreadCtx.event, 0, FALSE);
+        KeWaitForSingleObject(ctThreadCtx.threadObject, Executive,
+                               KernelMode, FALSE, NULL);
+        goto freeBucketLock;
+    }
     zoneInfo = OvsAllocateMemoryWithTag(sizeof(OVS_CT_ZONE_INFO) *
                                         CT_MAX_ZONE, OVS_CT_POOL_TAG);
     if (zoneInfo == NULL) {
         status = STATUS_INSUFFICIENT_RESOURCES;
+        ctThreadCtx.exit = 1;
+        KeSetEvent(&ctThreadCtx.event, 0, FALSE);
+        KeWaitForSingleObject(ctThreadCtx.threadObject, Executive,
+                               KernelMode, FALSE, NULL);
+        ObDereferenceObject(ctThreadCtx.threadObject);
         goto freeBucketLock;
     }
 
@@ -119,7 +131,7 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
     if (status != STATUS_SUCCESS) {
         OvsCleanupConntrack();
     }
-    return STATUS_SUCCESS;
+    return status;
 
 freeBucketLock:
     for (UINT32 i = 0; i < numBucketLocks; i++) {
@@ -172,6 +184,7 @@ OvsCleanupConntrack(VOID)
     NdisFreeSpinLock(&ovsCtZoneLock);
     if (zoneInfo) {
         OvsFreeMemoryWithTag(zoneInfo, OVS_CT_POOL_TAG);
+        zoneInfo = NULL;
     }
 }
 
@@ -1520,6 +1533,8 @@ OvsConntrackEntryCleaner(PVOID data)
     LOCK_STATE_EX lockState;
     BOOLEAN success = TRUE;
 
+    OVS_LOG_INFO("Start the OVS ConntrackEntry Cleaner system thread,"
+                 " context: %p", context);
     while (success) {
         if (context->exit) {
             break;
@@ -1541,6 +1556,7 @@ OvsConntrackEntryCleaner(PVOID data)
         KeWaitForSingleObject(&context->event, Executive, KernelMode,
                               FALSE, (LARGE_INTEGER *)&threadSleepTimeout);
     }
+    OVS_LOG_INFO("Terminate the OVS ConntrackEntry Cleaner system thread");
 
     PsTerminateSystemThread(STATUS_SUCCESS);
 }
-- 
2.39.2 (Apple Git-143)

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to