Hi all,

Please hold review of this patch for now.

While validating this approach, I found a race condition in the ENABLE/DISABLE 
forwarding behavior. In the DISABLE path, a secondary (S2) that handles the 
forwarded request late can crash if the requesting secondary (S1, the one that 
initiates DISABLE) has already torn down and released the shared resources.

I reproduced this with dpdk-dumpcap by injecting a delay on S2. In my setup, the 
crash appears around the 5-second timeout mark (similar to MP_TIMEOUT_S), but 
the issue is about ordering and lifecycle guarantees, not a specific delay value 
or application.

The same teardown-safety risk can also exist in the original behavior if a 
secondary handles DISABLE late enough that the control plane fails or times 
out, and requester-side teardown still proceeds.

The failure sequence is:
- If S2 is slow or unresponsive on DISABLE, the requester-side DISABLE can 
fail or time out.
- If the S1 application ignores that failure and frees the shared capture 
resources anyway, S2 may still touch stale pointers and crash.
- So the root issue is teardown safety after a failed or partial DISABLE, not 
just the async forwarding itself.
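
As a rough illustration of the requester-side guard these points imply, here is 
a minimal C sketch. It is not DPDK code: struct capture_res, request_disable() 
and teardown_if_safe() are hypothetical names, and the DISABLE result is modeled 
as a plain return code. It only shows the ordering rule, namely that shared 
resources are released only after DISABLE is confirmed and are kept alive 
otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared capture resources (ring, mempool). */
struct capture_res {
    int *ring;      /* models memory that a peer secondary may still touch */
    bool released;  /* set once the requester has freed the resources */
};

/*
 * Hypothetical model of the requester-side DISABLE call:
 * returns 0 when the peer acknowledged, negative on failure or timeout.
 */
static int request_disable(bool peer_acked)
{
    return peer_acked ? 0 : -1;
}

/*
 * Free the shared resources only when DISABLE completed successfully.
 * On failure the resources stay alive, so a late peer cannot hit freed memory.
 * Returns true if the resources were released.
 */
static bool teardown_if_safe(struct capture_res *res, int disable_rc)
{
    assert(res != NULL);

    if (disable_rc != 0)
        return false;   /* DISABLE failed or timed out: keep resources alive */

    free(res->ring);
    res->ring = NULL;
    res->released = true;
    return true;
}
```

Keeping the resources alive on failure trades a possible leak for safety. 
Whether to retry DISABLE or defer the release some other way is a design choice 
this sketch does not address.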

I am pausing this patch to investigate a cleaner lifecycle fix for the DISABLE 
path. I will send a v2 after a more robust solution is verified.

Suggestions and feedback are very welcome.

Best regards,
Pushpendra
