> > --- > > This is is a rather complex set of state transitions so admit to relying on > AI as > backup for tracking this. It still sees some errors here. > > Worth asking the question, "what does mlx5 do?" and "should DPDK EAL be > doing this at the PCI layer instead?"
MLX5 doesn't have this device service model and will not attempt device recovery, it just quits and tells the app the device is removed via sending out RTE_ETH_EVENT_INTR_RMV. This service event has very limited visibility over the PCI, as the device is servicing itself while keeping all the PCI endpoints alive. PCI device persists during the reset. Only the DMA queues need to be recreated. I think processing the service events entirely in the mana driver is the right thing to do. > > --- > > Much better - this addresses the RCU, the macros, the thread-safety-analysis > suppressions, and the callback-under-lock deadlock. The single-variable > burst_state CAS is a clean way to do the drain and the acquire/release > reasoning checks out. > > One structural thing remains. The enter phase still runs the heavy teardown on > the EAL interrupt thread under reset_ops_lock: dev_stop, then > mana_mp_req_on_rxtx(RESET_ENTER) which is a blocking > rte_mp_request_sync with a multi-second timeout, then dev_close with its ibv > calls. You already moved the exit phase to a control thread because > intr_callback_unregister cannot run on the interrupt thread; the same > argument applies to blocking IPC and verbs teardown. A slow or absent > secondary will stall the interrupt thread for the MP timeout, and this is the > blocking-under-a-sleeping-mutex pattern. Please have the interrupt handler > just set state and drain, then hand the rest of the teardown to the control > thread. That also removes the last lock hand-off between functions/threads, > so each function can own its lock. Yes, we should move all the blocking calls from the system interrupt thread to the MANA control thread. > > Smaller points: > - RECOVERY_SUCCESS/FAILED are emitted from the reset thread. If the > callback > calls dev_stop/dev_close, mana_join_reset_thread() joins the current thread > (EDEADLK, leaked handle). INTR_RMV is fine since it runs on the dev-event > thread. > - The burst_state comment says bits 1+ encode device state, but only > RESET_ENTER<<1 is ever stored - it is effectively a single "blocked" flag.

