However, I guess NFS/RDMA uses the RDMA CM, which is supposed to
handle device removal.  In that code it seems to end up in
cma_process_remove(), which at first glance appears to do the right
things to destroy all connections, etc.


Function cma_process_remove() calls cma_remove_id_dev() for each cm_id bound to the device being removed. Function cma_remove_id_dev() calls the event handler function for each cm_id and passes it an RDMA_CM_EVENT_DEVICE_REMOVAL event. The NFSRDMA server marks the RPC transport as XPT_CLOSE, but doesn't destroy the cm_id immediately in the event handler function. This happens in rdma_cma_handler() in net/sunrpc/xprtrdma/svc_rdma_transport.c. That's the issue, methinks. Each RDMA kernel user must destroy all of its resources in the event handler function itself. Given the current design, this teardown cannot be scheduled or deferred in any way.
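To make the contract concrete, here is a userspace toy model in C. It is not kernel code: every toy_* name is hypothetical and only stands in for the cm_id and device structures discussed above. The model walks the ids bound to a device, delivers a device-removal event to each handler, and then counts how many device resources are still held once all handlers have returned. A handler that frees everything before returning leaves nothing pinned; a handler that only flags the transport for a later close, as described above for the NFSRDMA server, returns with its resources still attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, NOT struct rdma_cm_id / struct ib_device. */
enum toy_event { TOY_DEVICE_REMOVAL };

struct toy_device {
    int users; /* resources (QP, CQ, MR, ...) still allocated on this device */
};

struct toy_cm_id;
typedef void (*toy_handler)(struct toy_cm_id *id, enum toy_event ev);

struct toy_cm_id {
    struct toy_device *dev;
    bool holds_resources;
    bool close_pending; /* models an XPT_CLOSE-style "tear down later" flag */
    toy_handler handler;
};

/* Bind an id to a device and allocate resources on it (stands in for creating a QP etc.). */
static void toy_attach(struct toy_cm_id *id, struct toy_device *dev, toy_handler h)
{
    id->dev = dev;
    id->holds_resources = true;
    id->close_pending = false;
    id->handler = h;
    dev->users++;
}

/* Free this id's device resources, if it still holds any. */
static void toy_release(struct toy_cm_id *id)
{
    if (id->holds_resources) {
        assert(id->dev->users > 0);
        id->holds_resources = false;
        id->dev->users--;
    }
}

/* Synchronous pattern: everything is freed before the handler returns. */
static void sync_handler(struct toy_cm_id *id, enum toy_event ev)
{
    if (ev == TOY_DEVICE_REMOVAL)
        toy_release(id);
}

/* Deferred pattern: only mark the transport for a later close and
 * return with the resources still held. */
static void deferred_handler(struct toy_cm_id *id, enum toy_event ev)
{
    if (ev == TOY_DEVICE_REMOVAL)
        id->close_pending = true;
}

/* Toy removal walk: deliver the event to every id on the device, then
 * report how many resources are still pinned after all handlers return.
 * Zero means the device could be torn down safely in this model. */
static int toy_process_remove(struct toy_device *dev, struct toy_cm_id *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i].dev == dev)
            ids[i].handler(&ids[i], TOY_DEVICE_REMOVAL);
    return dev->users;
}
```

With two ids on one device, two synchronous handlers leave toy_process_remove() returning 0; swapping one of them for deferred_handler leaves it returning 1, because that id still holds its resources when the walk finishes.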


Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html