    James> The device could still be used after it's gone. For
    James> example:

    James> - the user is configuring SRP via sysfs. The thread in
    James> srp_create_target() has just called ib_sa_path_rec_get()
    James> [srp.c line 1209] and is waiting for the path record query
    James> to complete in wait_for_completion()
    James> - the SA callback, srp_path_rec_completion(), is called.
    James> This callback thread will make several verb calls
    James> (ib_create_cq, ib_req_notify_cq, ib_create_qp, ...) without
    James> any coordination with the hotplug device removal callback,
    James> srp_remove_one

I don't think this can happen. How could srp_remove_one get past

    wait_for_completion(&host->released);

if the sysfs file is still in use?

    James> Notice that if the SA client's hotplug removal function,
    James> ib_sa_remove_one(), ensured that all callbacks had
    James> completed before returning the problem would be fixed. This
    James> would protect all ULPs from having to deal with hotplug
    James> races in their SA callback function. The fix belongs in the
    James> SA client (the core stack), not in SRP.

All SA client callbacks are driven by the MAD layer. And
ib_sa_remove_one() does ib_unregister_mad_agent(), which should wait
for all callbacks to finish. So I think we already do the best we can
here.

Unfortunately the SA client code must clean up after all the ULPs that
depend on it, because ULPs can use the SA up until they know the
device is gone. But I don't see a way around that.

    James> All the ULPs are deficient with respect to their hotplug
    James> synchronization. Given that there is a common problem,
    James> doesn't it make sense to try and solve it in a generic way
    James> instead of in each ULP?

Yes, but what is the generic way?

 - R.
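For readers following the thread, the "removal waits until every callback has finished" behaviour discussed above can be sketched in userspace C. This is only an illustration of the pattern, not kernel code and not how the SA client or MAD layer is actually implemented: `struct client`, `callback_enter()`, `callback_exit()` and `client_remove()` are hypothetical names, and pthread primitives stand in for the kernel's own locking and completion mechanisms.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a client bound to a removable device. */
struct client {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when in_flight drops to zero */
    int             in_flight; /* callbacks currently running */
    bool            removing;  /* set once removal has started */
};

static void client_init(struct client *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->idle, NULL);
    c->in_flight = 0;
    c->removing = false;
}

/* Called at the top of every callback.
 * Returns false if removal has started, so the callback must bail out
 * without touching the device. */
static bool callback_enter(struct client *c)
{
    bool ok;

    pthread_mutex_lock(&c->lock);
    ok = !c->removing;
    if (ok)
        c->in_flight++;
    pthread_mutex_unlock(&c->lock);
    return ok;
}

/* Called at the end of every callback that entered successfully. */
static void callback_exit(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->in_flight > 0);
    if (--c->in_flight == 0)
        pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
}

/* Removal path: refuse new callbacks, then block until the ones
 * already running have all called callback_exit(). */
static void client_remove(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    c->removing = true;
    while (c->in_flight > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

Once `client_remove()` returns, no callback is running and `callback_enter()` rejects any later one. Note that this only covers callbacks delivered through the guarded entry point; it says nothing about other threads in a ULP that use the device directly, which is the harder part of the question raised above.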