> But what is the issue? some kind of race?

If we look at just the ib_multicast patches as an example...
Calling ib_join_multicast allocates a struct ib_multicast that must be
freed. Here's the relevant portion of ipoib's join callback:

    @@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s
                     /* Clear the busy flag so we try again */
    +                status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
    +                                            &mcast->flags);
             }
    +        return status;
     }

The callback clears the busy flag and, by returning a non-zero value,
causes the structure to be freed. (This is convenient for error handling.)
Now suppose the callback thread stalls right at the return statement for a
while.

When ipoib is unloaded, one of the calls it makes during cleanup is
ipoib_mcast_leave(), which does:

    if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
            ib_free_multicast(mcast->mc);

If ipoib_mcast_leave() is called at the same time that an error is reported
through the callback, the callback thread may be the one that clears the
flag, so the struct ib_multicast ends up being freed by the callback's
return value. ipoib_mcast_leave() then finds the flag already clear and
does nothing, and there is nothing to prevent the callback thread from
still executing in the ipoib code after the unload has completed.

Similar issues can apply to ib_cm and rdma_cm.

- Sean
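
P.S. For anyone who wants to poke at the flag handoff outside the kernel, here is a minimal userspace sketch. It is only a model under stated assumptions: C11 atomics stand in for test_and_clear_bit(), two pthreads stand in for the callback thread and the unload path, and every name in it (mcast_model, release_mc, run_race_once, and so on) is hypothetical rather than taken from ipoib.

```c
/* Userspace model only; not kernel code. All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define FLAG_BUSY 1u            /* stands in for IPOIB_MCAST_FLAG_BUSY */

struct mcast_model {
    atomic_uint flags;
    void *mc;                   /* stands in for the struct ib_multicast */
    atomic_int frees;           /* number of times mc was released */
};

/* Atomically clear FLAG_BUSY and report whether it was set beforehand. */
static int test_and_clear_busy(struct mcast_model *m)
{
    return (atomic_fetch_and(&m->flags, ~FLAG_BUSY) & FLAG_BUSY) != 0;
}

/* Only the path that won the flag may call this. */
static void release_mc(struct mcast_model *m)
{
    assert(m->mc != NULL);      /* would trip on a double release */
    free(m->mc);
    m->mc = NULL;
    atomic_fetch_add(&m->frees, 1);
}

/* Models the join callback reporting an error. */
static void *callback_path(void *arg)
{
    struct mcast_model *m = arg;

    if (test_and_clear_busy(m))
        release_mc(m);
    /* In the real driver this thread is still running module code here. */
    return NULL;
}

/* Models ipoib_mcast_leave() during unload. */
static void *leave_path(void *arg)
{
    struct mcast_model *m = arg;

    if (test_and_clear_busy(m))
        release_mc(m);
    return NULL;
}

/* Runs both paths concurrently; returns the release count, or -1 on error. */
int run_race_once(void)
{
    struct mcast_model m;
    pthread_t cb, leave;

    atomic_init(&m.flags, FLAG_BUSY);
    atomic_init(&m.frees, 0);
    m.mc = malloc(64);
    if (m.mc == NULL)
        return -1;

    if (pthread_create(&cb, NULL, callback_path, &m) != 0) {
        free(m.mc);
        return -1;
    }
    if (pthread_create(&leave, NULL, leave_path, &m) != 0) {
        pthread_join(cb, NULL); /* callback path has released mc */
        return -1;
    }
    pthread_join(cb, NULL);
    pthread_join(leave, NULL);
    return atomic_load(&m.frees);
}
```

Calling run_race_once() in a loop should show that exactly one of the two paths releases the structure each time, whichever thread gets to the flag first. What the model cannot show is the part that matters for unload: the flag decides who frees, but it says nothing about whether the losing or winning callback thread has actually left the module's code.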