>But what is the issue? some kind of race?

If we look at just the ib_multicast patches as an example...

Calling ib_join_multicast allocates a struct ib_multicast that must be freed.
Here's the relevant portion of ipoib's join callback:

@@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s
                /* Clear the busy flag so we try again */
+               status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
+                                           &mcast->flags);
        }
+       return status;
 }

The callback clears the busy flag and, by returning a non-zero value, asks for
the structure to be freed.  (This is convenient for error handling.)  Now
suppose the callback thread stalls right at the return statement for a while.

When ipoib is unloaded, one of the calls it makes during cleanup is
ipoib_mcast_leave(), which does:

        if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
                ib_free_multicast(mcast->mc);

If ipoib_mcast_leave() is called at the same time that an error is reported
through the callback, the callback may clear the busy flag first.  In that case
the struct ib_multicast is freed on behalf of the callback thread, and
ipoib_mcast_leave() skips ib_free_multicast() because the flag is already clear.
Nothing then prevents the callback thread from still executing in the ipoib
code after unload has occurred.
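
To make the two orderings concrete, here is a rough userspace model of the flag
handoff.  It is only a sketch: every fake_* name is invented for illustration,
a C11 atomic stands in for the kernel's test_and_clear_bit(), and the two paths
are run one after the other rather than on real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FAKE_FLAG_BUSY 0x1u   /* stand-in for IPOIB_MCAST_FLAG_BUSY */

/* Hypothetical model of the per-group state; not the real ipoib struct. */
struct fake_mcast {
	atomic_uint flags;
	int frees;              /* times the ib_multicast would be freed */
};

static void fake_mcast_init(struct fake_mcast *m)
{
	atomic_init(&m->flags, FAKE_FLAG_BUSY);
	m->frees = 0;
}

/* Models test_and_clear_bit(): clear the bit, report whether it was set. */
static bool fake_test_and_clear_busy(struct fake_mcast *m)
{
	return (atomic_fetch_and(&m->flags, ~FAKE_FLAG_BUSY) & FAKE_FLAG_BUSY) != 0;
}

/* Models the join callback on error: non-zero return requests the free. */
static int fake_callback_on_error(struct fake_mcast *m)
{
	int status = fake_test_and_clear_busy(m);

	if (status)
		m->frees++;     /* freed on the callback's behalf after return */
	return status;
}

/* Models ipoib_mcast_leave(); returns true if it called the free routine. */
static bool fake_leave(struct fake_mcast *m)
{
	if (fake_test_and_clear_busy(m)) {
		m->frees++;     /* stands in for ib_free_multicast() */
		return true;
	}
	return false;           /* flag already clear: nothing is called */
}

/* Walks both orderings sequentially; a real race interleaves them. */
static void fake_check_orderings(void)
{
	struct fake_mcast m;

	fake_mcast_init(&m);            /* callback wins the flag */
	assert(fake_callback_on_error(&m) == 1);
	assert(!fake_leave(&m));        /* leave makes no call at all */
	assert(m.frees == 1);

	fake_mcast_init(&m);            /* leave wins the flag */
	assert(fake_leave(&m));
	assert(fake_callback_on_error(&m) == 0);
	assert(m.frees == 1);
}
```

In both orderings the structure is freed exactly once, so the flag does protect
against a double free.  The trouble is the first ordering: fake_leave() returns
without calling anything, and that is the window in which the callback thread
can still be sitting at its return statement while unload continues.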

Similar issues can apply to ib_cm and rdma_cm.

- Sean
