I see. However, there is no second join here; it's a leave followed by a join, where the group refcount climbs to 2 because the join code increments it in its synchronous part, which executes before the thread processes the leave request.
The refcount is only used to ensure that the group structure continues to exist. The code must be able to handle multiple users calling join/free at the same time, including a single user calling free before its previous call to join has completed. All MADs sent for the same multicast group must also be serialized to prevent join and leave requests for the same group from reaching the SA out of order.
If you walk through ib_sa_free_multicast(), the group membership is decremented. A reference is held on the group because a work item has just been queued on the group for processing. We cannot remove this reference unless we avoid queuing the work item. And the work item is queued to ensure that the leave request to the SA is serialized with possible future join requests.
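To make the sequence above concrete, here is a toy userspace model in C. It is only a sketch, not the multicast module's code: the names (model_group, model_join, model_free, model_process_one) are made up for illustration, and it assumes that the only references on the group are the ones taken for queued work items. It shows the refcount reaching 2 when a free is followed by a join before the worker runs, and the requests still reaching the "SA" in the order they were issued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
enum req_type { REQ_JOIN, REQ_LEAVE };

#define MAX_WORK 8

struct model_group {
	int refcount;                 /* keeps the group structure alive */
	int members;                  /* local users currently joined */
	enum req_type work[MAX_WORK]; /* FIFO of queued requests */
	size_t head, tail;
	enum req_type sent[MAX_WORK]; /* order in which requests reach the "SA" */
	size_t nsent;
};

/* Queue a request; each queued item holds one reference on the group. */
void model_queue(struct model_group *g, enum req_type t)
{
	assert(g->tail < MAX_WORK);
	g->refcount++;
	g->work[g->tail++] = t;
}

/* Synchronous part of a join: runs in the caller's context. */
void model_join(struct model_group *g)
{
	g->members++;
	model_queue(g, REQ_JOIN);
}

/* Free: drop the membership, then queue the leave behind any earlier work. */
void model_free(struct model_group *g)
{
	assert(g->members > 0);
	g->members--;
	model_queue(g, REQ_LEAVE);
}

/* Worker thread step: send the oldest request and drop its reference.
 * Returns 0 when the queue is empty, 1 otherwise. */
int model_process_one(struct model_group *g)
{
	if (g->head == g->tail)
		return 0;
	g->sent[g->nsent++] = g->work[g->head++];
	g->refcount--;
	return 1;
}
```

Calling model_free() and then model_join() on a group whose first join has already been processed leaves refcount at 2 until the worker drains the queue, and the leave is recorded in sent[] before the second join.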
I am not sure this is what we want from the core design. Say the consumer has some flexibility in the join request (e.g. through a future API change), such that they can join a group, leave it, then join the same group again with different "attributes". Then if the join crosses the leave in a way that causes the core code not to issue SA leave/join queries, it's a bug from the perspective of the user.
If the attributes from a subsequent join differ from an existing join, the subsequent join operation will fail. The only way I can think of to make this situation work is to add an asynchronous ib_sa_leave_multicast() routine that provides a callback after the leave completes, in addition to the existing free call.
This could be a fairly difficult case to make work anyway, since it requires destroying the group at the SA before it can be re-created with the different attributes. It requires coordination across the group that's beyond the control of the local multicast module. (A single group creator could handle this fairly easily.)
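For illustration only, the asynchronous leave mentioned above might have roughly this shape. This is a userspace sketch under my own assumptions; no such interface exists in the code being discussed, and every name here (sketch_member, sketch_leave_async, sketch_leave_complete, sketch_join, count_done) is hypothetical. The point is just that the caller learns when the leave has completed and can hold off a re-join with different attributes until then.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_member;

/* Hypothetical completion callback: status is 0 on success. */
typedef void (*leave_done_fn)(struct sketch_member *m, int status,
			      void *context);

struct sketch_member {
	int joined;         /* membership currently held */
	int leave_pending;  /* a leave has been issued but not completed */
	leave_done_fn done;
	void *context;
};

/* Start a leave; returns 0 if queued, -1 if there is nothing to leave. */
int sketch_leave_async(struct sketch_member *m, leave_done_fn done,
		       void *context)
{
	if (!m->joined || m->leave_pending)
		return -1;
	m->leave_pending = 1;
	m->done = done;
	m->context = context;
	return 0;
}

/* Stand-in for the SA response to the leave arriving. */
void sketch_leave_complete(struct sketch_member *m, int status)
{
	assert(m->leave_pending);
	m->leave_pending = 0;
	if (status == 0)
		m->joined = 0;
	m->done(m, status, m->context);
}

/* A join is refused while still joined or while a leave is in flight. */
int sketch_join(struct sketch_member *m)
{
	if (m->joined || m->leave_pending)
		return -1;
	m->joined = 1;
	return 0;
}

/* Example callback: counts successful leave completions. */
void count_done(struct sketch_member *m, int status, void *context)
{
	(void)m;
	if (status == 0)
		++*(int *)context;
}
```

A consumer would call sketch_leave_async(), wait for its callback, and only then issue the new join. This does nothing about the cross-group coordination problem described above; it only orders the local leave and re-join.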
OK, on this specific host system there was no port down event, so the only event that the multicast and ipoib code got was port active. This is why the patch I sent solves (hides) the problem: it causes the multicast code to transition the group into the error state, so the ipoib join that follows causes an SA join query to actually be sent.
There were two port active events delivered back to back with no other events in between? If so, is this something that can or should occur? The patch itself looks fine to me; I'm trying to determine if there are other refcount problems in the multicast module. I'm not convinced that there are at this point.
I don't think there's a problem in ipoib; it just relies on port events rather than on multicast error notifications. Do you think it's less robust, and if yes, why?
As long as the multicast module gets the event notification first, which I believe is the case, then I don't think there are any problems.
- Sean
