It is possible for the multicast consumer to call ib_sa_free_multicast(), so
that the leave request is queued to be processed later by the workqueue thread,
and then call ib_sa_join_multicast(), which calls acquire_group(), --before--
the leave request has been executed by the thread. So the lookup done by
acquire_group() succeeds, the code goes to the found: label, and the group
reference count climbs to (e.g.) 2.

Yes - this is possible. Note that although the group reference count is 2, joins are tracked in different lists: active_list or pending_list. The second join doesn't move to the active_list until it's processed by the callback thread, to synchronize against errors and leaves.

Following that, the leave work element causes the thread to just decrement the
reference count to 1 in release_group() and do nothing else, and the join
work element causes the thread to return the cached address-handle attributes
to the consumer. So no query is sent to the SA.

This sounds like the correct behavior.
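The "join crosses the leave" sequence can be modeled in a few lines of C. This is a hypothetical, single-threaded sketch: struct race_group and the race_* functions are invented for illustration and are not the ib_sa multicast implementation. It only encodes the behavior stated in this thread: free queues the leave, the racing join bumps the reference count and waits on the pending list, and the worker serves that join from the cached address-handle attributes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of one multicast group. The names are
 * illustrative only; this is not the ib_sa multicast implementation. */
struct race_group {
	int refcount;        /* one reference per outstanding consumer join */
	int pending_joins;   /* joins not yet processed (pending_list) */
	int active_joins;    /* joins already processed (active_list) */
	int queued_releases; /* leaves whose reference the worker still holds */
	bool have_cached_ah; /* address-handle attributes from an earlier join */
	int sa_queries_sent; /* joins that actually reached the SA */
	bool freed;          /* last reference dropped, group torn down */
};

/* ib_sa_free_multicast() analogue: the member leaves the active list at
 * once, but dropping the group reference is deferred to the worker. */
static void race_free(struct race_group *g)
{
	assert(g->active_joins > 0);
	g->active_joins--;
	g->queued_releases++;
}

/* ib_sa_join_multicast() analogue: the lookup finds the existing group, so
 * only the reference count climbs; the join waits on the pending list. */
static void race_join(struct race_group *g)
{
	assert(!g->freed);
	g->refcount++;
	g->pending_joins++;
}

/* release_group() analogue: tear the group down only on the last reference. */
static void race_release(struct race_group *g)
{
	assert(g->refcount > 0);
	if (--g->refcount == 0)
		g->freed = true;
}

/* Worker thread: handle the queued leave, then the pending join. */
static void race_worker(struct race_group *g)
{
	while (g->queued_releases > 0) {
		g->queued_releases--;
		race_release(g);
	}
	while (g->pending_joins > 0) {
		g->pending_joins--;
		if (!g->have_cached_ah) {
			g->sa_queries_sent++;     /* first join must go to the SA */
			g->have_cached_ah = true;
		}
		g->active_joins++;            /* otherwise served from the cache */
	}
}
```

Starting from one active join (refcount 1, cached attributes present), calling race_free(), race_join() and then race_worker() leaves the reference count at 1 and sa_queries_sent at 0, which matches the outcome described above.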

We saw the bug on a uniprocessor system running the ipath driver, where the
consumer is ipoib and the group is the IPv4 broadcast group. When we take down
the link of the switch port connected to the device across the cable, ipoib
rushes to leave the group and then join it. On this system the join "crosses
the leave", and the SA does not take the node into account when computing the
multicast routing of the group --> the node does not get the broadcast traffic.

Does the SA remove the node from the multicast group? If the HCA port goes down, the multicast code will transition all existing multicast groups to the error state. An error will be reported on active joins. Pending joins will be processed normally after error handling has completed.
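The port-down handling described above can be sketched as a small state model. This is a hypothetical illustration: the port_* names, the state enums, and the group returning to an idle state after error handling are assumptions made for the sketch, not the actual multicast code. It shows the ordering stated in the reply: active joins are reported as errors first, and pending joins are processed normally afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of port-down handling; states and names are
 * illustrative, not the ib_sa multicast implementation. */
enum port_group_state { PG_IDLE, PG_ERROR };
enum port_member_state { PM_PENDING, PM_ACTIVE, PM_ERRORED };

struct port_member {
	enum port_member_state state;
};

struct port_group {
	enum port_group_state state;
	struct port_member *members;
	size_t nmembers;
};

/* Port went down: every existing group moves to the error state. */
static void port_down(struct port_group *groups, size_t ngroups)
{
	for (size_t i = 0; i < ngroups; i++)
		groups[i].state = PG_ERROR;
}

/* Worker: report the error to active joins first, then, once error
 * handling is done, process pending joins normally. */
static void port_handle_error(struct port_group *g)
{
	size_t i;

	assert(g->state == PG_ERROR);
	for (i = 0; i < g->nmembers; i++)
		if (g->members[i].state == PM_ACTIVE)
			g->members[i].state = PM_ERRORED; /* consumer callback gets an error */

	g->state = PG_IDLE; /* assumption: group usable again after error handling */
	for (i = 0; i < g->nmembers; i++)
		if (g->members[i].state == PM_PENDING)
			g->members[i].state = PM_ACTIVE;  /* regular join processing */
}
```

In this model an errored member is never silently rejoined; only the consumer can restore that membership by joining again.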

For now we have applied a workaround which causes the multicast code to
call release_group() from ib_sa_free_multicast(). The workaround is
implemented by the patch below, which causes mcast_groups_lost()
to be called also when the port actually goes up, and sets the group state
to MCAST_ERROR so that the call to release_group() is not deferred (ipoib
does a leave/join for every event, namely on both link down and link up).

I'm wondering if the problem isn't in ipoib. When an error occurs on a multicast group, the group transitions into the error state, and the user is called back to let them know that they need to rejoin the group. Since ipoib responds directly to port events rather than to multicast callback errors, is there a chance ipoib missed the error notification?
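What reacting to the error notification would mean for a consumer can be sketched as a callback that schedules a rejoin on a non-zero status. This is a hypothetical illustration: struct mcast_consumer and consumer_join_cb() are invented names, and the callback is only loosely shaped like the status-reporting callback the multicast code uses (its exact signature is not reproduced here). It is not ipoib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical consumer state; not ipoib's actual data structures. */
struct mcast_consumer {
	bool joined;
	int rejoins_scheduled;
};

/* Hypothetical join callback receiving a status code. A non-zero status
 * means the group went to the error state, so the consumer drops its
 * membership and schedules a rejoin instead of waiting for a port event. */
static void consumer_join_cb(int status, struct mcast_consumer *c)
{
	assert(c);
	if (status) {
		c->joined = false;
		c->rejoins_scheduled++; /* e.g. queue work that calls join again */
		return;
	}
	c->joined = true;
}
```

The design point is that the rejoin is driven by the multicast error callback itself, so it cannot be skipped even if the consumer's own port-event handling races with the multicast code.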

In short, I'm still not sure where the problem lies.

- Sean
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
