>It would make some sense to keep this API separate if the mcast module would >only be targeted at userspace users - kernel consumers likely >do not share mcast groups, so they could avoid the overhead. >But since this patch seems to move all kernels users to the new API to - >why does this need to be a separate module at all?
I view the ib_sa API as simply sending the MAD to the SA and routing back the response. That functionality is needed, and is distinct enough, that I would keep it separate from the multicast module. Kernel consumers can only avoid the overhead if we never allow a userspace application to share the same multicast group. I don't think that we need to do that, as long as the app has the right permissions. The rdma_cm also uses the cached copy of the ipoib multicast group when forming new groups. >And I'm worried about the extensive use of atomic operations >this patch introduces - both performance and race-condition-wise. >Can't we stick to simple locking? Replacing completion with a single >BUSY bit looks especially scary. I reworked the locking for this 3-4 times before I came up with something that I felt was simple, yet could still do the job. The difficulty is that multiple threads can try to join the same group at once, each with different join parameters, while existing members leave it. Or a user can leave a group before their initial join request even completes. The latter is a problem even for a single user, such as ipoib, which it doesn't handle. I really don't think the performance of the code is as much an issue versus the time required to configure the fabric. >> - mutex_lock(&mcast_mutex); >> + /* Clear the busy flag so we try again */ >> + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >> >> + mutex_lock(&mcast_mutex); >> spin_lock_irq(&priv->lock); >> - mcast->query = NULL; >> - >> - if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { >> - if (status == -ETIMEDOUT) >> - queue_work(ipoib_workqueue, &priv->mcast_task); >> - else >> - queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >> - mcast->backoff * HZ); >> - } else >> - complete(&mcast->done); >> + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) >> + queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >> + mcast->backoff * HZ); >> spin_unlock_irq(&priv->lock); >> mutex_unlock(&mcast_mutex); >> >> - return; >> + return status; >> } > >We used to do complete last thing on mcast object, now you are >touching the object after IPOIB_MCAST_FLAG_BUSY is cleared, apparently. The patch removes the need to check both mcast->query and a flag to determine state. It only uses the flag. Can you clarify what issue you see with the above code? >What prevents us from exiting while callbacks are in progress? >Basically same applies wherever we used to call wait_for_mcast_join. ib_free_multicast() blocks while any callbacks are running. >> +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 >port_num, >> + struct ib_sa_mcmember_rec *rec, >> + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, >> + int (*callback)(int status, >> + struct ib_multicast >> + *multicast), >> + void *context); >> + > >Is this re-introducing module unload races we had with sa all over again? The call returns a structure that must be freed. If the structure is freed by returning a non-zero call to the callback, then we have the same problem that ib_cm and rdma_cm have. Not allowing a return value from the callback is an easy fix for that though. >I also started to wander why do we need a new API for this at all? >Can't the sa module be fixed to refcount the mcast joins properly for us, >with minor or no API changes? The difference is that this allows the free to match up with exactly one join request. This will be needed for userspace support. Additionally, the callback remains active beyond the initial join, so that the multicast module can notify the user when an errors occur on the multicast group that requires re-joining. - Sean _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general