On Thu, 2006-03-16 at 00:39 -0800, Sean Hefty wrote: > >On Wed, 2006-03-15 at 16:34 -0800, Sean Hefty wrote: > >> Tom Tucker wrote: > >> > +static inline void iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > >> > +{ > >> > + if (atomic_dec_and_test(&cm_id_priv->refcount)) > >> > + kfree(cm_id_priv); > >> > +} > >> > >> I'm wary of code that does this. This can typically result in a race > >condition > >> where the user can receive a callback after destroy returns. > > > >Yep. But the alternative is a deadlock in the event thread (the event > >that removes the last reference is behind the event that blocks trying > >to destroy the id). This happens because when both peers try to > >disconnect concurrently. > > The IB CM has to deal with peer disconnects as well. If you can deadlock > because a user calls destroy from the event thread, then we need to disallow > that. (As a general rule, a user can never call destroy on an object while > in a > callback associated with that object.) This is the reason why the IB CM and > CMA > serialize all events to a single cm_id and allow returning a non-zero value > from > a callback to signal destruction of the cm_id.
The iWARP CM prevents this from happening by having a state (DESTROYING) that prevents events from being delivered after iw_destroy_cm_id has returned. This approach avoids having either the kernel application thread or the event thread blocked while waiting for the last reference to be released. Unlike the IB side, the iWARP side has orderly shutdown semantics that can delay the delivery of the CLOSE event for minutes. With this implementation, life goes on and the object simply stays around until the last reference is removed. This implementation is consistent with the way many other network objects are managed (e.g. sk_buff). > > >There is an iWARP provider requirement that CLOSE is absolutely the last > >event...I was wary too, so to test my implementation I ran 6 threads 3 > >IB, 3 iWARP on two dual CPU boxes. > > Based on a code review, I'm fairly certain that this issue exists and needs to > be fixed. A client has the potential to receive a callback after destroy has > returned, which can lead to accessing an invalid context. > Please look at the handling of events in cm_event_handler. If the state is DESTROYING, events are not queued for delivery. This handles events that are generated by the provider after iw_destroy_cm_id has returned. Then please look at cm_work_handler, the check for DESTROYING state handles events that are already queued for delivery, but were not yet delivered at the time iw_destroy_cm_id returned. I think these checks avoid the condition you are concerned about. > - Sean > _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general