On Sun, May 31, 2009 at 09:41:54AM +0300, Or Gerlitz wrote:
> [email protected] wrote @
> http://lists.openfabrics.org/pipermail/general/2009-May/059730.html
> > What would prevent a race between a tx completion (with an
> > error) and the cleanup of a neighbour?
>
> Okay, so maybe this code/design of using the stashed ipoib_neighbour at the tx
> completion code is the root cause of all these troubles?!
>
> From a quick look at the code and two patches that touched this area
> (f56bcd801... "Use separate CQ for UD send completions" and 57ce41d1... "Fix
> transmit queue stalling forever") - I see that the original tx cq handler -
> ipoib_ib_handle_tx_wc() doesn't touch the neighbour but today is called only
> from the drain timer & dev-stop flows. Now, ipoib_cm_handle_tx_wc() is
> called for "normal" flow both for datagram and connected modes, and this
> function touches the neighbour.
Or, I don't follow you - ipoib_cm_handle_tx_wc() has called
ipoib_neigh_free() since the first commit. Also, please note how the CQs
are assigned:

recv_cq: used for all receives and for CM sends
send_cq: used for UD sends

Thus, since ipoib_poll() polls recv_cq, any completion there that is not
a receive must be a CM-mode send.

>
> I am not sure why commit f56bcd801... made UD completions go through
> ipoib_cm_handle_tx_wc(), nor why this function must use the neighbor to access
> the data structure it needs; maybe Eli can comment on that?
>
> Or.
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
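To make the CQ split in the reply concrete, here is a minimal standalone sketch of how completions would be routed under it. Everything named here is a hypothetical stand-in, not the driver's code: the flag macros SKETCH_OP_RECV and SKETCH_OP_CM and their bit values, the enums, and route_wc() are invented for illustration. The driver function each case corresponds to is noted only in comments, and the UD-send mapping to ipoib_ib_handle_tx_wc() is an assumption based on Or's description of it as the original tx CQ handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits stashed in the work-request ID.
 * Names and values are illustrative, not copied from the driver. */
#define SKETCH_OP_RECV (1ULL << 31)
#define SKETCH_OP_CM   (1ULL << 30)

enum cq_kind { RECV_CQ, SEND_CQ };

enum wc_handler {
	HANDLE_UD_RX, /* corresponds to ipoib_ib_handle_rx_wc() */
	HANDLE_CM_RX, /* corresponds to ipoib_cm_handle_rx_wc() */
	HANDLE_CM_TX, /* corresponds to ipoib_cm_handle_tx_wc() */
	HANDLE_UD_TX  /* assumed: ipoib_ib_handle_tx_wc() */
};

/* Pick a handler for one completion, following the split
 * "recv_cq: all receives + CM sends, send_cq: UD sends". */
enum wc_handler route_wc(enum cq_kind cq, uint64_t wr_id)
{
	assert(cq == RECV_CQ || cq == SEND_CQ);

	if (cq == SEND_CQ)
		return HANDLE_UD_TX; /* send_cq only carries UD sends */

	if (wr_id & SKETCH_OP_RECV)
		return (wr_id & SKETCH_OP_CM) ? HANDLE_CM_RX : HANDLE_UD_RX;

	/* Not a receive, yet it arrived on recv_cq: must be a CM send. */
	return HANDLE_CM_TX;
}
```

For example, route_wc(RECV_CQ, SKETCH_OP_CM) yields HANDLE_CM_TX, while any completion on SEND_CQ yields HANDLE_UD_TX. Under this split, the CM tx handler polled from recv_cq never sees UD send completions.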
