Hello!
Quoting r. Roland Dreier ([EMAIL PROTECTED]) "Re: [PATCH] Re: [openib-general] Re: IPoIB Failure CQ overrun":
>     Michael> I know but races are always tricky, could be just a
>     Michael> timing issue. It's just that CI doorbells are routinely
>     Michael> stressed here by QA.
>
> The thing that really makes it hard for me to think of a potential
> driver problem is that changing from updating the CI all at once to
> updating it by 1 at a time in a loop fixes things for me. If anything
> this lengthens the amount of time during which the CQ has too little
> space.
>
> Also adding 1000 extra entries to the CQ created by IPoIB -- ie
> changing the code in ipoib_verbs.c to
>
>       priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev,
>                               IPOIB_TX_RING_SIZE + IPOIB_RX_RING_SIZE + 1000);
>
> still has the same problem, so we're not just transiently overrunning
> by 1 or something like that -- it looks like we're systematically
> losing updates to the CQ.
>
>     Michael> Let me know when you do. But why wait? Once you close
>     Michael> the CQ, and get the command interface event of the hw2sw
>     Michael> cq, it is guaranteed you won't get any new cqes or events
>     Michael> on this cq.
>
> OK, it's done. The reason for the wait here is that we are actually
> cleaning up the QP and want to make sure that we don't leak any
> resources. First we transition the QP to error, wait for all work
> requests to complete, and then transition the QP to reset.
>
>  - Roland
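
To make sure we are talking about the same sequence (error, wait for all
work requests to complete, reset), here is a toy model of it in plain C.
This is only a sketch: every toy_* name below is hypothetical, nothing here
is the kernel verbs API or the actual IPoIB code, and "completion" is
modelled as a simple counter rather than real CQ polling.

```c
#include <assert.h>

/* Toy model only: hypothetical types, not the kernel verbs API. */
enum toy_qp_state { TOY_QP_RTS, TOY_QP_ERR, TOY_QP_RESET };

struct toy_qp {
    enum toy_qp_state state;
    unsigned int posted;     /* work requests posted, not yet completed */
    unsigned int completed;  /* completions reaped so far */
};

/* Step 1: move the QP to error so outstanding requests get flushed. */
static void toy_qp_to_error(struct toy_qp *qp)
{
    qp->state = TOY_QP_ERR;
}

/* Step 2: reap one flushed completion.
 * Returns 1 if a completion was reaped, 0 if nothing is outstanding. */
static int toy_poll_one(struct toy_qp *qp)
{
    if (qp->posted == 0)
        return 0;
    qp->posted--;
    qp->completed++;
    return 1;
}

/* Step 3: reset, allowed only once nothing is outstanding. */
static void toy_qp_to_reset(struct toy_qp *qp)
{
    assert(qp->state == TOY_QP_ERR && qp->posted == 0);
    qp->state = TOY_QP_RESET;
}

/* Full teardown in the order described in the quote.
 * Returns the number of work requests drained. */
static unsigned int toy_qp_teardown(struct toy_qp *qp)
{
    unsigned int drained = 0;

    toy_qp_to_error(qp);
    while (toy_poll_one(qp))
        drained++;
    toy_qp_to_reset(qp);
    return drained;
}
```

Calling toy_qp_teardown() on a QP with 5 outstanding requests drains all 5
before the reset. The question is whether that drain loop is needed at all.
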
But why wait for completion? Once the QP is in the error state, hardware
will not process any new WQEs. You can close the CQ and free all of them.

MST
_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
