Hello!
Quoting r. Roland Dreier ([EMAIL PROTECTED]) "Re: [PATCH] Re: [openib-general] 
Re: IPoIB Failure CQ overrun":
>     Michael> I know but races are always tricky, could be just a
>     Michael> timing issue.  It's just that CI doorbells are routinely
>     Michael> stressed here by QA.
> 
> The thing that really makes it hard for me to think of a potential
> driver problem is that changing from updating the CI all at once to
> updating it by 1 at a time in a loop fixes things for me.  If anything
> this lengthens the amount of time during which the CQ has too little
> space.
> 
> Also, adding 1000 extra entries to the CQ created by IPoIB -- i.e.,
> changing the code in ipoib_verbs.c to
> 
>       priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev,
>                               IPOIB_TX_RING_SIZE + IPOIB_RX_RING_SIZE + 1000);
> 
> still has the same problem, so we're not just transiently overrunning
> by 1 or something like that -- it looks like we're systematically
> losing updates to the CQ.
> 
>     Michael> Let me know when you do. But why wait?  Once you close
>     Michael> the CQ, and get the command interface event of the hw2sw
>     Michael> cq, it is guaranteed you won't get any new cqes or events
>     Michael> on this cq.
> 
> OK, it's done.  The reason for the wait here is that we are actually
> cleaning up the QP and want to make sure that we don't leak any
> resources.  First we transition the QP to error, wait for all work
> requests to complete, and then transition the QP to reset.
> 
>  - Roland

But why wait for the completions? Once the QP is in the error state,
the hardware will not process any new WQEs, so you can close the CQ
and free all outstanding WQEs.
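
To make the two teardown orders concrete, here is a toy userspace model
in C. It is only a sketch under stated assumptions: every toy_* name is
made up for illustration and is not the ib_verbs or mthca API, and the
model simply assumes that nothing new is processed once the QP is in
error. The first function follows the error / drain / reset order
described above; the second follows the error / close CQ / free order.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every toy_* name is hypothetical; none of this is the
 * ib_verbs or mthca API. */
#define TOY_RING_SIZE 8

enum toy_qp_state { TOY_QPS_RTS, TOY_QPS_ERR, TOY_QPS_RESET };

struct toy_qp {
	enum toy_qp_state state;
	bool cq_open;                /* is the attached CQ still open? */
	bool in_use[TOY_RING_SIZE];  /* ring slots owned by posted WRs */
	int outstanding;             /* WRs posted but not yet reclaimed */
};

void toy_qp_init(struct toy_qp *qp)
{
	qp->state = TOY_QPS_RTS;
	qp->cq_open = true;
	qp->outstanding = 0;
	for (int i = 0; i < TOY_RING_SIZE; i++)
		qp->in_use[i] = false;
}

/* Post one work request. Returns the slot index, or -1 if the QP is
 * not in RTS or the ring is full. */
int toy_post(struct toy_qp *qp)
{
	if (qp->state != TOY_QPS_RTS)
		return -1;
	for (int i = 0; i < TOY_RING_SIZE; i++) {
		if (!qp->in_use[i]) {
			qp->in_use[i] = true;
			qp->outstanding++;
			return i;
		}
	}
	return -1;
}

/* Order A: move to error, reclaim each WR through its (flushed)
 * completion, then reset. Returns the number of completions polled. */
int toy_teardown_drain(struct toy_qp *qp)
{
	int polled = 0;

	qp->state = TOY_QPS_ERR;
	for (int i = 0; i < TOY_RING_SIZE; i++) {
		if (qp->in_use[i]) {
			/* model: one completion polled per outstanding WR */
			qp->in_use[i] = false;
			qp->outstanding--;
			polled++;
		}
	}
	assert(qp->outstanding == 0);
	qp->state = TOY_QPS_RESET;
	return polled;
}

/* Order B: move to error, close the CQ, then free every slot directly
 * without polling. Returns the number of WRs freed. */
int toy_teardown_close_cq(struct toy_qp *qp)
{
	int freed = qp->outstanding;

	qp->state = TOY_QPS_ERR;
	qp->cq_open = false;
	for (int i = 0; i < TOY_RING_SIZE; i++)
		qp->in_use[i] = false;
	qp->outstanding = 0;
	qp->state = TOY_QPS_RESET;
	return freed;
}
```

In this model both orders end with nothing outstanding. The only
difference is whether the driver polls one completion per work request
or frees the ring directly after closing the CQ.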

MST
_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
