Hi,
I've been debugging an IB kernel driver and see that we sometimes get a stuck send operation. I believe the send actually completes, but we never get a CQ completion callback for it. I've been trying to track down the CORRECT programming semantics for CQ polling and rearming.

Looking in the Windows IB stack, I see some cases where the completion callback routine rearms the CQ BEFORE the CQ entries are polled (as in the base MAD processing code). In other places (like the IPoIB driver) it polls first, in a loop until no CQ entries are returned, and only then rearms the CQ.

I also found a 2003 document from Intel called the IB verb implementers guide (at infiniband.sourceforge.net/HWDrivers/HCA_DDK/VIG_SF.pdf). Section 8.3 very clearly states that you need to use what look like edge-triggered interrupt semantics to handle the race between polling and rearming the CQ. If the Intel document is correct, then the IB stack may occasionally be leaving completions stuck in the code paths that poll first and rearm afterwards.

Can anybody give a definitive answer on whether the CQ trigger has edge or level semantics, and what I need to do to ensure CQ entries are always processed without delay? The docs for ib_rearm_cq seem to say something different from the docs for ib_rearm_n_cq, so the docs aren't much help either.

- Jan
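
P.S. To make the race I'm worried about concrete, here is a minimal sketch in C. It is only a model and rests on assumptions: it assumes edge-triggered behaviour (a callback fires only when a completion arrives while the CQ is armed, and firing disarms it), which is exactly the point I'm unsure about. Everything named mock_* or handler_* is hypothetical and made up for illustration; none of it is the real ib_poll_cq / ib_rearm_cq API. The late_arrivals parameter simulates completions that land in the window between the last empty poll and the rearm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock of a completion queue, for illustration only. */
struct mock_cq {
    int  pending;    /* completions queued but not yet polled */
    bool armed;      /* true if the next new completion raises a callback */
    int  callbacks;  /* how many callbacks have been raised so far */
};

/* "Hardware" side: queue one completion. ASSUMPTION: edge-triggered,
 * so a callback is raised only if the CQ is armed, and raising it
 * disarms the CQ until software rearms it. */
static void mock_hw_complete(struct mock_cq *cq)
{
    assert(cq != NULL);
    cq->pending++;
    if (cq->armed) {
        cq->armed = false;
        cq->callbacks++;
    }
}

/* Poll one entry; returns false when the CQ is empty. */
static bool mock_poll(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return false;
    cq->pending--;
    return true;
}

/* Request a callback for the next completion that arrives. */
static void mock_rearm(struct mock_cq *cq)
{
    cq->armed = true;
}

/* Poll until empty; returns the number of entries processed. */
static int mock_drain(struct mock_cq *cq)
{
    int n = 0;
    while (mock_poll(cq))
        n++;
    return n;
}

/* Pattern A: poll until empty, then rearm. Completions arriving in
 * the window are queued while the CQ is disarmed, so no callback is
 * raised for them and they sit there until some later completion. */
static int handler_poll_then_rearm(struct mock_cq *cq, int late_arrivals)
{
    int n = mock_drain(cq);
    for (int i = 0; i < late_arrivals; i++)
        mock_hw_complete(cq);   /* race window: CQ not armed */
    mock_rearm(cq);
    return n;
}

/* Pattern B: poll, rearm, then poll once more. The second drain
 * picks up anything that slipped in before the rearm took effect. */
static int handler_poll_rearm_poll(struct mock_cq *cq, int late_arrivals)
{
    int n = mock_drain(cq);
    for (int i = 0; i < late_arrivals; i++)
        mock_hw_complete(cq);   /* same race window as above */
    mock_rearm(cq);
    n += mock_drain(cq);
    return n;
}
```

In this model, running handler_poll_then_rearm on a CQ holding one completion, with one late arrival, leaves one entry pending and no further callback scheduled for it, which looks like the stuck send I'm seeing. handler_poll_rearm_poll processes both entries under the same conditions. If the trigger is really level-sensitive, pattern A would be fine and this sketch does not apply.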
