> -----Original Message-----
> From: Or Gerlitz; openib-general
>
> > In most cases, I would expect that the IB CM will eventually receive the RTU,
> > which will generate an event to the RDMA CM to transition the QP into RTS.
>
> But we want an IB stack and set of ULPs which would work in production so they
> need to handle also irregular cases... eg when the RTU is lost over and over.
Agreed. The missing-RTU case must be handled for a few reasons:

1. The RTU could genuinely be lost (GSI QPs are UD, so they could overflow, the fabric could lose the packet, etc.).

2. The RC send could beat the processing of the RTU. Packets on the wire may arrive out of order if different SLs/VLs are involved for the GSI and the application QP. It's also possible that the CM is slower getting to its queue of packets (for example, when bombarded by many connections) while the application/ULP gets its RC send quickly. [I have observed this situation in various real-world stress tests.]

This problem is quite simple to handle (I did it a few years ago in the SilverStorm stack), and the IB spec completely covers this issue:

CM - have a hook so the CM can get the async events for all CAs. On getting the async event for "first packet received while in RTR" (Communication Established), the CM should treat it exactly like an RTU (with no private data). The CM will need to cross-reference the CA/QP the event was reported for to identify the applicable connection endpoint. If you check the IBTA spec and the CM state machines, you will see the CM is supposed to handle this event. Also, if the RTU does arrive later, the CM state machine handles that correctly by discarding the RTU as if it were a duplicate. Note: this is why applications should not depend on private data in the RTU.

ULPs - all ULPs should be written so they are fully ready to process inbound data before they tell the CM to send the REP. It is very likely the ULP will get a CQ completion for the inbound RQ data before the CM has completed its processing. In general, IB accommodates this situation quite nicely. The ULP can process the inbound data normally and queue any outbound data to the Send Q. Posting to a Send Q is permitted in RTR, but the QP will not initiate sending until it is moved to RTS. As such, the ULP can let the CM's RTU processing (which will race with the RQ data completion) do its normal thing and move the QP to RTS.
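For illustration, here is a minimal Python sketch of the CM side described above, assuming a simplified passive-side state machine with only two states. Every class, method and field name here is hypothetical; a real CM would receive the Communication Established event through the verbs async-event interface (in libibverbs this is `IBV_EVENT_COMM_EST`) and would match RTUs by communication ID. The point it shows is that the RTU and the COMM_EST event share one code path, and whichever arrives second is discarded as a duplicate.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class CmState(Enum):
    REP_SENT = auto()     # REP sent, QP in RTR, waiting for the RTU
    ESTABLISHED = auto()  # QP moved to RTS, ULP notified


class CmEvent(Enum):
    RTU_RECEIVED = auto()  # RTU MAD arrived on the GSI QP
    COMM_EST = auto()      # "Communication Established" async event from the CA


@dataclass
class Connection:
    ca_name: str
    qp_num: int
    comm_id: int
    state: CmState = CmState.REP_SENT
    qp_state: str = "RTR"
    completed_by: Optional[CmEvent] = None
    established_upcalls: int = 0  # times the ULP was told "established"


class ConnectionManager:
    """Hypothetical CM: not a real library API."""

    def __init__(self) -> None:
        # Cross-reference tables: async events identify a (CA, QP) pair,
        # CM MADs identify a communication ID.
        self._by_qp: Dict[Tuple[str, int], Connection] = {}
        self._by_comm_id: Dict[int, Connection] = {}

    def add(self, conn: Connection) -> None:
        self._by_qp[(conn.ca_name, conn.qp_num)] = conn
        self._by_comm_id[conn.comm_id] = conn

    def on_rtu(self, comm_id: int) -> bool:
        return self._complete(self._by_comm_id.get(comm_id), CmEvent.RTU_RECEIVED)

    def on_comm_est(self, ca_name: str, qp_num: int) -> bool:
        # Called from the hook that receives async events for every CA.
        return self._complete(self._by_qp.get((ca_name, qp_num)), CmEvent.COMM_EST)

    def _complete(self, conn: Optional[Connection], event: CmEvent) -> bool:
        """Return True if the event established the connection, False if discarded."""
        if conn is None:
            return False  # event for a QP/comm ID this CM does not own
        if conn.state is CmState.REP_SENT:
            # RTU and COMM_EST take the same path; COMM_EST behaves like an
            # RTU that carries no private data.
            conn.qp_state = "RTS"
            conn.state = CmState.ESTABLISHED
            conn.completed_by = event
            conn.established_upcalls += 1
            return True
        # Already established: a late RTU (or repeated COMM_EST) is a duplicate.
        return False


if __name__ == "__main__":
    cm = ConnectionManager()
    conn = Connection(ca_name="hca0", qp_num=0x12, comm_id=7)
    cm.add(conn)
    print(cm.on_comm_est("hca0", 0x12))  # data beat the RTU: connection established
    print(cm.on_rtu(7))                  # RTU arrives later and is discarded
```

Routing both events through a single `_complete` method is just one way to guarantee the ULP sees exactly one "established" notification no matter which event wins the race.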
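The ULP-side race can be pictured with a toy model of a QP's send side, assuming the behaviour described above: send work requests may be posted while the QP is in RTR, but nothing is transmitted until the QP reaches RTS. `ToyQp` and `ulp_on_recv_completion` are hypothetical names used only for this sketch, not a verbs API; real hardware and drivers should be checked against the spec for their exact RTR posting semantics.

```python
from collections import deque
from typing import Deque, List


class ToyQp:
    """Toy model: sends may be posted in RTR, but only an RTS QP transmits."""

    def __init__(self) -> None:
        self.state = "RTR"
        self.send_queue: Deque[str] = deque()
        self.wire: List[str] = []  # messages that actually left the port, in order

    def post_send(self, payload: str) -> None:
        if self.state not in ("RTR", "RTS"):
            raise RuntimeError(f"cannot post a send in state {self.state}")
        self.send_queue.append(payload)
        self._process_send_queue()

    def modify_to_rts(self) -> None:
        # Done by the CM's RTU (or COMM_EST) handling, racing with the ULP.
        self.state = "RTS"
        self._process_send_queue()

    def _process_send_queue(self) -> None:
        # Only a QP in RTS initiates sends; queued WRs go out in order.
        while self.state == "RTS" and self.send_queue:
            self.wire.append(self.send_queue.popleft())


def ulp_on_recv_completion(qp: ToyQp, request: str) -> None:
    # The ULP processes inbound data normally and replies, without checking
    # whether the CM has moved the QP to RTS yet.
    qp.post_send(f"reply to {request}")


if __name__ == "__main__":
    qp = ToyQp()
    ulp_on_recv_completion(qp, "req-1")  # RQ completion arrives before the RTU
    print(qp.wire)                       # nothing sent yet: QP still in RTR
    qp.modify_to_rts()                   # CM finishes its processing
    print(qp.wire)
```

The ULP never has to coordinate with the CM here: whichever side finishes first, the reply leaves the port only once the QP is in RTS.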
Todd Rimmer

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general