devel-boun...@open-mpi.org wrote:
>> There are two new issues so far:
>>
>> 1) this has uncovered a connection migration issue in the Chelsio
>> driver/firmware. We are developing and testing a fix for this now.
>> Should be ready tomorrow hopefully.
>>
>
> I have a fix for the above issue and I can continue with OMPI testing.
>
> To work around the client-must-send issue, I put a nice fat
> sleep in the udapl btl right after it calls dat_cr_accept(),
> in mca_btl_udapl_accept_connect(). This, however, exposes
> another issue with the udapl btl:
>
> Neither the client nor the server side of the udapl btl
> connection setup pre-post RECV buffers before connecting.
> This can allow a SEND to arrive before a RECV buffer is
> available. I _think_ IB will handle this issue by
> retransmitting the SEND. Chelsio's iWARP device, however,
> TERMINATEs the connection. My sleep() makes this condition
> happen every time.
>
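
The ordering problem described above can be shown with a toy model. This is only a sketch, not the udapl btl code: every name in it (toy_ep, toy_post_recv, toy_establish, toy_send_arrives) is made up for illustration, and it assumes the harshest penalty, where a SEND that finds no RECV buffer tears the connection down, as the iWARP device is reported to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in uDAPL or Open MPI. */
enum toy_ep_state { TOY_EP_UNCONNECTED, TOY_EP_CONNECTED, TOY_EP_TERMINATED };

struct toy_ep {
    enum toy_ep_state state;
    int recvs_posted;  /* RECV buffers currently available */
    int delivered;     /* SENDs that landed in a RECV buffer */
};

void toy_ep_init(struct toy_ep *ep)
{
    ep->state = TOY_EP_UNCONNECTED;
    ep->recvs_posted = 0;
    ep->delivered = 0;
}

/* Assumption: a RECV may be posted before the connection is established. */
void toy_post_recv(struct toy_ep *ep)
{
    if (ep->state != TOY_EP_TERMINATED)
        ep->recvs_posted++;
}

/* Stands in for the connect/accept step completing. */
void toy_establish(struct toy_ep *ep)
{
    if (ep->state == TOY_EP_UNCONNECTED)
        ep->state = TOY_EP_CONNECTED;
}

/* The peer's SEND arrives. With no RECV posted, the model terminates. */
bool toy_send_arrives(struct toy_ep *ep)
{
    assert(ep->recvs_posted >= 0);
    if (ep->state != TOY_EP_CONNECTED)
        return false;
    if (ep->recvs_posted == 0) {
        ep->state = TOY_EP_TERMINATED;
        return false;
    }
    ep->recvs_posted--;
    ep->delivered++;
    return true;
}

/* Racy order: establish first, SEND arrives, RECV is posted too late. */
enum toy_ep_state toy_run_post_after(void)
{
    struct toy_ep ep;
    toy_ep_init(&ep);
    toy_establish(&ep);
    toy_send_arrives(&ep);
    toy_post_recv(&ep);
    return ep.state;
}

/* Safe order: RECV is posted before the connection can carry a SEND. */
enum toy_ep_state toy_run_post_before(void)
{
    struct toy_ep ep;
    toy_ep_init(&ep);
    toy_post_recv(&ep);
    toy_establish(&ep);
    toy_send_arrives(&ep);
    return ep.state;
}
```

In this model toy_run_post_after() ends in TOY_EP_TERMINATED and toy_run_post_before() stays in TOY_EP_CONNECTED. A sleep() after the accept only widens the window in the first case; it does not create it.
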

A compliant DAPL program also ensures that there are adequate receive
buffers in place before the remote peer Sends. It is explicitly noted
that failure to follow this rule will invoke a transport/device
dependent penalty. It may be that the sendq will be fenced, or it may
be that the connection will be terminated.

So any RDMA BTL should pre-post recv buffers before initiating or
accepting a connection.

> From what I can tell, the udapl btl exchanges memory info as a first
> order of business after connection establishment
> (mca_btl_udapl_sendrecv()). The RECV buffer post for this
> exchange, however, should really be done _before_ the
> dat_ep_connect() on the active side, and _before_ the
> dat_cr_accept() on the server side.
> Currently it's done after the ESTABLISHED event is dequeued,
> thus allowing the race condition.
>
> I believe the rules are the ULP must ensure that a RECV is
> posted before the client can post a SEND for that buffer.
> And further, the ULP must enforce flow control somehow so
> that a SEND never arrives without a RECV buffer being available.
>
> Perhaps this is just a bug and I opened it up with my sleep()
>
> Or is the uDAPL btl assuming the transport will deal with
> lack of RECV buffer at the time a SEND arrives?
>

No. uDAPL *allows* a provider to compensate for this through
unspecified means, but the application MUST NOT rely on it.

On the flip side, the application MUST NOT rely on any mistake
generating a fault either. That's akin to relying on a state trooper
pulling you over when you exceed the speed limit. It is always possible
that your application has too many buffers in flight but this is never
detected, because the new buffers are posted before the messages
actually arrive. You're not supposed to do that, but you have a good
chance of getting away with it.

As a general rule DAPL *never* requires a provider to check anything
that the provider does not need to check on its own (other than memory
access rights).
So typically the provider will complain about too many buffers when it actually runs out of buffers, not when the application's end-to-end credits are theoretically negative. A "fast path" interface becomes a lot less so if every work request is validated dynamically against every relevant restriction.
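
Since the provider will not do this checking for you, the end-to-end credit accounting has to live in the ULP. A minimal sketch of the sender side follows, assuming a simple credit scheme in which the peer reports how many RECV buffers it has reposted. The names (credit_sender, credit_try_send, credit_grant) are hypothetical and are not part of uDAPL or the udapl btl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sender-side credit tracker; not a uDAPL or Open MPI API. */
struct credit_sender {
    int credits;   /* RECV buffers the peer is known to have posted */
    int deferred;  /* sends held back locally for lack of a credit */
};

void credit_init(struct credit_sender *s, int initial_credits)
{
    s->credits = initial_credits;
    s->deferred = 0;
}

/*
 * Ask to post one SEND. Returns true and consumes a credit if the peer
 * has a RECV buffer waiting; otherwise the send is counted as deferred
 * and must not be posted yet.
 */
bool credit_try_send(struct credit_sender *s)
{
    assert(s->credits >= 0);
    if (s->credits == 0) {
        s->deferred++;
        return false;
    }
    s->credits--;
    return true;
}

/*
 * The peer reports that it reposted n RECV buffers. Deferred sends are
 * released first; the return value is how many may now be posted.
 */
int credit_grant(struct credit_sender *s, int n)
{
    int released = 0;

    if (n > 0)
        s->credits += n;
    while (s->deferred > 0 && s->credits > 0) {
        s->deferred--;
        s->credits--;
        released++;
    }
    return released;
}
```

The initial credit count would be the number of RECV buffers each side pre-posts before the connect or accept, so the first SEND after ESTABLISHED is already covered. The credits never go negative here, which is exactly the property the provider will not verify on the application's behalf.
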