On Sun, 2007-05-13 at 21:26 -0400, Donald Kerr wrote:
>
> Caitlin Bestler wrote:
>
> >Donald Kerr wrote:
> >
> >>>>order of business after connection establishment
> >>>>(mba_btl_udapl_sendrecv()). The RECV buffer post for this exchange,
> >>>>however, should really be done _before_ the
> >>>>dat_ep_connect() on the active side, and _before_ the
> >>>>dat_cr_accept() on the server side.
> >>>>Currently it's done after the ESTABLISHED event is dequeued, thus
> >>>>allowing the race condition.
> >>>>
> >>>>I believe the rules are the ULP must ensure that a RECV is posted
> >>>>before the client can post a SEND for that buffer.
> >>>>And further, the ULP must enforce flow control somehow so that a
> >>>>SEND never arrives without a RECV buffer being available.
> >>>>
> >>maybe this is a rule iwarp imposes on its ULPs but not uDAPL.
> >>
> >
> >It is most assuredly a rule for uDAPL. And it is not a matter
> >of iWARP "imposing" on uDAPL. uDAPL was explicitly designed
> >to support IB, iWARP and VI. To do that DAPL documents its
> >model of what RDMA is.
> >
> (sorry I was off the grid for a couple of days)
> Not to beat a dead horse but you would have to show me where in the Spec
> it says I must post a recv before a send. And thinking about it some I
> don't believe there is a race condition because this is not called out
> as such. Now if posting the handshake recv before the connect call
> speeds things up and helps the iwarp scenario I am all for it.
>
> >This issue is in fact one that is truly fundamental to the
> >efficiency of RDMA -- the transport layer DOES NOT provide
> >buffering. That's the application's job. It is precisely
> >because the application layer does a better job that RDMA
> >can achieve better performance at high bandwidth.
> >
> >For reasons that have been discussed in more depth in the
> >RDMA applicability statement and in RDDP/IPS discussions
> >on iSER, the absence of transport layer buffer throttling
> >places the onus for end-to-end pacing on the application.
> >It is a situation somewhat akin to a car with a broken
> >speedometer that had previously only driven during rush
> >hour bumper-to-bumper traffic. The fact that the speedometer
> >was broken was irrelevant. But if that same car hits the
> >open road the driver will need to come up with some method
> >of regulating their speed.
> >
> >The DAPL semantics are very clear that send/recv operations must
> >be matched one to one, that the receive buffer must be large
> >enough for the received message and that there must be a receive
> >buffer for each incoming send/recv message. That means that
> >the sender needs to have some basis for believing that the
> >RECV has been posted. Usually this is an explicit credit
> >that is decremented per message and incremented per response.
> >
> Matching one to one sure, still does not say a recv must be posted
> before a send. Flow control is handled by the BTL.
>
> >What DAPL does not state is if the transport does explicit flow
> >control so that the sending application's work request is simply
> >not processed (and the sending application continues to provide
> >the buffer, as with InfiniBand) or whether the sender simply
> >transmits and leaves error detection to the receiver (iWARP).
> >There are theoretical advantages to both, but more importantly
> >neither of them is going to change. So the Consumer of RDMA
> >applications needs to use ULP/application layer flow control
> >to pace the transmitter. At the application layer that means
> >that the RECV must be posted *before* the Send/accept that
> >grants ULP credits to the far side.
> >
> >All of that should be clear in the IOV ownership rules and
> >discussion of the semantics of send/recv. If you thought you
> >saw something that implied any guarantees to the contrary
> >then could you point them out in a posting to the DAT reflector?
> >(or just send them to me or Arkady Kanevsky).
> >
> I believe it was either you or Steve who claimed a recv must be posted
> before a send thus leading to a race condition. I fail to see this. But
> again, if Steve's patch makes things better I am all for it.
>
For iWARP, the connection may be TERMINATED if a SEND arrives on a QP and
no corresponding RECV buffer is posted.

Steve.
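
To make the credit scheme from the quoted text concrete (post the RECV first, then grant a credit; the sender spends one credit per SEND), here is a minimal sketch in plain C. It is a toy model and not uDAPL or Open MPI code: the struct and every toy_* function are hypothetical names invented for this example, and both ends of the connection are folded into one struct for brevity. The "terminate when no RECV is posted" rule models the iWARP behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one direction of a connection. All names are hypothetical;
 * none of this is uDAPL API. */
struct toy_conn {
    int  recvs_posted;  /* RECV buffers currently posted by the receiver */
    int  credits;       /* SENDs the sender believes it may still issue  */
    bool terminated;    /* set when a SEND arrives with no RECV posted   */
};

void toy_init(struct toy_conn *c)
{
    c->recvs_posted = 0;
    c->credits = 0;
    c->terminated = false;
}

/* Receiver side: post n RECV buffers first, and only then grant n credits. */
void toy_post_recvs_and_grant(struct toy_conn *c, int n)
{
    assert(n >= 0);
    c->recvs_posted += n;
    c->credits += n;
}

/* Transport: a SEND arrives. With no RECV posted, the connection is
 * terminated (iWARP-like behaviour in this model). */
bool toy_deliver(struct toy_conn *c)
{
    if (c->terminated)
        return false;
    if (c->recvs_posted == 0) {
        c->terminated = true;
        return false;
    }
    c->recvs_posted--;  /* the message consumes one posted buffer */
    return true;
}

/* Sender that honours credits: with no credit it holds the message
 * instead of putting it on the wire. */
bool toy_send_with_credit(struct toy_conn *c)
{
    if (c->terminated || c->credits == 0)
        return false;
    c->credits--;
    return toy_deliver(c);
}

/* Sender that ignores flow control and always transmits. */
bool toy_send_unchecked(struct toy_conn *c)
{
    return toy_deliver(c);
}
```

In this model a sender that calls toy_send_with_credit() before any RECV is posted just holds the message, while toy_send_unchecked() in the same state marks the connection terminated, which is the race under discussion.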