I am not clear what you are proposing. A transport-specific API? The current proposal provides, on the sending side, a single post and a single completion in the error-free case. This is a commonality that simplifies the ULP.
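As an illustration of the "single post, single completion" behavior on the sending side, here is a minimal Python model. It is only a sketch under stated assumptions: `WireOp`, `post_write_with_immediate`, and `sender_completions` are hypothetical names invented for this example and are not part of the DAT API, and the 4-byte immediate size is taken from the "extra 4 bytes in recv buffer" remark in the thread quoted below.

```python
from dataclasses import dataclass
from typing import List

IMMEDIATE_SIZE = 4  # bytes; assumption based on "extra 4 bytes in recv buffer"


@dataclass(frozen=True)
class WireOp:
    """One operation placed on the send queue (hypothetical model type)."""
    kind: str       # "rdma_write_with_immediate", "rdma_write", or "send"
    data: bytes     # payload carried by this operation
    signaled: bool  # True if the operation generates a sender-side completion


def post_write_with_immediate(data: bytes, immediate: int, native: bool) -> List[WireOp]:
    """Model one Consumer post; return what the Provider would queue."""
    imm = immediate.to_bytes(IMMEDIATE_SIZE, "big")
    if native:
        # Transport with true immediate data: one operation carries both.
        return [WireOp("rdma_write_with_immediate", data + imm, True)]
    # Emulation: RDMA Write followed by Send; only the Send is signaled,
    # so the Consumer still sees a single completion.
    return [
        WireOp("rdma_write", data, False),
        WireOp("send", imm, True),
    ]


def sender_completions(ops: List[WireOp]) -> int:
    """Count the completions the sending Consumer would observe."""
    return sum(1 for op in ops if op.signaled)
```

In this model both paths yield exactly one sender-side completion per post, although the emulated path occupies two send-queue slots.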
Arkady

Arkady Kanevsky                       email: [EMAIL PROTECTED]
Network Appliance Inc.                phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.          Fax: 781-895-1195
Waltham, MA 02451                     central phone: 781-768-5300

> -----Original Message-----
> From: Larsen, Roy K [mailto:[EMAIL PROTECTED]
> Sent: Monday, February 06, 2006 6:50 PM
> To: Kanevsky, Arkady; Caitlin Bestler; [EMAIL PROTECTED]; Sean Hefty
> Cc: openib-general@openib.org
> Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal
>
> >From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> >Sent: Monday, February 06, 2006 2:27 PM
> >
> >Roy,
> >comments inline.
>
> Mine too....
>
> >> >From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> >> >Roy,
> >> >Can you explain, please?
> >> >
> >> >For IB the operation will be layered properly on Transport primitive.
> >> >And on Recv side it will indicate in completion event DTO that it
> >> >matches RDMA Write with Immediate and that Immediate Data is in event.
> >> >
> >> >For iWARP I expect initially, it will be layered on RDMA Write followed
> >> >by Send. The Provider can do post more efficiently than Consumer and
> >> >guarantee atomicity.
> >> >On Recv side Consumer will get Recv DTO completion in event and
> >> >Immediate Data inline as specified by Provider Attribute.
> >> >
> >> >From the performance point of view Consumers who program to IB only
> >> >will have no performance degradation at all. But this API also allows
> >> >Consumers to write ULP to be transport independent with minimal
> >> >penalty: one binary comparison and extra 4 bytes in recv buffer.
> >>
> >> If the application could be written transport independently, I would
> >> have no objection at all.
> >> Instead, it must be written in a transport-adaptive way and to be
> >> able to adapt to all possible implementations, the application could
> >> not send arbitrary "immediate"-sized data as messages because there
> >> is no way to distinguish between them on the receiving side. That is
> >> HUGE! It is my experience that send/receive is generally used for
> >> small messages and to take away particular message sizes or to depend
> >> on the so the application can "adapt" to whatever the immediate size
> >> is for a particular transport, if even needed, is a very weak
> >> facility to offer.
> >
> >But the remote side does post Recv. Since it anticipates that this Recv
> >will be matched against the RDMA Write with immediate it posts the recv
> >buffer which fits. Yes, there is an issue for Transport-independent ULP
> >that it does need a buffer.
> >For IB it is possible to post 0-size buffer. But if this is the case
> >Recv end Consumer DOES know that it will be matched against RDMA Write
> >so ULP DOES know what it will be matched against.
> >So in the worst case Consumer does have to pay the price of creating
> >LMR to handle 4 byte buffer to match RDMA Write Immediate data.
>
> I think you missed my larger point. The point was that the
> application must be written in such a way that it could
> infer when immediate data arrived for a variety of
> immediate data sizes and that places a constraint on the
> application wrt data it may want to send/receive normally.
> Whereas, if the application embraced the fact that it was
> responsible for sending a message to indicate a write
> completion, it is free to send whatever amount of data best
> meets its needs.
>
> Transports that support true immediate data do not require
> the ULP to perform buffer matching. They can post a series
> of receive buffers that may or may not indicate immediate
> data.
> The ULP does not have to know ahead of time when
> immediate data will arrive **against other data receives**.
> The fact that an IB oriented application never needs to back
> a receive request with a buffer if they were only used to
> indicate immediate data is orthogonal.
>
> >>
> >> It also affects interface resource allocation. Send queue sizes will
> >> have to adapt to possibly twice their size.
> >>
> >
> >That is correct. We argued about it at the meeting.
> >One alternative is to have EP and EVD attr. But this will not be
> >efficient since it will double the queue size where a smaller increment
> >is possible due to the depth of the RDMA Write pipeline outstanding.
> >
> >> It just dawned on me that the immediate data must be in registered
> >> memory to be sent in a message. This means the API must be amended
> >> to pass an LMR or, even worse, the provider would have to register
> >> memory in the speed path or create and manipulate its own queue of
> >> "immediate" data buffers/LMRs. Of course, LMRs are not needed and an
> >> overhead for transports that provide true immediate data.
> >
> >No registration on the speed path. It is Consumer responsibility to
> >provide Recv Buffer of the right size.
> >Yes for IB only ULP this can be avoided.
> >But ULP can be written to the proposed API to take full advantage of IB
> >performance but that code will not be transport independent.
>
> I was referring to the sending side. Source data of a
> message send must be from registered memory. For transports
> that will emulate this service with a write/send sequence,
> user specified immediate data will need to be copied to a
> provider managed pool of "immediate" data buffers/LMRs or the
> interface changed to specify an LMR.
>
> >But this API allows to write transport independent code albeit with
> >certain price attached.
> >
> >>
> >> Oh, and another thing. InfiniBand indicates the size of the RDMA
> >> write in the receive completion.
> >> That is something that will have to be addressed in a
> >> "transport independent" way or dropped as part of the service.
> >
> >Good point. I will augment Spec accordingly.
> >
> >>
> >> The bottom line here is that it is NOT transport independent.
> >
> >implementation is not transport independent.
> >But API allows to write Transport-specific ULP with full performance as
> >well as Transport-independent ULP with better performance than without
> >proposed API and with "minimal" performance penalty for Transports that
> >provide it.
>
> Of course, you can make the application as transport service
> adaptive as you want but that is a weak argument and a
> slippery slope. My point is that the operational semantics of
> non-native immediate data transports are identical to
> write/send in all respects. So, embrace this and just give
> the ULP a simple interface that has broader applicability for
> all transports. Provide a thread atomic combined request
> capability which can be used for write completion
> notification (if not natively supported) or any other
> purpose an application may fancy.
>
> >>
> >> Now, the atomicity argument between write and send has some
> >> credibility.
> >> If an application chooses to "adapt" to an explicit write/send
> >> semantic for write completion notification in environments that can't
> >> provide it natively, this could be addressed by a generalized
> >> combined request API that can guarantee thread-based atomicity to the
> >> send queue. This seems much more straightforward to me since, in
> >> essence, to adapt to non-native immediate data services, they would
> >> have to allocate resources and behave in virtually the same way as if
> >> they did write/send explicitly.
> >>
> >> It is obvious that the proposed service is not one of immediate data
> >> in the sense defined by InfiniBand.
> >> Since true immediate data is a
> >> transport specific speed path service, it needs to be implemented as
> >> a transport specific extension. To allow an application to initiate
> >> multiple request sequences that must be queued sequentially to
> >> explicitly create a write completion notification or any other
> >> order-based sequence, a generalized combined request API should be
> >> defined.
> >
> >No disagreement here. We were debating a generic way to combine multiple
> >DTOs into a single call for some time.
> >But how to define a generic way to do it and to have a single completion
> >on both ends of the connection in successful case was always a problem.
>
> I would think an array of pointers and a count to standard
> work requests would do it. And of course, each work request
> can control whether it solicits a completion so a write/send
> sequence can generate a single completion event on both ends.
> Use the EVD lock to guard against other threads injecting
> requests on the queue during a combined request operation and
> the ULP has everything it needs.
>
> Roy
>
> >> >
> >> >Arkady Kanevsky                       email: [EMAIL PROTECTED]
> >> >Network Appliance Inc.                phone: 781-768-5395
> >> >1601 Trapelo Rd. - Suite 16.          Fax: 781-895-1195
> >> >Waltham, MA 02451                     central phone: 781-768-5300

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
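
Roy's suggestion in the thread (a list of standard work requests plus a count, a per-request completion flag, and a lock held for the whole combined post) can be modeled in a few lines of Python. This is a hedged sketch only: `WorkRequest`, `SendQueue`, and `post_combined` are hypothetical names made up for this illustration, not DAT or InfiniBand verbs calls, and a plain `threading.Lock` stands in for the EVD lock.

```python
import threading
from dataclasses import dataclass
from typing import List


@dataclass
class WorkRequest:
    """A standard work request (hypothetical model type)."""
    op: str                 # e.g. "rdma_write" or "send"
    payload: bytes
    signaled: bool = False  # True if this request solicits a completion


class SendQueue:
    """Toy send queue whose lock stands in for the EVD lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.posted: List[WorkRequest] = []

    def post_combined(self, requests: List[WorkRequest]) -> None:
        # Hold the lock for the whole sequence so that no other thread
        # can inject a request between the members of this group.
        with self._lock:
            for wr in requests:
                self.posted.append(wr)

    def post(self, request: WorkRequest) -> None:
        # A single post is just a combined post of length one.
        self.post_combined([request])

    def completions(self) -> List[WorkRequest]:
        # Only signaled requests produce completion events.
        return [wr for wr in self.posted if wr.signaled]
```

A write completion notification is then `post_combined([WorkRequest("rdma_write", data), WorkRequest("send", note, signaled=True)])`: the pair is queued back to back and produces a single completion on the sending side.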