RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
We need to do a better job coordinating between the 2 reflectors. One issue is that someone must subscribe to the dat-discussions list to post to it.

- Sean
___
openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
On Wed, 2006-02-08 at 23:20 -0800, Sean Hefty wrote:
>> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>
> A post multiple call as a general API makes sense, but I think that's a separate issue.
>
> Given that IB provides true immediate data with RDMA writes, a way should be available to make use of it. I don't know what the performance numbers are for a write with immediate versus a write followed by a send, but I don't think that anyone could argue that the write with immediate wouldn't perform better.
>
> To me, the question is whether write with immediate is supported as a transport specific extension, which was Arlin's original patch, or through some standard API. The attempt to make the API standard, so that iWarp could emulate it (poorly in my view), is what appears to be driving the disagreements.
>
> It also appears to me that the decisions are coming down to one of the following. If iWarp can emulate write with immediate, then a generic API should be used.

This opens Pandora's box. Should iWARP also emulate ATOMICs? Which should be emulated and which should not? What are the criteria for deciding?

> If iWarp cannot properly emulate write with immediate, then the API should be transport specific.

It should be transport specific because it is a transport specific feature. Although, in this case, it could be implemented in iWARP, in my view it _should_ not.

> It's curious to me that in both cases, iWarp is driving the API decision and design for something that is an IB specific feature.

Huh?

> - Sean
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
> But why define an IB specific feature when a transport neutral feature can be defined? Viewing the operation as Write with following Send maintains transport neutral semantics AND allows IB to encode it as a Write with Immediate. That allows IB to use the silicon that already exists to support compressing the Write and Send into a single message. That is the real benefit, isn't it?

No, it's not.

> And for both transports it enables the Provider to pass the 4 byte immediate data by value rather than by registered reference. So there is a definite benefit to IB, and a potential benefit to IP, and it works for both transports. The *only* thing gained by making it a transport specific method is the implicit 33rd bit in the "that RDMA Write payload you asked for has arrived" message.

Ok, finally. A realization that the semantics of write/send are not the same as IB write with immediate data. And the difference is important. The proposed emulation could not pass a black box test, since nothing distinguishes an immediate receive message from a standard one containing rkeys or any other random data an application may need to exchange through send/receive. A true write with immediate data can pass such a black box test because it offers a unique service, whereas the proposed emulation does not. It is a helper function that uses existing services. I have no objection to a write/send helper function; just call it that and not write with immediate data. Leave the true immediate data service as an extension, as first proposed.

> Is there a concrete example of any benefit from encoding a 33rd bit in the selection of Write with Immediate versus Write followed by 32-bit Send?

Yes, as stated several times: applications that use the send/receive facility to exchange information such as rkeys, as well as using write immediate services, must be able to unambiguously tell the difference between receive indications. Putting a requirement on the application to make that distinction by its own devices provides no additional service that it doesn't already have in existing APIs.

Roy
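The black-box argument above can be modelled in a few lines. This is only a sketch under stated assumptions: `RecvCompletion`, `native_write_with_immediate`, `plain_send` and `emulated_write_with_immediate` are hypothetical names made up for illustration, not DAT or OpenIB verbs calls, and the model ignores everything except what the receiving application can observe on a receive completion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecvCompletion:
    """What the receiver observes when one posted receive is consumed (hypothetical model)."""
    payload: bytes        # bytes placed in the posted receive buffer
    has_immediate: bool   # True only for a native write-with-immediate
    immediate: int = 0    # 32-bit immediate value, meaningful if has_immediate


def native_write_with_immediate(imm: int) -> RecvCompletion:
    # Native service: a receive is consumed, nothing lands in the receive
    # buffer, and the completion itself is marked as carrying immediate data.
    return RecvCompletion(payload=b"", has_immediate=True, immediate=imm)


def plain_send(data: bytes) -> RecvCompletion:
    # Ordinary send: the data lands in the receive buffer, no immediate flag.
    return RecvCompletion(payload=data, has_immediate=False)


def emulated_write_with_immediate(imm: int) -> RecvCompletion:
    # Emulation: RDMA write followed by a 4-byte send carrying the value.
    return plain_send(imm.to_bytes(4, "big"))


# An application that also exchanges 4-byte values (for example rkeys) by send:
rkey_message = plain_send((0x1234ABCD).to_bytes(4, "big"))
emulated = emulated_write_with_immediate(0x1234ABCD)
native = native_write_with_immediate(0x1234ABCD)

print(emulated == rkey_message)  # True: the receiver cannot tell them apart
print(native == rkey_message)    # False: the completion type differs
```

In this model the emulated notification and an ordinary 4-byte send produce identical completions, which is the "33rd bit" that the native service carries implicitly.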
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roland Dreier wrote:
> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>
> - R.

With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and an RDMA write completion indication on the other end. This is the uniqueness of the service that cannot be provided by post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantic rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.

-arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
> Roland Dreier wrote:
>> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>
> With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and an RDMA write completion indication on the other end. This is the uniqueness of the service that cannot be provided by post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantic rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.

A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it not to just be a utility function layered above DAT.

What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it. Why can't the application just add a bool to the send message? Or encode the 32 bits so that they come from disjoint domains?

There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of RDMA write with immediate whenever the application could distinguish the two. I cannot see why doing this is not almost free for virtually all applications, and trivial for the remainder.

Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.

We already made a similar decision in having a 128-bit IA Address. That means we cannot support a host that interfaces to the Internet with IPv6 and an InfiniBand network that not only had global GIDs, but also allocated a global subnetwork a network ID that was already in use as a valid public IPv6 network. The complexity of dealing with an IA Address that was 128+1 bits was simply not justified to deal with an extreme corner case that could very easily be avoided (there is no shortage of site local network IDs in the IPv6/GID format, so using a global network prefix that was disjoint from the official IPv6 hierarchy would be just plain silly).

So far I haven't seen any explanation as to why an application has a need to encode this 33rd bit of their message in this terribly transport specific manner. Is there some severe performance penalty to slightly restructuring the send message so that it is no longer ambiguous with the immediate data?
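The suggestion above, drawing the two kinds of 32-bit value from disjoint domains, might look like the following at the application level. This is a minimal sketch with assumed names: `NOTIFY_FLAG` and the three functions are hypothetical, and reserving the top bit is just one possible convention that the thread does not specify.

```python
import struct

# Hypothetical convention: the top bit of the 32-bit word marks a
# "your RDMA write has arrived" notification; ordinary messages leave it clear.
NOTIFY_FLAG = 0x8000_0000
VALUE_MASK = NOTIFY_FLAG - 1  # low 31 bits


def encode_notification(tag: int) -> bytes:
    """Encode a write-completion tag so it cannot collide with ordinary values."""
    if not 0 <= tag <= VALUE_MASK:
        raise ValueError("tag must fit in 31 bits")
    return struct.pack("!I", tag | NOTIFY_FLAG)


def encode_message(value: int) -> bytes:
    """Encode an ordinary 4-byte application value (top bit kept clear)."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value must fit in 31 bits")
    return struct.pack("!I", value)


def decode(buf: bytes):
    """Return (is_notification, value) for a received 4-byte message."""
    (word,) = struct.unpack("!I", buf)
    return bool(word & NOTIFY_FLAG), word & VALUE_MASK


print(decode(encode_notification(7)))  # (True, 7)
print(decode(encode_message(7)))       # (False, 7)
```

The cost of this particular convention is visible in the range checks: both kinds of value shrink to 31 bits, because the distinguishing bit now travels inside the payload rather than in the completion type.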
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
At 03:36 PM 2/8/2006, Arlin Davis wrote:
> Roland Dreier wrote:
>> Michael> So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture.
>>
>> But this discussion is being driven by an application developer who does see value in immediate data. Arlin, can you quantify the benefit you see from RDMA write with immediate vs. RDMA write followed by a send?
>
> We need speed and simplicity.
>
> A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote RDMA write completion notifications. The application would not have to register and manage additional message buffers on either side; we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option.
>
> So, RDMA with immediate data is the most optimal and simple method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.

RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-size quantity and not one subject to change, i.e. increasing its size.

Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API.

Those applications that simply want to enable completion notification when an RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon an RDMA Write - Send set of operations. This will enable application portability across all interconnect types.

Mike
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
>>> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>>
>> With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and an RDMA write completion indication on the other end. This is the uniqueness of the service that cannot be provided by post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantic rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.
>
> A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it not to just be a utility function layered above DAT.

That is a very good point. And since the emulated immediate data service can't make the atomic guarantee, it is the killer argument for just making the service plain: a potentially more efficient write/send.

> What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it. Why can't the application just add a bool to the send message? Or encode the 32 bits so that they come from disjoint domains?

Some applications can do as you suggest. Some applications can make good use of unambiguous indications where the buffer size, content, or arrival timing is not constrained. Some don't need write notification at all. What's your point?

> There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of RDMA write with immediate whenever the application could distinguish the two.

Well, I think there is agreement that *some* applications can use write-and-send in a beneficial way. But then again, nothing prevents them from doing that now. They do not need an additional API. But again, I don't have an issue with defining a helper function. I do have an issue with defining an API and semantic that says the target side needs to be coded in a way to always deal with both true immediate data and emulation. Just define a write/send helper API, and the ULP can be coded in a consistent manner if that is a beneficial service. If a true unambiguous indication service is more beneficial or required, it can use the extension and accept the extra complexity. To demand extra complexity in applications that obviously don't need the true immediate data semantic is just wrong in my opinion.

> I cannot see why doing this is not almost free for virtually all applications, and trivial for the remainder.
>
> Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.

All we're asking is that a write/send combined API not be called immediate data unless it fits the semantics of immediate data. I am puzzled at the resistance this is getting. There is a standards body specification for immediate data. If it is not followed, don't call it immediate data. It's that simple. For those transports that can provide the service, the ULP may be able to gain access to it through an extension.

Roy
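The write/send helper discussed in this message, layered above the posting calls and not atomic, could be as small as the sketch below. Everything here is assumed for illustration: `post_write_then_send` and `FakeEndpoint` are hypothetical names, the posting functions are modelled as callables that return True when a request is accepted, and no real DAT or verbs call is used.

```python
from typing import Callable, Tuple


def post_write_then_send(
    post_rdma_write: Callable[[bytes], bool],
    post_send: Callable[[bytes], bool],
    data: bytes,
    notice: bytes,
) -> Tuple[bool, bool]:
    """Post an RDMA write, then a send telling the peer the write was issued.

    Not atomic: if the write is accepted and the send is rejected, the write
    stays posted and the caller must recover (for example by retrying the send).
    Returns (write_accepted, send_accepted).
    """
    if not post_rdma_write(data):
        return (False, False)
    return (True, post_send(notice))


class FakeEndpoint:
    """Stand-in endpoint that accepts at most `depth` outstanding requests."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.posted = []

    def _post(self, kind: str, buf: bytes) -> bool:
        if len(self.posted) >= self.depth:
            return False  # queue full, request rejected
        self.posted.append((kind, buf))
        return True

    def post_rdma_write(self, buf: bytes) -> bool:
        return self._post("write", buf)

    def post_send(self, buf: bytes) -> bool:
        return self._post("send", buf)


ep = FakeEndpoint(depth=1)
result = post_write_then_send(ep.post_rdma_write, ep.post_send, b"payload", b"done")
print(result)     # (True, False)
print(ep.posted)  # [('write', b'payload')]
```

With a queue depth of one the write is accepted and the send is not, which is the partial outcome an atomic post-multiple would have to rule out and a plain utility function simply reports.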
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Why both Immediate Data and the STag which was used for the RDMA Write? Immediate data already contains info on which operation the RDMA Write has completed in response to. The STag would make sense if STag invalidation were also put in the mix. But for MPI, RMR_contexts have a long lifecycle, so it is not clear which apps would be interested in combining Invalidation with RDMA Write with Immediate Data.

Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451        central phone: 781-768-5300

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 07, 2006 3:03 PM
To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

[EMAIL PROTECTED] wrote:
> Caitlin Bestler wrote:
>> Arlin Davis wrote:
>>> Sean Hefty wrote:
>>>> The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
>>>
>>> I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
>>>
>>> -arlin
>>
>> What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).
>
> No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want.
>
> Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.

The transports aren't getting anything. Features are there for applications, especially when the feature can be defined in a way that makes sense without explaining transport mechanics.

Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.

Providing information about an RDMA Write to a receive operation also meets that definition, as long as it conforms to the existing ordering rules.

Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable. We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.

With that definition the consumer would get a receive completion that told them that their peer's RDMA Write had been successfully placed, how long it is (the length) and which one (a tag). I think that is of value. iWARP can implement it as two work requests and maintain the overall semantics.

Are you arguing that iWARP should NOT provide this service until it can do it in a single work request? It seems to me that allowing an extra work request and completion is a fairly simple accommodation, as opposed to using an alternate algorithm in the main transaction processing of the application.

If we enable the application to query, outside of the transaction loop, how a remote write with immediate will complete, then we can allow the application to have *no* overhead inside the main transaction loop, and *identical* logic on the sending side.

And IB *could* implement send with invalidate by simply agreeing on how the RKey to be invalidated is communicated between the IB providers (perhaps as an immediate).

But more to the point, I don't see how the more flexible definition of write with immediate negatively impacts the IB implementation of the feature. IB providers do not need to allow for the extra work requests. They are not being asked to place the immediate data into the receive buffer, or to do any extra work at all.
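The 8 byte message mentioned above, carrying the write length and the 32-bit tag, could be laid out as in the following sketch. The field order, the network byte order and the names `NOTIFIER`, `pack_notifier` and `unpack_notifier` are all assumptions made for illustration; the thread only says that the message holds a length and a tag.

```python
import struct

# Assumed layout: 32-bit write length followed by 32-bit tag, network byte order.
NOTIFIER = struct.Struct("!II")


def pack_notifier(write_length: int, tag: int) -> bytes:
    """Build the 8-byte message sent after the RDMA write (hypothetical format)."""
    return NOTIFIER.pack(write_length, tag)


def unpack_notifier(buf: bytes):
    """Recover (write_length, tag) from a received 8-byte message."""
    if len(buf) != NOTIFIER.size:
        raise ValueError("notifier message must be exactly 8 bytes")
    return NOTIFIER.unpack(buf)


msg = pack_notifier(4096, 0xCAFE)
print(len(msg))              # 8
print(unpack_notifier(msg))  # (4096, 51966)
```

A receiver that sees such a message learns which write completed and how long it was, but, as in the rest of the thread, only if it already knows that this particular receive holds a notifier and not some other 8-byte send.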
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Caitlin, can you clarify this? Are you proposing that the Consumer encode a bit of the Immediate Data to specify that it is immediate data? iWARP will pass it in a Send message and IB in Immediate Data.

Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451        central phone: 781-768-5300

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 2:40 PM
To: Arlin Davis; Roland Dreier
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

[EMAIL PROTECTED] wrote:
> Roland Dreier wrote:
>> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>
> With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and an RDMA write completion indication on the other end. This is the uniqueness of the service that cannot be provided by post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantic rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.

A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it not to just be a utility function layered above DAT.

What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it. Why can't the application just add a bool to the send message? Or encode the 32 bits so that they come from disjoint domains?

There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of RDMA write with immediate whenever the application could distinguish the two. I cannot see why doing this is not almost free for virtually all applications, and trivial for the remainder.

Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.

We already made a similar decision in having a 128-bit IA Address. That means we cannot support a host that interfaces to the Internet with IPv6 and an InfiniBand network that not only had global GIDs, but also allocated a global subnetwork a network ID that was already in use as a valid public IPv6 network. The complexity of dealing with an IA Address that was 128+1 bits was simply not justified to deal with an extreme corner case that could very easily be avoided (there is no shortage of site local network IDs in the IPv6/GID format, so using a global network prefix that was disjoint from the official IPv6 hierarchy would be just plain silly).

So far I haven't seen any explanation as to why an application has a need to encode this 33rd bit of their message in this terribly transport specific manner. Is there some severe performance penalty to slightly restructuring the send message so that it is no longer ambiguous with the immediate data?
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
>>>> Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e. posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
>>>
>>> With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and an RDMA write completion indication on the other end. This is the uniqueness of the service that cannot be provided by post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantic rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.
>>
>> A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it not to just be a utility function layered above DAT.
>
> That is a very good point. And since the emulated immediate data service can't make the atomic guarantee, it is the killer argument for just making the service plain: a potentially more efficient write/send.
>
>> What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it. Why can't the application just add a bool to the send message? Or encode the 32 bits so that they come from disjoint domains?
>
> Some applications can do as you suggest. Some applications can make good use of unambiguous indications where the buffer size, content, or arrival timing is not constrained. Some don't need write notification at all. What's your point?
>
>> There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of RDMA write with immediate whenever the application could distinguish the two.
>
> Well, I think there is agreement that *some* applications can use write-and-send in a beneficial way. But then again, nothing prevents them from doing that now. They do not need an additional API. But again, I don't have an issue with defining a helper function. I do have an issue with defining an API and semantic that says the target side needs to be coded in a way to always deal with both true immediate data and emulation. Just define a write/send helper API, and the ULP can be coded in a consistent manner if that is a beneficial service. If a true unambiguous indication service is more beneficial or required, it can use the extension and accept the extra complexity. To demand extra complexity in applications that obviously don't need the true immediate data semantic is just wrong in my opinion.
>
>> I cannot see why doing this is not almost free for virtually all applications, and trivial for the remainder.
>>
>> Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.
>
> All we're asking is that a write/send combined API not be called immediate data unless it fits the semantics of immediate data. I am puzzled at the resistance this is getting. There is a standards body specification for immediate data. If it is not followed, don't call it immediate data. It's that simple. For those transports that can provide the service, the ULP may be able to gain access to it through an extension.

I have no objection to calling this dat_ep_post_rdma_write_with_notifier and labelling the 32-bit data as a notifier tag.

Even on iWARP transports small send data can be in-lined, avoiding the need for buffers to be registered. A special API where the length of the send buffer is known in advance makes this even easier.

What I still fail to see is a rationale that works down from the application layer on why an application would need still one more page in their cookbook. Creating an entire new method to enable a strange method of signalling one bit of information to the other end doesn't seem like much of a payoff to me.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
> Caitlin, can you clarify this. Are you proposing that Consumer encode a bit of Immediate Data to specify that it is immediate data? iWARP will pass it in Send message and IB in Immediate Data.

If we agreed that there was some acute need for this 33rd bit coming down from the application layer, then creating an iWARP untagged message that encoded the first 32 bits, the length of the RDMA write and the magic bonus bit would indeed be a possible solution.

I am skeptical that there is a true application derived need for this bonus bit that justifies the complexity required to document it. If the application only needs this bonus bit when running over IB then it really doesn't need it at all.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Mike, but then the combined operation can as easily be handled by a "multiple post operation". What is the need for a specific transport-independent RDMA Write with immediate data?

I am still concerned about the need for the Consumer Recv side to separate the recv of Immediate Data from a "regular" Recv. The Consumer "knows" what it expects to match the posted Recv. There is a one-to-one mapping between the non-pure RDMA transfer ops of one side and the Recvs of the other. Sure, a ULP may use the same size buffers for all. But how many ULPs mix Immediate Data size messages (4 bytes on IB) with normal Sends of the exact same size?

Arkady

Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451        central phone: 781-768-5300

From: Michael Krause [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 3:25 PM
To: Arlin Davis
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

At 03:36 PM 2/8/2006, Arlin Davis wrote:
> Roland Dreier wrote:
>> Michael> So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture.
>>
>> But this discussion is being driven by an application developer who does see value in immediate data. Arlin, can you quantify the benefit you see from RDMA write with immediate vs. RDMA write followed by a send?
>
> We need speed and simplicity.
>
> A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote RDMA write completion notifications. The application would not have to register and manage additional message buffers on either side; we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option.
>
> So, RDMA with immediate data is the most optimal and simple method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.

RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-size quantity and not one subject to change, i.e. increasing its size.

Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API.

Those applications that simply want to enable completion notification when an RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon an RDMA Write - Send set of operations. This will enable application portability across all interconnect types.

Mike
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roy, and what if tomorrow iWARP decides to support Immediate Data with variable length? The API does not change. The semantics do not change, and IB will not be able to support it. I am trying to define semantics and an API which will not have to be modified for each rev of the transport.
Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300
-----Original Message-----
From: Larsen, Roy K [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 3:32 PM
To: [EMAIL PROTECTED]; Arlin Davis; Roland Dreier
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and a rdma write completion indication on the other end. This is the uniqueness of the service that cannot be provided by the post multiple. Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantics rules of the bundled operations so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.
A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it to not just be a utility function layered above DAT.
That is a very good point. And since the emulated immediate data service can't make the atomic guarantee it is the killer argument for just making the service plain - a potentially more efficient write/send.
What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it. Why can't the application just add a bool to the send message? Or encode the 32-bits so that they come from disjoint domains?
Some applications can do as you suggest. Some applications can make good use of unambiguous indications where the buffer size, content, or arrival timing is not constrained. Some don't need write notification at all. What's your point?
There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of rdma write with immediate whenever the application could distinguish the two.
Well, I think there is agreement that *some* applications can use write-and-send in a beneficial way. But then again, nothing prevents them from doing that now. They do not need an additional API. But again, I don't have an issue with defining a helper function. I do have an issue with defining an API and semantic that says the target side needs to be coded in a way to always deal with both true immediate data and emulation. Just define a write/send helper API and the ULP can be coded in a consistent manner if that is a beneficial service. If a true unambiguous indication service is more beneficial or required, it can use the extension and accept the extra complexity. To demand extra complexity in applications that obviously don't need the true immediate data semantic is just wrong in my opinion.
I cannot see why doing this is almost free for virtually all applications, and trivial for the remainder. Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.
All we're asking is that a write/send combined API not be called immediate data unless it fits the semantics of immediate data. I am puzzled at the resistance this is getting. There is a standards body specification for immediate data. If it is not followed, don't call it immediate data. It's that simple. For those transports that can provide the service, the ULP may be able to gain access to it through an extension.
Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
All we're asking is that a write/send combined API not be called immediate data unless it fits the semantics of immediate data. I am puzzled at the resistance this is getting. There is a standards body specification for immediate data. If it is not followed, don't call it immediate data. It's that simple. For those transports that can provide the service, the ULP may be able to gain access to it through an extension.
I have no objection to calling this dat_ep_post_rdma_write_with_notifier and labelling the 32-bit data as a notifier tag.
If this MUST be implemented by the provider as a (possibly optimized) write followed by a send, that sounds good to me. All transports can support it and provide the same semantic. No need for application schism. However, I wouldn't place a restriction on the size of the notifier tag. Somewhere along the line, the send data has to reside in a registered buffer. Might as well have the ULP supply it and let it define the contents and size.
Even on iWARP transports small send data can be in-lined, avoiding the need for buffers to be registered. A special API where the length of the send buffer is known in advance makes this even easier.
Ah, I wasn't aware iWARP could carry inline data. I take it that's not possible on an iWARP RDMA write PDU however.
What I still fail to see is a rationale that works down from the application layer on why an application would need still one more page in their cookbook. Creating an entire new method to enable a strange method of signalling one bit of information to the other end doesn't seem like much of a payoff to me.
Of course the semantics are much more than signaling one bit. Nevertheless, if the contention is that applications don't need that bit, that all they need are write/send semantics, then by all means, simply define an API that gives them that and this thread is closed. Provider writers for transports that can supply a true immediate data service would be free to waste their time supplying an unused service through an extension. But that business decision should be left to the provider writer, not this mailing list.
Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Larsen, Roy K wrote:
Even on iWARP transports small send data can be in-lined, avoiding the need for buffers to be registered. A special API where the length of the send buffer is known in advance makes this even easier.
Ah, I wasn't aware iWARP could carry inline data. I take it that's not possible on an iWARP RDMA write PDU however.
On the wire the data is always in-line; only an RDMA Read Request references data that is not part of the message. The iWARP protocols do not specify much about the local interface. That role has been taken by the RDMAC verbs and RNIC-PI so far. The standard functionality defined in the RDMAC verbs does not mandate support for Inline Send work requests. Neither do the IBTA verbs. The option shows up in APIs, and in firmware, because it is a valuable optimization that improves latency in the Device/host exchange independent of the wire protocol. In the vast majority of cases, the user verbs can implement inline sends very easily whenever the data is shorter than the SGL would have been. So in the sense that you can view the SQ itself as a registered buffer then it is true. But there is no need for a *separate* registered buffer.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roy, and what if tomorrow iWARP decides to support Immediate Data with variable length? The API does not change. The semantics do not change, and IB will not be able to support it. I am trying to define semantics and an API which will not have to be modified for each rev of the transport.
Arkady, Simply define the API as all the parameters needed to do an RDMA write followed by a send. This is semantically all that many seem to believe is required. I would not restrict the size or contents of the send buffer supplied by the ULP. Could even be a zero length buffer just to trigger the receive completion. Don't try to make the operation any more magical than that. All transports can implement it consistently and the ULP can handle it consistently too. I can't see how anyone could object to that API since it is providing the service desired consistently among all transports. That said, I am not conceding that this service is the equivalent to IB RDMA write with immediate data and want to see a general extension API added for this and any future transport service that won't be supported by the DAPL API.
Roy
Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300
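The write-followed-by-send helper Roy describes above could be sketched roughly as follows. This is a toy in-memory model, not the DAT or verbs API: `ModelEndpoint`, `post_write_then_send`, and every parameter name are hypothetical, chosen only to show that the write consumes no receive while the (possibly zero-length) send consumes exactly one.

```python
from collections import deque


class ModelEndpoint:
    """Toy stand-in for a connected endpoint pair (hypothetical, not a DAT type)."""

    def __init__(self, remote_size: int):
        self.remote_memory = bytearray(remote_size)  # memory at the data sink
        self.posted_recvs = deque()                  # receives posted by the sink
        self.recv_completions = []                   # completions seen by the sink

    def post_recv(self, tag):
        self.posted_recvs.append(tag)

    def post_rdma_write(self, offset: int, data: bytes):
        # An RDMA write lands in remote memory and consumes no receive.
        self.remote_memory[offset:offset + len(data)] = data

    def post_send(self, data: bytes = b""):
        # A send consumes exactly one posted receive, in posting order.
        tag = self.posted_recvs.popleft()
        self.recv_completions.append((tag, bytes(data)))


def post_write_then_send(ep: ModelEndpoint, offset: int, data: bytes,
                         notice: bytes = b"") -> None:
    """Hypothetical helper: an RDMA write followed by a send of any size."""
    ep.post_rdma_write(offset, data)
    ep.post_send(notice)  # zero-length by default, just to trigger a completion
```

In this model the sink learns that the write has landed only through the ordinary receive completion; nothing in the completion marks it as special, which is the point of contention elsewhere in the thread.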
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Michael Krause wrote:
RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-sized quantity and not one subject to change, i.e. increasing its size. Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API.
I agree. I will move the IB immediate data service back into the extension interface and update the OpenIB uDAPL provider patch.
Those applications that simply want to enable completion notification when a RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon a RDMA Write - Send set of operations. This will enable application portability across all interconnect types.
I will defer this to Arkady to draft.
-arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin, This can be done. But I have an issue: extension calls violate the Transport Requirements. Currently, the matching semantic is well-defined since Recv only matches Send. Since the Spec does not have any idea what operations are defined in extension(s), there is a problem with the transport requirements. We can, of course, make some generic statement that it does not cover APIs that are defined in extensions. The API requirements are easier to handle. Since they have been written as Nonrequirement for the APIs we decide to define yet. (I will need to review chapter 5 to make sure we have followed this in all cases.)
Arkady
Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300
-----Original Message-----
From: Arlin Davis [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 5:57 PM
To: Michael Krause
Cc: [EMAIL PROTECTED]; openib-general@openib.org; Kanevsky, Arkady
Subject: Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Michael Krause wrote:
RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-sized quantity and not one subject to change, i.e. increasing its size. Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API.
I agree. I will move the IB immediate data service back into the extension interface and update the OpenIB uDAPL provider patch.
Those applications that simply want to enable completion notification when a RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon a RDMA Write - Send set of operations. This will enable application portability across all interconnect types.
I will defer this to Arkady to draft.
-arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
One more issue to discuss. Does Completion of a Recv that matches RDMA Write with Immediate Data automatically sync local memory, or does the Consumer still need to do lmr_sync_rdma_write prior to accessing RDMAed data?
Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300
-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 07, 2006 7:40 PM
To: [EMAIL PROTECTED]; Larsen, Roy K; Arlin Davis; Hefty, Sean
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
We have a problem no matter which option we choose. The current Transport Level Requirements state:
There is a one-to-one correspondence between send operation on one Endpoint of the Connection and recv operations on the other Endpoint of the Connection.
There is no correspondence between RDMA operations on one Endpoint of the Connection and recv or send data transfer operation on the other Endpoint of the Connection.
Receive operations on a Connection must be completed in the order of posting of their corresponding sends.
The Immediate data and Atomic ops violate these requirements including ordering rules. I had started updating these rules when I generated the first draft of the requirements. They are included in the enclosed pdf file. But they do not cover Atomic ops that also impact transport requirements. This chapter of the spec has not been changed since DAPL 1.0 and I am very concerned with any changes to it.
Arkady
If RDMA Write with Immediate is viewed as being the equivalent of doing RDMA Write and then an RDMA Send, the correspondence rule is maintained. But *only* if the rdma write with immediate has all of the semantics of a Send. Atomics do not violate the rules if you view them as being a variation on an RDMA Read. They are an RDMA Read with modify. The real question is whether it makes sense to put it in the RDMA device. It is also not subject to emulation at a higher layer. With send with invalidate we know how InfiniBand *will* support it, because of the IB 1.2 verbs. We do not know that for atomics over iWARP. We do not know whether it will be added; more importantly, we do not know *how* it would be added if it were added. That makes coming up with a transport neutral definition very premature. In particular, if atomics were added to iWARP there is a distinct design option where it would *not* be the same work queue as RDMA Reads (adding atomics through Queue ID 3 would make layering on top of a current implementation much easier). But it would mean that atomic credits would be distinct from read credits. This is a very strong reason to defer attempting to define RDMA Atomics in a transport neutral fashion.
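The correspondence rules quoted above, together with Caitlin's reading that a write with immediate counts as a write plus a send, can be illustrated with a small matching model. The operation names and the `match_receives` function are assumptions made for this sketch only; they are not DAT identifiers.

```python
# How many posted receives each operation consumes at the other Endpoint.
# Treating rdma_write_with_immediate as "write, then send" gives it a count
# of 1, which is the interpretation under which the rule is maintained.
RECVS_CONSUMED = {
    "send": 1,
    "rdma_write": 0,
    "rdma_read": 0,
    "rdma_write_with_immediate": 1,
}


def match_receives(posted_ops, posted_recvs):
    """Pair receive-consuming operations with posted receives, in posting order."""
    recvs = iter(posted_recvs)
    matches = []
    for op in posted_ops:
        for _ in range(RECVS_CONSUMED[op]):
            try:
                matches.append((op, next(recvs)))
            except StopIteration:
                raise ValueError(f"no receive posted for {op!r}") from None
    return matches
```

For example, `match_receives(["rdma_write", "send"], ["r0"])` pairs only the send with `r0`; the plain write matches nothing, as the second requirement states.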
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
One more issue to discuss. Does Completion of a Recv that matches RDMA Write with Immediate Data automatically sync local memory, or does the Consumer still need to do lmr_sync_rdma_write prior to accessing RDMAed data?
Why would it be any different than for a plain receive? The intent is the same, to indicate that prior Writes have completed.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
At 09:16 PM 2/6/2006, Sean Hefty wrote:
The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
One thing to keep in mind is that the IBTA workgroup responsible for the transport wanted to eliminate immediate data support entirely but it was retained solely to enable VIA application migration (even though the application base was quite small). If that requirement could have been eliminated, then it would have been gone in a heartbeat. Given a RDMA-WRITE followed by a SEND provides the same application semantics based on the use models, iWARP chose not to support immediate data.
So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture. One has to question the value of trying to develop any API / software to support immediate data instead of just enabling the preferred method which is RDMA WRITE - SEND. I agree with those who have contended that this is difficult to do in a general purpose fashion. When all of this is taken into account, it seems the only good engineering answer is to eliminate immediate data support by the software and focus on the method that works across all interconnects.
Mike
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Michael> So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture.
But this discussion is being driven by an application developer who does see value in immediate data. Arlin, can you quantify the benefit you see from RDMA write with immediate vs. RDMA write followed by a send?
- R.
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roland Dreier wrote:
Michael> So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture.
But this discussion is being driven by an application developer who does see value in immediate data. Arlin, can you quantify the benefit you see from RDMA write with immediate vs. RDMA write followed by a send?
We need speed and simplicity. A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote rdma write completion notifications. The application would not have to register and manage additional message buffers on either side, we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option. So, rdma with immediate data is the most optimal and simplistic method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.
-arlin
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin> A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote rdma write completion notifications. The application would not have to register and manage additional message buffers on either side, we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option. So, rdma with immediate data is the most optimal and simplistic method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.
Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
- R.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
One thing to keep in mind is that the IBTA workgroup responsible for the transport wanted to eliminate immediate data support entirely but it was retained solely to enable VIA application migration (even though the application base was quite small). If that requirement could have been eliminated, then it would have been gone in a heartbeat. Given a RDMA-WRITE followed by a SEND provides the same application semantics based on the use models, iWARP chose not to support immediate data.
Mike, I was not part of the original IBTA discussions and I won't argue whether this facility should or shouldn't have been included. Nevertheless, it is part of the specification, there are HCA vendors that implement it, and we have applications that make use of it. I would, however, disagree with your assertion that write followed by a send is semantically equivalent to write immediate. Ordering may be semantically the same, but the service is not. Receive work completions are explicitly indicated as being associated with immediate data and therefore an associated write completion. A write followed by a send does not provide the same indication semantic.
Roy
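Roy's distinction between the two services can be made concrete with a toy model of receive completions. The `RecvCompletion` record and the three constructor functions below are hypothetical illustrations, not the IB verbs or DAT structures; the assumption is only that a native completion carries an explicit immediate-data indication while an emulated one delivers the same 32 bits as ordinary send payload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecvCompletion:
    """Hypothetical receive completion record (not a real verbs/DAT structure)."""
    payload: bytes                   # data that landed in the posted receive buffer
    immediate: Optional[int] = None  # set only by a native write with immediate


def native_write_with_immediate(imm: int) -> RecvCompletion:
    # Native service: the completion itself says "this belongs to an RDMA write".
    return RecvCompletion(payload=b"", immediate=imm & 0xFFFFFFFF)


def emulated_write_then_send(imm: int) -> RecvCompletion:
    # Emulation: the 32-bit value travels as ordinary 4-byte send data.
    return RecvCompletion(payload=(imm & 0xFFFFFFFF).to_bytes(4, "big"))


def ordinary_send(data: bytes) -> RecvCompletion:
    return RecvCompletion(payload=data)


def is_write_notification(c: RecvCompletion) -> bool:
    # Only the completion's own indication is consulted, never the payload.
    return c.immediate is not None
```

In this model an emulated notification and an ordinary 4-byte send with the same contents produce identical completions, so the receiver cannot tell them apart without help from the application protocol.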
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
Arlin> A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote rdma write completion notifications. The application would not have to register and manage additional message buffers on either side, we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option. So, rdma with immediate data is the most optimal and simplistic method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.
Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
The distinction between Write and Send versus post multiple is that it maintains a very simple one-to-one correspondence with the post_recv at the data sink. I also do not see how the *application* keeping the write and send semantics can have a negative performance implication if we allow InfiniBand Providers to encode it as an RDMA Write with Immediate. If the Data Source needs to communicate to the Data Sink that a specific RDMA Write transfer is done then it is sending a message. Information transfer and synchronization is occurring.
I fail to see the value, let alone the optimization, of layering on an extra bit of information disguised as an opcode and using a specific transport's encoding methods as the model for a transport neutral API (particularly one at the DAT layer; at the verb layer it is a different issue because at the verb layer we do not want to hide any hardware capabilities even while encouraging safe harbor transport neutral practices). If distinguishing between 32-bit messages and 32-bit immediates that can arrive in indeterminate order is really that important to your application then maybe you really needed a 33-bit message to begin with. Encoding application layer information via your choice of carrier pigeon is not a very robust strategy.
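The "33-bit message" idea above, carrying the distinguishing bit inside the send payload rather than in the transport opcode, might look like the following sketch. The message layout (one kind byte followed by a 32-bit value in network byte order) and the names `KIND_MESSAGE`, `KIND_WRITE_DONE`, `encode`, and `decode` are assumptions made for illustration; the thread does not specify any encoding.

```python
import struct
from typing import Tuple

# Hypothetical application-level message kinds.
KIND_MESSAGE = 0     # ordinary 32-bit application message
KIND_WRITE_DONE = 1  # notification that a preceding RDMA write has completed

_FMT = "!BI"  # 1-byte kind + 32-bit unsigned value, network byte order, no padding


def encode(kind: int, value: int) -> bytes:
    """Build a 5-byte send payload carrying the kind alongside the 32-bit value."""
    return struct.pack(_FMT, kind, value)


def decode(payload: bytes) -> Tuple[int, int]:
    """Recover (kind, value) from a payload produced by encode()."""
    kind, value = struct.unpack(_FMT, payload)
    return kind, value
```

With an encoding like this the receiver distinguishes the two cases on any transport, at the cost of a one-byte-larger send buffer; an alternative with the same effect is to reserve disjoint ranges of the 32-bit value itself.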
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.
A post multiple call as a general API makes sense, but I think that's a separate issue. Given that IB provides true immediate data with RDMA writes, a way should be available to make use of it. I don't know what the performance numbers are between using a write with immediate versus a write followed by a send, but I don't think that anyone could argue that the write with immediate wouldn't perform better.
To me, the question is whether write with immediate is supported as a transport specific extension, which was Arlin's original patch, or through some standard API. The attempt to make the API standard, so that iWarp could emulate it (poorly in my view), is what appears to be driving the disagreements. It also appears to me that the decisions are coming down to one of the following.
If iWarp can emulate write with immediate, then a generic API should be used.
If iWarp cannot properly emulate write with immediate, then the API should be transport specific.
It's curious to me that in both cases, iWarp is driving the API decision and design for something that is an IB specific feature.
- Sean
Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Sean Hefty wrote:
The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
-arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin Davis wrote:
> Sean Hefty wrote:
>> The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
>
> I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
>
> -arlin

What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received). This definition also conforms to all existing DAT ordering rules. Is there anything wrong with this definition for an IB provider?

There is a similarity between write_with_immediate and send_with_invalidate in that they combine operations which a) are already logically tied from the consumer's perspective and b) can be more easily optimized by the Provider over the wire when presented as one request. Indeed, with send_with_invalidate it *has* to be optimized since you cannot send the invalidate later.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Caitlin Bestler wrote:
> Arlin Davis wrote:
>> Sean Hefty wrote:
>>> The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
>>
>> I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
>>
>> -arlin
>
> What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).

No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want.

Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.

Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
IB does optionally support send_with_invalidate as defined in the IBTA 1.2 spec. OpenIB does not support this yet but this is a different matter. So this is a bad analogy. The better analogy is socket based CM.

But I am still not clear what you are advocating: extensions, IB specific API or something else.

Arkady Kanevsky
email: [EMAIL PROTECTED]
Network Appliance Inc.
phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.
Fax: 781-895-1195
Waltham, MA 02451
central phone: 781-768-5300

-----Original Message-----
From: Larsen, Roy K [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 07, 2006 2:46 PM
To: [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
Cc: Kanevsky, Arkady; Sean Hefty; openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

Caitlin Bestler wrote:
> Arlin Davis wrote:
>> Sean Hefty wrote:
>>> The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
>>
>> I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
>>
>> -arlin
>
> What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).
No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want.

Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.

Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
[EMAIL PROTECTED] wrote:
> Caitlin Bestler wrote:
>> Arlin Davis wrote:
>>> Sean Hefty wrote:
>>>> The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.
>>>
>>> I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.
>>>
>>> -arlin
>>
>> What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).
>
> No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want.
>
> Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.

The transports aren't getting anything. Features are there for applications, especially when the feature can be defined in a way that makes sense without explaining transport mechanics.
Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.

Providing information about an RDMA Write to a receive operation also meets that definition -- as long as it conforms to the existing ordering rules. Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable. We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.

With that definition the consumer would get a receive completion that told them that their peer's RDMA Write had been successfully placed, how long it is (the length) and which one (a tag). I think that is of value. iWARP can implement it as two work requests and maintain the overall semantics.

Are you arguing that iWARP should NOT provide this service until it can do it in a single work request?

It seems to me that allowing an extra work request and completion is a fairly simple accommodation as opposed to using an alternate algorithm in the main transaction processing of the application. If we enable the application to query how a remote write with immediate will complete outside of the transaction loop, then we can allow the application to have *no* overhead inside the main transaction loop, and *identical* logic on the sending side.

And IB *could* implement send with invalidate by simply agreeing on how the RKey to be invalidated is communicated between the IB providers (perhaps as an immediate).

But more to the point, I don't see how the more flexible definition of write with immediate negatively impacts the IB implementation of the feature. IB providers do not need to allow for the extra work requests. They are not being asked to place the immediate data into the receive buffer, or to do any extra work at all.
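The 8 byte message mentioned above, carrying the write length and an immediate 'tag', could be laid out as in the sketch below. The layout is an assumption made for illustration only: the thread gives the total size and the two fields but not their widths or byte order, so two unsigned 32-bit fields in network byte order are assumed here. The function names are hypothetical.

```python
import struct

# Assumed layout (not specified in the thread): write length, then tag,
# each an unsigned 32-bit integer in network byte order.
NOTICE = struct.Struct("!II")


def encode_notice(write_length: int, tag: int) -> bytes:
    """Build the 8-byte payload sent after the RDMA Write."""
    return NOTICE.pack(write_length, tag)


def decode_notice(payload: bytes) -> tuple:
    """Recover (write_length, tag) from a received 8-byte payload."""
    if len(payload) != NOTICE.size:
        raise ValueError("expected exactly %d bytes" % NOTICE.size)
    return NOTICE.unpack(payload)


wire = encode_notice(write_length=65536, tag=42)
print(len(wire), decode_notice(wire))
```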
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
> IB does optionally support send_with_invalidate as defined in the IBTA 1.2 spec. OpenIB does not support this yet but this is a different matter. So this is a bad analogy. The better analogy is socket based CM. But I am still not clear what you are advocating: extensions, IB specific API or something else.

I advocate a write with immediate data API that delivers immediate data to the target ULP *unambiguously*. That is, the ULP need never infer from buffer contents or receive completion timing that a write with immediate has taken place. If it is not granted formal API status, I advocate implementation as a DAPL extension.

The notion of a combined request API is orthogonal so I won't pursue it any further in this thread.

Roy
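The ambiguity objected to here can be shown with a small model of receive completions. This is a sketch under stated assumptions, not provider code: `RecvCompletion` and the three helper functions are hypothetical, the emulated form uses an assumed 8-byte length-plus-tag payload, and the native completion is simplified to carry only the immediate value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecvCompletion:
    """Hypothetical view of what the receiving ULP is handed."""
    payload: bytes
    immediate: Optional[int] = None  # set only when the transport reports it


def native_write_with_immediate(imm: int) -> RecvCompletion:
    """The transport itself marks the completion as carrying immediate data."""
    return RecvCompletion(payload=b"", immediate=imm)


def emulated_write_with_immediate(length: int, tag: int) -> RecvCompletion:
    """Write followed by an 8-byte send: arrives as an ordinary receive."""
    return RecvCompletion(payload=length.to_bytes(4, "big") + tag.to_bytes(4, "big"))


def ordinary_send(data: bytes) -> RecvCompletion:
    """A normal application message of any content."""
    return RecvCompletion(payload=data)


# An application message that happens to contain the same 8 bytes.
app_message = (4096).to_bytes(4, "big") + (7).to_bytes(4, "big")

emulated = emulated_write_with_immediate(4096, 7)
native = native_write_with_immediate(7)

print("emulated indistinguishable from a send:", emulated == ordinary_send(app_message))
print("native indistinguishable from a send:", native == ordinary_send(app_message))
```

In the emulated case the receiver has to rely on message ordering or content conventions to tell the two apart, which is the constraint on normal send/receive traffic raised in this thread.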
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
>>> What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).
>>
>> No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want. Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.
>
> The transports aren't getting anything. Features are there for applications, especially when the feature can be defined in a way that makes sense without explaining transport mechanics.

APIs exist to gain access to transport services so of course it is all about the transport. Presumably the transport services were defined because they seemed useful, but a transport service exists in a standard somewhere before it is defined in DAPL. I believe that the IB immediate data service and semantic is useful and should be supported too.

> Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.

I don't disagree. And unambiguous immediate data indications fall into that same category, which is why I'm puzzled there is so much resistance.

> Providing information about an RDMA Write to a receive operation also meets that definition -- as long as it conforms to the existing ordering rules. Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable.
> We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.

If you make the receive indication unambiguous as to the fact it's associated with a write immediate, you've got my full support, even if immediate data is delivered differently by different transports. If not, it is nothing more than a write/send that the application can do itself.

> With that definition the consumer would get a receive completion that told them that their peer's RDMA Write had been successfully placed, how long it is (the length) and which one (a tag). I think that is of value. iWARP can implement it as two work requests and maintain the overall semantics.

If completion of the service is ambiguous, I strongly disagree. The application can do this with write/send now and with more flexibility. True immediate indications are unambiguous and don't rely on the contents of a receive buffer or its completion timing. An application must be able to perform normal send/receives of any size and content simultaneously with RDMA write with immediate and without regard to when they arrive. The semantic proposed would put a constraint on how an application could use the send/receive facility. If an application can live with such a constraint, it is free to use write/send now. Those that can't, or that would perform much better with a legitimate write/immediate, should be given access to the facility.

> Are you arguing that iWARP should NOT provide this service until it can do it in a single work request?

I'm arguing that an iWARP provider NOT support this service until it can deliver immediate data indications unambiguously.

> It seems to me that allowing an extra work request and completion is a fairly simple accommodation as opposed to using an alternate algorithm in the main transaction processing of the application.
> If we enable the application to query how a remote write with immediate will complete outside of the transaction loop, then we can allow the application to have *no* overhead inside the main transaction loop, and *identical* logic on the sending side.

I would contend that placing constraints on what and when an application can send normal data just to use write immediate is far far worse. And all just to basically save one extra function call.

> And IB *could* implement send with invalidate by simply agreeing on how the RKey to be invalidated is communicated between the IB providers (perhaps as an immediate).

I'm afraid I don't follow. If you're talking about providers setting up their own private EPs to communicate, perhaps that's a solution for iWARP providers to supply unambiguous immediate data indications.

> But more to the point, I don't see how the more flexible definition of write with immediate negatively impacts the IB implementation of the feature. IB providers do not need to allow for the extra work requests. They are not being asked to place the immediate data into the receive buffer, or to do any extra work at all.

This is not
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Larsen, Roy K wrote:
>> Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.
>
> I don't disagree. And unambiguous immediate data indications fall into that same category, which is why I'm puzzled there is so much resistance.
>
>> Providing information about an RDMA Write to a receive operation also meets that definition -- as long as it conforms to the existing ordering rules. Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable. We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.
>
> If you make the receive indication unambiguous as to the fact it's associated with a write immediate, you've got my full support, even if immediate data is delivered differently by different transports. If not, it is nothing more than a write/send that the application can do itself.

From the viewpoint of the Provider/RNIC/driver there is no wire Send message which can be known to be associated with an RDMA Write. Up to the maximum message size, any combination of bytes is legal. Therefore this is a distinction that an iWARP provider CANNOT make, unlike send with invalidate, which CAN be supported under IB 1.2.

Keep in mind that under the basic DAT semantics, RDMA Writes are *not* signalled to the Data Sink. That was settled years ago. So under DAT semantics you use a Send to cause a completion at the other end -- period.

What we are talking about is whether to allow short sends that supply a minimal set of standard information to be piggy-backed on a prior RDMA Write by making it an RDMA Write with immediate.

I am not opposed to allowing IB Providers to do that. I am opposed to changing the fundamental DAT semantics that RDMA Writes are not visible to the Data Sink. Conceptually, a Send is required.
And as with all Send Messages, it is up to the *application* to ensure that their meaning is known at the Data Sink. This can be done by ordering and/or content of the data.
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
>>> Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.
>>
>> I don't disagree. And unambiguous immediate data indications fall into that same category, which is why I'm puzzled there is so much resistance.
>>
>>> Providing information about an RDMA Write to a receive operation also meets that definition -- as long as it conforms to the existing ordering rules. Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable. We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.
>>
>> If you make the receive indication unambiguous as to the fact it's associated with a write immediate, you've got my full support, even if immediate data is delivered differently by different transports. If not, it is nothing more than a write/send that the application can do itself.
>
> From the viewpoint of the Provider/RNIC/driver there is no wire Send message which can be known to be associated with an RDMA Write. Up to the maximum message size, any combination of bytes is legal. Therefore this is a distinction that an iWARP provider CANNOT make, unlike send with invalidate, which CAN be supported under IB 1.2.

So, your stance is that if an RDMA transport protocol specification exists that can't support or emulate a service faithfully, an API can't exist for that service in DAPL. Ok, that makes this discussion much clearer and to the point.

> Keep in mind that under the basic DAT semantics, RDMA Writes are *not* signalled to the Data Sink. That was settled years ago. So under DAT semantics you use a Send to cause a completion at the other end -- period.

Addressed below...
> What we are talking about is whether to allow short sends that supply a minimal set of standard information to be piggy-backed on a prior RDMA Write by making it an RDMA Write with immediate.

That is not what those advocating immediate data have been talking about. It has always been whether the IB capability could be exposed to the ULP. Remember, the first proposal by Arlin was to make this an extension. This list wanted to expose it as a formal API. So, I find that assertion puzzling.

> I am not opposed to allowing IB Providers to do that. I am opposed to changing the fundamental DAT semantics that RDMA Writes are not visible to the Data Sink. Conceptually, a Send is required.

Conceptual, eh? Well, of course IB immediate data *is* indicated on the receive queue. Not conceptual enough? But that aside, it is a rather strict and convenient interpretation. Are you sure you want to put a stake that deep in the ground about all currently defined DAPL semantics against transport standards that evolve, or just those that can't be implemented by all transports?

I was under the assumption that the DAT community defined the APIs and semantics through an open process. Given that the IB write immediate data facility does not break the implementation or semantics of the currently defined RDMA write facility, I see no reason the DAPL spec couldn't be updated, through consensus, with the realities of existing transport services.

Nevertheless, I presume you'll have no objection to implementing this useful service as a DAPL extension since the semantic rules for extensions haven't been defined yet.

Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
We have a problem no matter which option we choose. The current Transport Level Requirements state:

There is a one-to-one correspondence between send operations on one Endpoint of the Connection and recv operations on the other Endpoint of the Connection.
There is no correspondence between RDMA operations on one Endpoint of the Connection and recv or send data transfer operations on the other Endpoint of the Connection.
Receive operations on a Connection must be completed in the order of posting of their corresponding sends.

The Immediate data and Atomic ops violate these requirements, including the ordering rules. I had started updating these rules when I generated the first draft of the requirements. They are included in the enclosed pdf file. But they do not cover Atomic ops, which also impact transport requirements. This chapter of the spec has not been changed since DAPL 1.0 and I am very concerned about any changes to it.

Arkady

Arkady Kanevsky
email: [EMAIL PROTECTED]
Network Appliance Inc.
phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.
Fax: 781-895-1195
Waltham, MA 02451
central phone: 781-768-5300

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 07, 2006 6:57 PM
To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

[EMAIL PROTECTED] wrote:
> I was under the assumption that the DAT community defined the APIs and semantics through an open process. Given that the IB write immediate data facility does not break the implementation or semantics of the currently defined RDMA write facility, I see no reason the DAPL spec couldn't be updated, through consensus, with the realities of existing transport services.
>
> Nevertheless, I presume you'll have no objection to implementing this useful service as a DAPL extension since the semantic rules for extensions haven't been defined yet.
> Roy

That is correct, because as an extension the user would not expect normal semantics to still be guaranteed.

Attachment: transport_req_020706.pdf
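The conflict with the Transport Level Requirements quoted in this message can be illustrated with a small model. It is only a sketch: the opcode labels and both functions are hypothetical, and the table encodes one assumption, that a native write with immediate consumes a receive at the peer (as Roy notes, IB indicates immediate data on the receive queue) while a plain RDMA write or read does not.

```python
# Which sender-side operations consume a posted receive at the peer.
# Opcode labels are illustrative, not real DAT or verbs constants.
CONSUMES_RECV = {
    "SEND": True,
    "RDMA_WRITE": False,
    "RDMA_READ": False,
    "RDMA_WRITE_WITH_IMM": True,  # assumption: indicated on the receive queue
}


def recv_completions(ops: list) -> int:
    """Number of receives the peer completes for this sequence of operations."""
    return sum(1 for op in ops if CONSUMES_RECV[op])


def violates_no_correspondence(ops: list) -> bool:
    """True if any RDMA operation corresponds to a receive at the peer."""
    return any(op.startswith("RDMA_") and CONSUMES_RECV[op] for op in ops)


dapl_1_style = ["SEND", "RDMA_WRITE", "SEND"]
with_immediate = ["SEND", "RDMA_WRITE_WITH_IMM"]

print(recv_completions(dapl_1_style), violates_no_correspondence(dapl_1_style))
print(recv_completions(with_immediate), violates_no_correspondence(with_immediate))
```

In the second sequence the receive count no longer equals the send count, which is one way to see why the one-to-one rule would need rewording if the operation became part of the standard API.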