Re: [openib-general] dapl broken for iWARP
Steve,

what is the issue with using max_qp_rd_atom and max_qp_init_rd_atom, besides the bad names?

Thanks,

Arkady Kanevsky                 email: [EMAIL PROTECTED]
Network Appliance Inc.          phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.    Fax: 781-895-1195
Waltham, MA 02451               central phone: 781-768-5300

-----Original Message-----
From: Steve Wise [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 08, 2007 6:11 PM
To: Arlin Davis
Cc: openib-general
Subject: Re: [openib-general] dapl broken for iWARP

On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote:
On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:

Arlin,

The OFED dapl code is assuming the responder_resources and initiator_depth passed up on a connection request event are from the remote peer. This doesn't happen for iWARP. In the current iWARP specifications, it's up to the application to exchange this information somehow. So these are defaulting to 0 on the server side of any dapl connection over iWARP.

This is a fairly recent change, I think. We need to come up with some way to deal with this for OFED 1.2 IMO.

The IWCM could set these to the device max values for instance.

Steve.

There is a slight problem with all this. There are no device attributes currently for ORD and IRD. The ammasso driver maps these to max_qp_rd_atom (IRD) and max_qp_init_rd_atom (ORD). But this is screwy. We need new attributes for these.

For OFED 1.2, I think I should just have the IWCM set them to 8. The only RNIC in ofed is cxgb3 and it supports 8...

Steve.

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] dapl broken for iWARP
That is correct. I am working with Krishna on it. Expect patches soon. By the way, the problem is not DAPL specific, and neither is the proposed solution.

There are 3 aspects to the solution.

One is the APIs. We suggest that we do not augment these. That is, a connection requestor sets its QP RDMA ORD and IRD. When the connection is established, the user can check the QP RDMA ORD and IRD to see what is now available to use over the connection. We may consider extending QP attributes to support passing transport-specific parameters in the future, for example an iWARP MPA CRC request.

Second is the semantic that the CM provides. The proposal is to match the IBCM semantic. That is, the CM guarantees that local IRD is >= remote ORD. This guarantees that incoming RDMA Read requests will not overwhelm the QP RDMA Read capabilities. Again, there are no changes to IBCM, only to IWCM. Notice that as part of this the IWCM will pass the needed info down to the driver and extract it from the driver.

The final part is an iWARP CM extension to exchange RDMA ORD and IRD. This is similar to the IBTA Annex for IP Addressing. The harder part is that this will eventually require an IETF MPA spec extension, and that the MPA protocol is implemented in RNIC HW by many vendors and hence cannot be done by the IWCM itself.

Thanks,

Arkady Kanevsky

-----Original Message-----
From: Steve Wise [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 07, 2007 6:12 PM
To: Arlin Davis
Cc: openib-general
Subject: Re: [openib-general] dapl broken for iWARP

On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
Steve Wise wrote:
On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:

Arlin,

The OFED dapl code is assuming the responder_resources and initiator_depth passed up on a connection request event are from the remote peer. This doesn't happen for iWARP. In the current iWARP specifications, it's up to the application to exchange this information somehow. So these are defaulting to 0 on the server side of any dapl connection over iWARP.

This is a fairly recent change, I think. We need to come up with some way to deal with this for OFED 1.2 IMO.

Yes, this was changed recently to sync up with the rdma_cm changes that exposed the values.

The IWCM could set these to the device max values for instance.

That would work fine as long as you know the remote settings will be equal or better. The provider just sets the min of local device max values and the remote values provided with the request.

I know Krishna Kumar is working on a solution for exchanging this info in private data so the IWCM can do the right thing. Stay tuned for a patch series to review for this. But this functionality is definitely post OFED-1.2.

So for OFED-1.2, I will set these to the device max in the IWCM. Assuming the other side is OFED 1.2 DAPL, then it will work fine.

Steve.
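The "min of local device max values and the remote values" rule mentioned in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct rd_limits, negotiate_rd_limits() and ird_covers_remote_ord() are hypothetical names, not part of the IWCM, rdma_cm or DAPL code.

```c
#include <stdint.h>

/* Hypothetical container for the two RDMA Read limits discussed above. */
struct rd_limits {
    uint8_t ord;  /* outbound RDMA Reads this side may have in flight */
    uint8_t ird;  /* inbound RDMA Reads this side can service at once */
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/*
 * Passive side of a connection request: clamp each value to the device
 * maximum and to what the peer reported. The peer's ORD bounds our IRD,
 * and the peer's IRD bounds our ORD.
 */
static struct rd_limits negotiate_rd_limits(struct rd_limits device_max,
                                            struct rd_limits remote)
{
    struct rd_limits local;

    local.ird = min_u8(device_max.ird, remote.ord);
    local.ord = min_u8(device_max.ord, remote.ird);
    return local;
}

/* The IBCM-style guarantee from the thread: local IRD >= remote ORD. */
static int ird_covers_remote_ord(struct rd_limits local,
                                 struct rd_limits remote)
{
    return local.ird >= remote.ord;
}
```

If the remote values arrive as 0, which is what the thread reports for iWARP, the min rule yields 0 for both limits. That is the breakage being discussed, and it is why Steve proposes substituting the device maximum until the values can be exchanged.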
Re: [openib-general] SVN deprication
Thanks Jeff.
This works.

Arkady Kanevsky

-----Original Message-----
From: Jeff Squyres [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 17, 2007 3:30 PM
To: Kanevsky, Arkady
Cc: openib-general@openib.org
Subject: Re: [openib-general] SVN deprication

SVN is still available, but it is at a new URL: https://svn.openfabrics.org/svn/openib. All the history and everything should be there; let me know if you have any problems.

On Jan 17, 2007, at 3:11 PM, Arkady Kanevsky wrote:

Jeff and Co,
Is there a way to find out the date of a specific SVN revision #? I can no longer access svn:

    svn info -r 5400 https://openfabric.org/svn
    svn: PROPFIND request failed on '/svn'
    svn: PROPFIND of '/svn': could not connect to server (https://openfabric.org)

Is the SVN server deprecated for good? Do we have an SVN log somewhere in a git? If yes, how can I find the correlation between Linux version and SVN revision?

Thanks,
Arkady

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17
Bill,

2 small changes to the diagram on slide 6:

The SRP box should be yellow since it is IB specific.
Drop the word "R-NIC" from the User APIs box.

I think we can improve this diagram's message. Both the kernel and user API boxes for "verbs/API" should be non-colored "common".

Thanks,
Arkady

From: Bill Boas [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 15, 2006 5:03 PM
To: [EMAIL PROTECTED]; openib-general@openib.org
Cc: [EMAIL PROTECTED]; 'Kyril Faenov'; 'Jeffrey Scott'; Kianoosh Naghshineh
Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17

To all in the OpenFabrics Community

We will be holding our first Developer Summit in the Tampa Convention Center, courtesy of SC06, starting at 1.30 PM in Room 17 on Thursday November 16, 2006. On Friday November 17, we will start in Room 13 at 8.00 AM and continue till 5.00 PM. We have had to schedule into these time slots because no other usable space is available at any other times during the week of SC06!

OpenFabrics will cater food and beverages for afternoon break and supper on Thursday, and breakfast, lunch and two breaks on Friday. We will set up a registration site at Acteva to collect $$ to cover our out of pocket expenses; I'll email out the URL for that site in the next day or two.

Please review the attached Strawman purposes, suggested attendees and agenda. Any changes or comments, please email them to the community for all to comment on.

The Summit has several dimensions and themes throughout our work there:

1) consistency and robustness of the Linux and Windows software stacks for Release 2.0 of OpenFabrics;
2) feature selection, development resources and timelines for Release 2.0;
3) activities, features and processes of the Enterprise Working Group on OFED 1.x until Release 2.0 is ready for hand-off to the EWG;
4) enhancing the resources of the EWG to be ready for 2.0, so that it may subsequently be distributed as OFED 2.0 and adopted by the OpenFabrics vendor and customer communities for production use.

This is far too much work for just a day and a half! PLEASE START NOW exchanging ideas for additional features, and contact peer engineers from companies and customers to discuss work sizing, development resources, and volunteer developers for items, so that when we meet on the 16th we're not starting from a blank sheet!

Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal Rosenstock, Tom Tucker and Bob Woodruff are leading the pre-meeting STRAWMAN collation of requirements, feature prioritization, developer assignments, sizing and processes, so that we have the list largely complete prior to the meeting and people know who has already volunteered for items from the list.

Bill Boas
VP, Business Development | System Fabric Works
[EMAIL PROTECTED] | 510-375-8840
Re: [openib-general] posting send requests in RTR
If a QP is not in the RTS state then a posted Send should be flushed to the CQ for IB. This behavior needs to be preserved so that a ULP can ensure that Sends posted with completion suppression have been completed.

Thanks,

Arkady Kanevsky

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 27, 2006 6:19 PM
To: Sean Hefty; Rimmer, Todd; Michael S. Tsirkin
Cc: Or Gerlitz; Roland Dreier; openib-general@openib.org
Subject: Re: [openib-general] posting send requests in RTR

Sean Hefty wrote:

Alternately, it would be reasonable to simply document that a receive completion *implied* a connection established event, and therefore the application could post to the send queue after it reaped a receive completion (or got a connection established event).

The problem is that the QP is not in the RTS state, so cannot accept sends.

Well, I suppose if your adapter can be in a state where it has completed a receive work request for a connection but is not yet convinced that that connection is established, then it would have to queue those work completions somewhere.

If that is all you are proposing then I have no objections; an iWARP adapter can never be in such a state.

But I am curious as to why completing a receive work request does not place the QP in the RTS state, since the end-to-end QP pairing has obviously been confirmed, and therefore the QP can send.
[openib-general] mthca_reset question
Here is an extract from mthca_reset.c:

    /*
     * Reset the chip.  This is somewhat ugly because we have to
     * save off the PCI header before reset and then restore it
     * after the chip reboots.  We skip config space offsets 22
     * and 23 since those have a special meaning.
     *
     * To make matters worse, for Tavor (PCI-X HCA) we have to
     * find the associated bridge device and save off its PCI
     * header as well.
     */

    if (!(mdev->mthca_flags & MTHCA_FLAG_PCIE)) {
            /* Look for the bridge -- its device ID will be 2 more
               than HCA's device ID. */
            while ((bridge = pci_get_device(mdev->pdev->vendor,
                                            mdev->pdev->device + 2,
                                            bridge)) != NULL) {
                    if (bridge->hdr_type    == PCI_HEADER_TYPE_BRIDGE &&
                        bridge->subordinate == mdev->pdev->bus) {
                            mthca_dbg(mdev, "Found bridge: %s\n",
                                      pci_name(bridge));
                            break;
                    }
            }

First, why do we check for "not PCIE" instead of "PCI-X"?
Second, why "while" instead of "if"?
Third, and most interesting, why is the bridge device ID 2 more than the HCA device ID? What does this hack rely on? Can we find the device's parent, which should be a bridge, instead?

Thanks,
Arkady

Arkady Kanevsky
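On the "while instead of if" question, a user-space sketch of the same search may help. It assumes a simplified, hypothetical struct fake_pci_dev and a flat device array in place of the kernel's struct pci_dev and pci_get_device(); it is not the driver's code. The point it illustrates is that a vendor/device ID pair can match more than one device (for example when two HCAs are installed), so each match still has to be checked against the HCA's own bus.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct pci_dev. */
struct fake_pci_dev {
    uint16_t vendor;
    uint16_t device;
    int is_bridge;        /* header type says "PCI-to-PCI bridge" */
    int subordinate_bus;  /* bus number behind the bridge, -1 if none */
    int bus;              /* bus number the device itself sits on */
};

/*
 * Return the bridge in front of `hca`, or NULL if none is found.
 * Every device is visited because several devices can carry the
 * same vendor/device ID pair; only one of them leads to this HCA.
 */
static const struct fake_pci_dev *
find_hca_bridge(const struct fake_pci_dev *devs, size_t n,
                const struct fake_pci_dev *hca)
{
    for (size_t i = 0; i < n; i++) {
        const struct fake_pci_dev *d = &devs[i];

        if (d->vendor != hca->vendor || d->device != hca->device + 2)
            continue;  /* ID does not match: keep looking */
        if (d->is_bridge && d->subordinate_bus == hca->bus)
            return d;  /* this bridge's secondary bus holds our HCA */
    }
    return NULL;
}
```

A single "if" would stop at the first device with the right ID, which could be the bridge of a different HCA. The sketch does not answer where the "+ 2" convention comes from; that remains the open question in the message.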
[openib-general] cache.c
Roland,
in core/cache.c should

    device->cache.gid_cache =
            kmalloc(sizeof *device->cache.pkey_cache *
                    (end_port(device) - start_port(device) + 1), GFP_KERNEL);

be

    device->cache.gid_cache =
            kmalloc(sizeof *device->cache.gid_cache *
                    (end_port(device) - start_port(device) + 1), GFP_KERNEL);

Arkady

Arkady Kanevsky
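The idiom behind this question, sizing each allocation from the pointer being assigned, can be shown in user space. The sketch below is an assumption-laden illustration: the struct layouts and alloc_caches() are hypothetical, and calloc() stands in for kmalloc(). It assumes both cache members are arrays of per-port pointers; under that assumption the two sizeof expressions have the same value, so the mismatch would be a readability issue and not an under-allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-port cache tables. */
struct pkey_cache { int table_len; unsigned short table[1]; };
struct gid_cache  { int table_len; unsigned char  table[1][16]; };

struct dev_cache {
    struct pkey_cache **pkey_cache;  /* one pointer per port */
    struct gid_cache  **gid_cache;   /* one pointer per port */
};

/* Allocate both per-port pointer arrays; 0 on success, -1 on failure. */
static int alloc_caches(struct dev_cache *c, int start_port, int end_port)
{
    size_t nports;

    assert(end_port >= start_port);
    nports = (size_t)(end_port - start_port + 1);

    /* sizeof *member ties the element size to the array being assigned. */
    c->pkey_cache = calloc(nports, sizeof *c->pkey_cache);
    c->gid_cache  = calloc(nports, sizeof *c->gid_cache);
    if (!c->pkey_cache || !c->gid_cache) {
        free(c->pkey_cache);
        free(c->gid_cache);
        c->pkey_cache = NULL;
        c->gid_cache = NULL;
        return -1;
    }
    return 0;
}
```

Writing the size as sizeof *c->gid_cache keeps the allocation correct even if the element type of one array changes later, which is the reason to prefer the second form in the message.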
RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal
Why both Immediate Data and the STag that was used for the RDMA Write? Immediate Data already contains info on which operation the completed RDMA Write was in response to. The STag would make sense if STag invalidation were also put in the mix. But for MPI, RMR_contexts have a long lifecycle, so it is not clear which apps would be interested in combining invalidation with RDMA Write with Immediate Data.

Arkady Kanevsky

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 07, 2006 3:03 PM
To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

[EMAIL PROTECTED] wrote:
Caitlin Bestler wrote:
Arlin Davis wrote:
Sean Hefty wrote:

The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so.

To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively.

I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense.

-arlin

What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received).

No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want. Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious.

The transports aren't getting anything. Features are there for applications, especially when the feature can be defined in a way that makes sense without explaining transport mechanics.

Completing a transaction, complete with supplying a transaction response and releasing the advertised STag associated with the transaction, is something that makes sense in the application domain and conforms to normal DAT ordering rules.

Providing information about an RDMA Write to a receive operation also meets that definition -- as long as it conforms to the existing ordering rules.

Shifting to an 8 byte message over iWARP to allow for the write length *and* immediate 'tag' is certainly doable. We could even consider having the DAT Provider supply the 'buffer' silently in the DTO itself.

With that definition the consumer would get a receive completion that told them that their peer's RDMA Write had been successfully placed, how long it is (the length) and which one (a tag). I think that is of value. iWARP can implement it as two work requests and maintain the overall semantics.

Are you arguing that iWARP should NOT provide this service until it can do it in a single work request?

It seems to me that allowing an extra work request and completion is a fairly simple accommodation as opposed to using an alternate algorithm in the main transaction processing of the application. If we enable the application to query how a remote write with immediate will complete outside of the transaction loop, then we can allow the application to have *no* overhead inside the main transaction loop, and *identical* logic on the sending side.

And IB *could* implement send with invalidate by simply agreeing on how the RKey to be invalidated is communicated between the IB providers (perhaps as an immediate).

But more to the point, I don't see how the more flexible definition of write with immediate negatively impacts the IB implementation of the feature. IB providers do not need to allow for the extra work requests. They are not being asked to place the immediate data into the receive buffer, or to do any extra work at all.
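The "8 byte message ... to allow for the write length *and* immediate 'tag'" idea from this thread can be sketched as a small wire format. The layout below is an assumption for illustration only: the thread does not define a field order or byte order, and write_notify_pack() and write_notify_unpack() are hypothetical helpers, not DAT or iWARP calls.

```c
#include <stdint.h>

/* Hypothetical payload of the Send that follows the RDMA Write. */
#define WRITE_NOTIFY_LEN 8

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Bytes 0-3: length of the RDMA Write; bytes 4-7: the 32-bit tag. */
static void write_notify_pack(uint8_t buf[WRITE_NOTIFY_LEN],
                              uint32_t write_len, uint32_t tag)
{
    put_be32(buf, write_len);
    put_be32(buf + 4, tag);
}

static void write_notify_unpack(const uint8_t buf[WRITE_NOTIFY_LEN],
                                uint32_t *write_len, uint32_t *tag)
{
    *write_len = get_be32(buf);
    *tag = get_be32(buf + 4);
}
```

On a transport with native immediate data only the 32-bit tag would travel with the write, so a receiver coded against this format would see the same two values either way. That is the portability argument being made, and also the point of contention in the replies.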
RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal
Caitlin,
can you clarify this? Are you proposing that the Consumer encode a bit of the Immediate Data to specify that it is immediate data? iWARP will pass it in a Send message and IB in Immediate Data.

Arkady Kanevsky

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 2:40 PM
To: Arlin Davis; Roland Dreier
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

[EMAIL PROTECTED] wrote:
Roland Dreier wrote:

Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.

With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and a rdma write completion indication on the other end. This is the uniqueness of the service that cannot be provided by the post multiple.

Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantics rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.

A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it to not just be a utility function layered above DAT.

What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it.

Why can't the application just add a bool to the send message? Or encode the 32-bits so that they come from disjoint domains?

There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of rdma write with immediate whenever the application could distinguish the two.

I cannot see why doing this is almost free for virtually all applications, and trivial for the remainder. Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.

We already made a similar decision in having a 128-bit IA Address. That means we cannot support a host that interfaces to the Internet with IPv6 and an InfiniBand network that not only had global GIDs, but allocated a global subnetwork a network id that was already in use as a valid public IPv6 network. The complexity of dealing with an IA Address that was 128+1 bits was simply not justified to deal with an extreme corner case that could very easily be avoided (there is no shortage of site local network IDs in the IPv6/GID format, so using a global network prefix that was disjoint from the official IPv6 hierarchy would be just plain silly).

So far I haven't seen any explanation as to why an application has a need to encode this 33rd bit of their message in this terribly transport specific manner. Is there some severe performance penalty to slightly restructuring the send message so that it is no longer ambiguous with the immediate data?
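The suggestion to "encode the 32-bits so that they come from disjoint domains" can be sketched by reserving one bit of the 32-bit word as a kind marker. This is a hypothetical encoding for illustration, not something either transport or the DAT proposal defines; the macro and function names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: top bit marks "RDMA Write done", low 31 bits carry data. */
#define MSG_WRITE_DONE_FLAG 0x80000000u
#define MSG_VALUE_MASK      0x7fffffffu

/* Word announcing a completed RDMA Write (sent as immediate data or as a Send). */
static uint32_t encode_write_done(uint32_t value31)
{
    assert(value31 <= MSG_VALUE_MASK);
    return MSG_WRITE_DONE_FLAG | value31;
}

/* Word carried by an ordinary 4-byte Send. */
static uint32_t encode_ordinary(uint32_t value31)
{
    assert(value31 <= MSG_VALUE_MASK);
    return value31;
}

static int is_write_done(uint32_t word)
{
    return (word & MSG_WRITE_DONE_FLAG) != 0;
}

static uint32_t decode_value(uint32_t word)
{
    return word & MSG_VALUE_MASK;
}
```

The cost is one bit of payload: the receiver can tell the two kinds of message apart from the word itself, but only 31 bits remain for the application. This is the "33rd bit" trade-off argued over in the thread.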
RE: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal
Mike,
but then the combined operation can just as easily be handled by a "multiple post operation". What is the need for a specific transport-independent RDMA Write with immediate data?

I am still concerned about the need for the Consumer Recv side to separate the Recv of Immediate Data from a "regular" Recv. The Consumer "knows" what it expects to match the posted Recv. There is a one-to-one mapping between the non-pure-RDMA transfer ops of one side and the Recvs of the other. Sure, a ULP may use the same size buffers for all. But how many ULPs mix Immediate Data size messages (4 bytes on IB) with normal Sends of the exact same size?

Arkady

Arkady Kanevsky

From: Michael Krause [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 3:25 PM
To: Arlin Davis
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal

At 03:36 PM 2/8/2006, Arlin Davis wrote:
Roland Dreier wrote:

    Michael> So, here we have a long discussion on attempting to
    Michael> perpetuate a concept that is not universal across
    Michael> transports and was deemed to have minimal value that most
    Michael> wanted to see removed from the architecture.

But this discussion is being driven by an application developer who does see value in immediate data.

Arlin, can you quantify the benefit you see from RDMA write with immediate vs. RDMA write followed by a send?

We need speed and simplicity.

A very latency sensitive application that requires immediate notification of RDMA write completion on the remote node without ANY latency penalties associated with combining operations, HCA priority rules across QPs, wire congestion, etc. An application that has no requirement for messaging outside of remote rdma write completion notifications. The application would not have to register and manage additional message buffers on either side; we can just size the queues accordingly and post zero byte messages. We need something that would be equivalent to sitting there polling on the last byte of inbound data. But, since data ordering within an operation is not guaranteed, that is not an option. So, rdma with immediate data is the most optimal and simplistic method for indication of RDMA-write completion that we have available today. In fact, I would like to see it increased in size to make it even more useful.

RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-sized quantity and not one subject to change, i.e. increasing its size.

Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API.

Those applications that simply want to enable completion notification when a RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon a RDMA Write - Send set of operations. This will enable application portability across all interconnect types.

Mike
RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal
Roy,
and what if tomorrow iWARP decides to support Immediate Data with variable length? The API does not change, the semantic does not change, and IB will not be able to support it. I am trying to define a semantic and an API that will not have to be modified for each rev of the transport.

Arkady Kanevsky

-----Original Message-----
From: Larsen, Roy K [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 09, 2006 3:32 PM
To: [EMAIL PROTECTED]; Arlin Davis; Roland Dreier
Cc: openib-general@openib.org
Subject: RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (ie posting an RDMA write followed by a send in one verbs call)? Maybe post multiple is a better direction for DAT.

With post multiple, unlike immediate data, you don't have the ability to distinguish between a normal receive and a rdma write completion indication on the other end. This is the uniqueness of the service that cannot be provided by the post multiple.

Yes, post multiple would be a nice option for DAT; it is just a different service. It would also be required to conform to the semantics rules of the bundled operations, so you could not do any optimization tricks under the covers with an IB rdma_write_immediate operation.

A post_multiple also requires defining a single DTO data structure. If the post multiple is atomic (meaning all make it or none do) then it requires an intermediate data structure to have been created. If it is not atomic there really isn't a reason for it to not just be a utility function layered above DAT.

That is a very good point. And since the emulated immediate data service can't make the atomic guarantee, it is the killer argument for just making the service plain - a potentially more efficient write/send.

What I'm not seeing with the immediate is this urgent need by the application to be able to use the same 32-bit value for both an immediate and a 4 byte message that requires an entire additional API just to support it.

Why can't the application just add a bool to the send message? Or encode the 32-bits so that they come from disjoint domains?

Some applications can do as you suggest. Some applications can make good use of unambiguous indications where the buffer size, content, or arrival timing is not constrained. Some don't need write notification at all. What's your point?

There seems to be agreement that a consolidated write-and-send call would enable the application to get the benefits of rdma write with immediate whenever the application could distinguish the two.

Well, I think there is agreement that *some* applications can use write-and-send in a beneficial way. But then again, nothing prevents them from doing that now. They do not need an additional API. But again, I don't have an issue with defining a helper function. I do have an issue with defining an API and semantic that says the target side needs to be coded in a way to always deal with both true immediate data and emulation. Just define a write/send helper API and the ULP can be coded in a consistent manner if that is a beneficial service. If a true unambiguous indication service is more beneficial or required, it can use the extension and accept the extra complexity. To demand extra complexity in applications that obviously don't need the true immediate data semantic is just wrong in my opinion.

I cannot see why doing this is almost free for virtually all applications, and trivial for the remainder. Adding and documenting an extra call to deal with such an extreme corner case that is being presented only in the abstract is just not justified. This extra capability has to have enough functionality for enough applications to justify keeping it on the books, writing test cases for it, etc.

All we're asking is that a write/send combined API not be called immediate data unless it fits the semantics of immediate data. I am puzzled at the resistance this is getting. There is a standards body specification for immediate data. If it is not followed, don't call it immediate data. It's that simple. For those transports that can provide the service, the ULP may be able to gain access to it through an extension.

Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin, This can be done. But I have an issue: extension calls violate the Transport Requirements. Currently, the matching semantic is well-defined since Recv only matches Send. Since the Spec has no idea what operations are defined in extension(s), there is a problem with the transport requirements. We can, of course, make some generic statement that it does not cover APIs that are defined in extensions. The API requirements are easier to handle, since they have been written as Nonrequirements for APIs we have yet to define. (I will need to review chapter 5 to make sure we followed this in all cases.) Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Arlin Davis [mailto:[EMAIL PROTECTED] Sent: Thursday, February 09, 2006 5:57 PM To: Michael Krause Cc: [EMAIL PROTECTED]; openib-general@openib.org; Kanevsky, Arkady Subject: Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal Michael Krause wrote: RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-sized quantity and not one subject to change, i.e. increasing its size. Your argument above reinforces that the particular application need is IB-specific and thus should not be part of a general API but a transport-specific API. If the application will only operate optimally using immediate data, then it is only suitable for an IB fabric. This reinforces the need for a transport-specific API. I agree. I will move the IB immediate data service back into the extension interface and update the OpenIB uDAPL provider patch. Those applications that simply want to enable completion notification when a RDMA Write has occurred can use a general purpose API that is interconnect independent and whose code is predicated upon a RDMA Write - Send set of operations. 
This will enable application portability across all interconnect types. I will defer this to Arkady to draft. -arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
One more issue to discuss. Does Completion of a Recv that matches RDMA Write with Immediate Data automatically sync local memory, or does the Consumer still need to do lmr_sync_rdma_write prior to accessing RDMAed data? Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Caitlin Bestler [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 7:40 PM To: [EMAIL PROTECTED]; Larsen, Roy K; Arlin Davis; Hefty, Sean Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal [EMAIL PROTECTED] wrote: We have a problem no matter which option we choose. The current Transport Level Requirements state: There is a one-to-one correspondence between send operation on one Endpoint of the Connection and recv operations on the other Endpoint of the Connection. There is no correspondence between RDMA operations on one Endpoint of the Connection and recv or send data transfer operation on the other Endpoint of the Connection. Receive operations on a Connection must be completed in the order of posting of their corresponding sends. The Immediate data and Atomic ops violate these requirements including ordering rules. I had started updating these rules when I generated the first draft of the requirements. They are included in the enclosed pdf file. But they do not cover Atomic ops that also impact transport requirements. This chapter of the spec has not been changed since DAPL 1.0 and I am very concerned about any changes to it. Arkady If RDMA Write with Immediate is viewed as being the equivalent of doing RDMA Write and then an RDMA Send the correspondence rule is maintained. But *only* if the rdma write with immediate has all of the semantics of a Send. Atomics do not violate the rules if you view them as being a variation on an RDMA Read. They are an RDMA Read with modify. 
The real question is whether it makes sense to put it in the RDMA device. It is also not subject to emulation at a higher layer. With send with invalidate we know how InfiniBand *will* support it, because of the IB 1.2 verbs. We do not know that for atomics over iWARP. We do not know whether it will be added, more importantly we do not know *how* it would be added if it were added. That makes coming up with a transport neutral definition very premature. In particular, if atomics were added to iWARP there is a distinct design option where it would *not* be the same work queue as RDMA Reads (adding atomics through Queue ID 3 would make layering on top of a current implementation much easier). But it would mean that atomic credits would be distinct from read credits. This is a very strong reason to defer attempting to define RDMA Atomics in a transport neutral fashion.
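The transport-level rules quoted in this message (sends match posted receives one-to-one and in order, plain RDMA operations match nothing) and the reading that RDMA Write with Immediate behaves like an RDMA Write followed by a Send can be modelled with a toy receiver. This is only a sketch of the matching rules; the `Endpoint` class and its method names are hypothetical and do not correspond to DAT calls.

```python
from collections import deque


class Endpoint:
    """Toy receiving endpoint: posted receives are consumed in FIFO order."""

    def __init__(self):
        self.posted_recvs = deque()   # cookies of receives not yet matched
        self.completions = []         # (cookie, data) in completion order
        self.memory = {}              # address -> data placed by RDMA writes

    def post_recv(self, cookie):
        self.posted_recvs.append(cookie)

    def on_send(self, data):
        # A send consumes exactly one posted receive (one-to-one rule),
        # and receives complete in the order their sends arrive.
        if not self.posted_recvs:
            raise RuntimeError("no receive posted for incoming send")
        cookie = self.posted_recvs.popleft()
        self.completions.append((cookie, data))

    def on_rdma_write(self, addr, data):
        # A plain RDMA write places data and consumes no receive.
        self.memory[addr] = data

    def on_rdma_write_with_immediate(self, addr, data, immediate):
        # Modelled as an RDMA write followed by a send carrying the
        # immediate value, so the correspondence rule still holds.
        self.on_rdma_write(addr, data)
        self.on_send(immediate)
```

Under this model a write with immediate always consumes one receive, which is the condition ("all of the semantics of a Send") that the message above attaches to keeping the correspondence rule intact.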
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
But each of the multiple work requests follows the semantic of a single completion per work request. It can be controlled by completion_flags but it is still not the semantic of a single post. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 10:39 AM To: 'Caitlin Bestler'; Kanevsky, Arkady; Larsen, Roy K; [EMAIL PROTECTED]; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0immediatedataproposal And further it is only on the receiving side. And only if the receiving side cares about the data (sometimes it only needs the notification). The send side cares about this check because it must size its SQ appropriately. I disagree with the assumption that a transport neutral API is inherently easier for the application developer. The attempt is to define a composite work request that can reduce the number of actual work requests required for some providers, without requiring different work flows dependent on whether the immediate feature was present. This is exactly what Roy was pointing out. This is no longer defining a write with immediate data, but instead addressing some other requirement. In this case, you can define a generic send side API that takes multiple work requests as input, since a provider may be able to reduce the actual number of work requests in this case as well. - Sean
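Sean's idea of a generic send-side call that takes several work requests, which a provider may collapse into fewer actual requests, might look roughly like the sketch below. It is an illustration under stated assumptions, not a proposed DAT interface: `WorkRequest`, `post_batch`, the operation names, and the rule "an RDMA write followed by a 4-byte send may be merged on a transport with native immediate data" are all hypothetical, and the big-endian decoding of the 4 bytes is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkRequest:
    op: str                          # "rdma_write", "send" or "rdma_write_imm"
    payload: bytes = b""
    immediate: Optional[int] = None  # only used by "rdma_write_imm"


def post_batch(requests: List[WorkRequest],
               native_immediate: bool) -> List[WorkRequest]:
    """Return the work requests a hypothetical provider would actually post."""
    posted = []
    i = 0
    while i < len(requests):
        cur = requests[i]
        nxt = requests[i + 1] if i + 1 < len(requests) else None
        mergeable = (
            native_immediate
            and cur.op == "rdma_write"
            and nxt is not None
            and nxt.op == "send"
            and len(nxt.payload) == 4
        )
        if mergeable:
            # Collapse write + 4-byte send into one native request.
            posted.append(WorkRequest("rdma_write_imm", cur.payload,
                                      int.from_bytes(nxt.payload, "big")))
            i += 2
        else:
            posted.append(cur)
            i += 1
    return posted
```

Note that the sketch only covers the initiating side. Whether the receiver sees an ordinary receive completion or a distinct immediate-data completion is exactly the point the thread is arguing about, and, as noted at the top of this message, each unmerged request still completes individually.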
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
All 3 options (proposed APIs, extensions, or an IB semantic API) provide the same performance benefit on IB. But the last option is the easiest to use. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 11:12 AM To: Kanevsky, Arkady; Caitlin Bestler; Larsen, Roy K; [EMAIL PROTECTED]; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0immediatedataproposal Why would any Consumer hook itself on proprietary features and APIs is a different question. Because it provides a real performance benefit. This is the same reason apps code to DAPL versus standard sockets. - Sean
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
IB does optionally support send_with_invalidate as defined in the IBTA 1.2 spec. OpenIB does not support this yet but this is a different matter. So this is a bad analogy. The better analogy is socket based CM. But I am still not clear what you are advocating: extensions, an IB specific API or something else. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Larsen, Roy K [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 2:46 PM To: [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean Cc: Kanevsky, Arkady; Sean Hefty; openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal Caitlin Bestler wrote: Arlin Davis wrote: Sean Hefty wrote: The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be part of the normal APIs, rather than an extension, but should be designed around those devices that provide it natively. I totally agree. A standard RDMA write with immediate API can be very useful to RDMA applications based on the requirements (native support) set forth in my earlier email. It is analogous to the new dat_ep_post_send_with_invalidate() call; a call that supports a native iWARP transport operation but provides no provisions to help other transports emulate. So, other transports simply return NOT_SUPPORTED and add it natively in the future if it makes sense. -arlin What is proposed is a definition of 'dat_ep_post_rdma_write_with_immediate' that can be implemented over iWARP using the sequence of messages that were intended to support the same purpose (i.e., letting the other side know that an RDMA Write transfer has been fully received). 
No, iWARP *CAN NOT* implement write immediate data any better than IB can implement send with invalidate. Immediate data *MUST* be indicated to the ULP unambiguously. Imposing an algorithm on the application to infer immediate data arrival is a hack, pure and simple. An application is free to perform a write/send if that is the semantic they want. Why does iWARP get transport unique APIs but not IB? I find this attempt to bastardize the IB semantic of immediate data a little curious. Roy
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
We have a problem no matter which option we choose. The current Transport Level Requirements state: There is a one-to-one correspondence between send operation on one Endpoint of the Connection and recv operations on the other Endpoint of the Connection. There is no correspondence between RDMA operations on one Endpoint of the Connection and recv or send data transfer operation on the other Endpoint of the Connection. Receive operations on a Connection must be completed in the order of posting of their corresponding sends. The Immediate data and Atomic ops violate these requirements including ordering rules. I had started updating these rules when I generated the first draft of the requirements. They are included in the enclosed pdf file. But they do not cover Atomic ops that also impact transport requirements. This chapter of the spec has not been changed since DAPL 1.0 and I am very concerned about any changes to it. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Caitlin Bestler [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 07, 2006 6:57 PM To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal [EMAIL PROTECTED] wrote: I was under the assumption that the DAT community defined the APIs and semantics through an open process. Given that the IB write immediate data facility does not break the implementation or semantics of the currently defined RDMA write facility, I see no reason the DAPL spec couldn't be updated, through consensus, with the realities of existing transport services. Nevertheless, I presume you'll have no objection to implementing this useful service as a DAPL extension since the semantic rules for extensions haven't been defined yet. 
Roy That is correct, because as an extension the user would not expect normal semantics to still be guaranteed. transport_req_020706.pdf Description: transport_req_020706.pdf
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin, On Friday we agreed that the receiver cannot distinguish between a 4 byte Send and 4 bytes of Immediate data if RDMA Write with Immed is implemented as 2 operations: RDMA Write followed by Send. The ULP Receiver expects Immediate data; that is why it posts Recv. Depending on Transport capability it MAY complete as Recv or as Recv_RDMA_Write_with_Immed_in_event. Neither Provider nor Consumer can distinguish between the cases unless there is additional info. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Davis, Arlin R [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 1:25 PM To: Kanevsky, Arkady; Sean Hefty Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate dataproposal Arkady, Your requirements are slightly different than the proposed set of requirements. iii) DAPL Provider does not provide any identification that the Receive operation matches remote RDMA Write with Immediate data if it completes as Receive DTO. - It is up to an ULP to separate Receive completion of remote Send from remote RDMA Write with Immediate Data. Tell me how this is possible? How can the application distinguish between a 4 byte message and a 4 byte immediate data message? We would have to add a new requirement... If the provider supports immediate data in the payload the ULP cannot send a message equal to the immediate data size. -arlin -Original Message- From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 8:08 AM To: Sean Hefty; Davis, Arlin R Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate dataproposal Here are the changes to the existing requirements chapters for RDMA Write with Immediate Data. Feedback please. 
Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Friday, February 03, 2006 7:30 PM To: Davis, Arlin R Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate dataproposal Davis, Arlin R wrote: Applications need an optimized mechanism to notify the receiving end that RDMA write data has completed beyond the two operation method currently used (RDMA write followed by message send). This new RDMA write feature will support 4-bytes of inline data that will be sent Is there any reason to restrict the size of the immediate data? Could you define the API such that the size is variable? I.e. the provider can simply give the immediate data size, with 0 indicating that it is not supported. It should avoid any latency penalties normally associated with a two operation method. I would state this as a requirement. A write followed by a send should be pushed to the application, since they may be able to provide additional optimizations (such as combining operations) beyond what a provider could. The initiating side must expose a 4-byte immediate data parameter for the application to set the inline data. The receiving side must provide a mechanism to accept the 4-byte immediate data. On the receiving side, the write with immediate completion notification is indicated through a receive completion. It is the responsibility of the provider to identify to the application 4-byte immediate data from a normal 4-byte send message. The inline byte ordering is application specific. Requirements look good to me. 
- Sean
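Sean's suggestion in the quoted message, that the provider simply report its immediate data size with 0 meaning "not supported", lets the initiating side choose between one combined operation and the two-operation method. A minimal sketch of that decision follows; the function name `plan_notification` and the operation labels are hypothetical, and the provider attribute is represented as a plain integer rather than any real DAT structure.

```python
from typing import List


def plan_notification(immediate_size: int, notice: bytes) -> List[str]:
    """Pick the operations to post for "write the data, then notify the peer".

    immediate_size is the provider-reported immediate data size in bytes,
    with 0 meaning immediate data is not supported.
    """
    if immediate_size < 0:
        raise ValueError("immediate_size must be non-negative")
    if immediate_size == 0 or len(notice) > immediate_size:
        # No native support, or the notice does not fit: fall back to the
        # two-operation method (RDMA write followed by a message send).
        return ["rdma_write", "send"]
    return ["rdma_write_with_immediate"]
```

With this shape the application carries the fallback itself, which matches Sean's remark that a write followed by a send "should be pushed to the application".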
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Arlin, It is too strong to state that Consumer should never send a message equal in size to the size of immediate data. Consumer knows from the context which one it is. it may be based on dedicated connection, or based on ULP protocol ordering. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roy, Can you explain, please? For IB the operation will be layered properly on Transport primitive. And on Recv side it will indicate in completion event DTO that it matches RDMA Write with Immediate and that Immediate Data is in event. For iWARP I expect initially, it will be layered on RDMA Write followed by Send. The Provider can do post more efficiently than Consumer and guarantee atomicity. On Recv side Consumer will get Recv DTO completion in event and Immediate Data inline as specified by Provider Attribute. From the performance point of view Consumers who program to IB only will have no performance degradation at all. But this API also allows Consumers to write ULP to be transport independent with minimal penalty: one binary comparison and extra 4 bytes in recv buffer. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Larsen, Roy K [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 2:10 PM To: Caitlin Bestler; [EMAIL PROTECTED]; Kanevsky, Arkady; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal If it is up to the ULP to separate out normal receive data from that associated with a write immediate, how is this different from the ULP doing a write followed by a send? If there is no difference, then what we're really talking about is a convenience to the initiating ULP. Perhaps what would be best is to construct an API that allows the ULP to perform standard write/send operations into one call which the underlying provider could optimize into one transaction with the associated interconnect interface. Better yet, a general request combining interface would have even more value, but calling this write/send immediate data is a stretch, if not downright silly. Some transports have true immediate data that provides unique value. 
There is nothing unique in a write/send sequence - ULPs do it all the time... Roy -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Caitlin Bestler Sent: Monday, February 06, 2006 10:48 AM To: [EMAIL PROTECTED]; Kanevsky, Arkady; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal [EMAIL PROTECTED] wrote: Arkady, Your requirements are slightly different than the proposed set of requirements. iii) DAPL Provider does not provide any identification that the Receive operation matches remote RDMA Write with Immediate data if it completes as Receive DTO. - It is up to an ULP to separate Receive completion of remote Send from remote RDMA Write with Immediate Data. Tell me how this is possible? How can the application distinguish between a 4 byte message and a 4 byte immediate data message? We would have to add a new requirement... If the provider supports immediate data in the payload the ULP cannot send a message equal to the immediate data size. The data sink knows whether the 4 bytes was sent as a message or as an immediate because it is clear in the ULP context. Possible methods: The expected completion is an immediate. All 4 byte messages are immediates. All 4 byte messages where the ms-byte is X are immediate. If it's Tuesday it's an immediate. If it's a prime number it's an immediate ... But there is no clue from the transport layer.
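The first of the methods listed above, "the expected completion is an immediate", relies purely on ULP protocol ordering: the receiver already knows what kind of completion comes next. A sketch of that bookkeeping is shown below. The `ReceiverContext` class and the kind labels are hypothetical, and the sketch deliberately uses no information from the transport, in keeping with the remark that there is no clue from the transport layer.

```python
from collections import deque


class ReceiverContext:
    """Tracks, per connection, what kind of completion the ULP expects next."""

    def __init__(self):
        self.expected = deque()

    def expect(self, kind: str) -> None:
        # Called by the ULP when its own protocol state says what the peer
        # will do next, e.g. "write_notice" after granting a buffer.
        self.expected.append(kind)

    def classify(self, payload: bytes):
        # The payload itself is never inspected; only ULP ordering decides.
        kind = self.expected.popleft() if self.expected else "message"
        return (kind, payload)
```

This only works when the ULP's ordering is strict enough that both sides agree on the sequence, which is the limitation Arlin and Roy raise elsewhere in the thread.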
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Good point. I will add this to the requirements and augment the necessary transfered_length text. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Davis, Arlin R [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 4:17 PM To: Kanevsky, Arkady; Sean Hefty Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal I just want to get consensus on the requirements before we get too far. One thing I forgot is that with Infiniband, the receive with immediate provides the size of the rdma write that just completed. I think we should include this in the requirements since there is ULP value here. -arlin
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal
Roy, comments inline. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Larsen, Roy K [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 4:25 PM To: Kanevsky, Arkady; Caitlin Bestler; [EMAIL PROTECTED]; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED] Roy, Can you explain, please? For IB the operation will be layered properly on Transport primitive. And on Recv side it will indicate in completion event DTO that it matches RDMA Write with Immediate and that Immediate Data is in event. For iWARP I expect initially, it will be layered on RDMA Write followed by Send. The Provider can do post more efficiently than Consumer and guarantee atomicity. On Recv side Consumer will get Recv DTO completion in event and Immediate Data inline as specified by Provider Attribute. From the performance point of view Consumers who program to IB only will have no performance degradation at all. But this API also allows Consumers to write ULP to be transport independent with minimal penalty: one binary comparison and extra 4 bytes in recv buffer. If the application could be written transport independently, I would have no objection at all. Instead, it must be written in a transport-adaptive way and to be able to adapt to all possible implementations, the application could not send arbitrary immediate-sized data as messages because there is no way to distinguish between them on the receiving side. That is HUGE! It is my experience that send/receive is generally used for small messages and to take away particular message sizes or to depend on the so the application can adapt to whatever the immediate size is for a particular transport, if even needed, is a very weak facility to offer. 
But the remote side does post a Recv. Since it anticipates that this Recv will be matched against the RDMA Write with Immediate, it posts a recv buffer that fits. Yes, there is an issue for a transport-independent ULP in that it does need a buffer. For IB it is possible to post a 0-size buffer. But if this is the case, the Recv-end Consumer DOES know that it will be matched against an RDMA Write, so the ULP DOES know what it will be matched against. So in the worst case the Consumer does have to pay the price of creating an LMR to handle a 4-byte buffer to match the RDMA Write Immediate data.

It also affects interface resource allocation. Send queue sizes will have to adapt to possibly twice their size.

That is correct. We argued about it at the meeting. One alternative is to have EP and EVD attributes. But this will not be efficient, since it will double the queue size where a smaller increment is possible given the depth of the outstanding RDMA Write pipeline.

It just dawned on me that the immediate data must be in registered memory to be sent in a message. This means the API must be amended to pass an LMR or, even worse, the provider would have to register memory in the speed path or create and manipulate its own queue of immediate data buffers/LMRs. Of course, LMRs are not needed, and are pure overhead, for transports that provide true immediate data.

No registration on the speed path. It is the Consumer's responsibility to provide a Recv buffer of the right size. Yes, for an IB-only ULP this can be avoided. A ULP can be written to the proposed API to take full advantage of IB performance, but that code will not be transport independent. This API also allows writing transport-independent code, albeit with a certain price attached.

Oh, and another thing. InfiniBand indicates the size of the RDMA write in the receive completion. That is something that will have to be addressed in a transport independent way or dropped as part of the service.

Good point. I will augment the Spec accordingly.
The bottom line here is that it is NOT transport independent.

The implementation is not transport independent. But the API allows writing a transport-specific ULP with full performance, as well as a transport-independent ULP with better performance than without the proposed API and with minimal performance penalty on transports that provide it.

Now, the atomicity argument between write and send has some credibility. If an application chooses to adapt to an explicit write/send semantic for write completion notification in environments that can't provide it natively, this could be addressed by a generalized combined request API that can guarantee thread-based atomicity to the send queue. This seems much more straightforward to me since, in essence, to adapt to non-native immediate data services, they would have to allocate resources and behave
RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal
I am not clear what you are proposing? A transport specific API? The current proposal provides on sending side: single post, and single completion in the error free case. This is commonality that simplify ULP. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Larsen, Roy K [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 6:50 PM To: Kanevsky, Arkady; Caitlin Bestler; [EMAIL PROTECTED]; Sean Hefty Cc: openib-general@openib.org Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED] Sent: Monday, February 06, 2006 2:27 PM Roy, comments inline. Mine too From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED] Roy, Can you explain, please? For IB the operation will be layered properly on Transport primitive. And on Recv side it will indicate in completion event DTO that it matches RDMA Write with Immediate and that Immediate Data is in event. For iWARP I expect initially, it will be layered on RDMA Write followed by Send. The Provider can do post more efficiently than Consumer and guarantee atomicity. On Recv side Consumer will get Recv DTO completion in event and Immediate Data inline as specified by Provider Attribute. From the performance point of view Consumers who program to IB only will have no performance degradation at all. But this API also allows Consumers to write ULP to be transport independent with minimal penalty: one binary comparison and extra 4 bytes in recv buffer. If the application could be written transport independently, I would have no objection at all. Instead, it must be written in a transport-adaptive way and to be able to adapt to all possible implementations, the application could not send arbitrary immediate-sized data as messages because there is no way to distinguish between them on the receiving side. 
That is HUGE! It is my experience that send/receive is generally used for small messages and to take away particular message sizes or to depend on the so the application can adapt to whatever the immediate size is for a particular transport, if even needed, is a very weak facility to offer. But the remote side does posts Recv. Since it anticipate that this Recv will be matched against the RDMA Write with immediate it posts the recv buffer which fits. Yes, there is an issue for Transport-independent ULP that it does needs a buffer. For IB it is possible to post 0-size buffer. But if this is the case Recv end Consumer DOES know that it will be macthed against RDMA Write so ULP DOES know what it will be matched against. So in the worst case Consumer does have to pay the price of creating LMR to handle 4 byte buffer to match RDMA Write Immediate data. I think you missed my larger point. The point was that the application must be written in such a way that it could inferred when immediate data arrived for a variety of immediate data sizes and that places a constraint on the application wrt to data it may want to send/receive normally. Where as, if the application embraced the fact that it was responsible for sending a message to indicate a write completion, it is free to send whatever amount of data best met its needs. Transports that support true immediate data do not require the ULP to perform buffer matching. They can post a series of receive buffers that may or may not indicate immediate data. The ULP does not have to know ahead of time when immediate data will arrive **against other data receives**. The fact that an IB oriented application never needs to back a receive request with a buffer if they were only used to indicate immediate data is orthogonal. It also affects interface resource allocation. Send queue sizes will have to adapt to possibly twice there size. That is correct. We argued about it at the meeting. One alternative is to have EP and EVD attr. 
But this will not be efficient since it will double the queue size where a smaller increment is possible due to the depth of the RDMA Write pipeline outstanding. It just dawned on me that the immediate data must be in registered memory to be sent in a message. This means the API must be amended to pass an LMR or, even worse, the provider would have to register memory in the speed path or create and manipulate its own queue of immediate data buffers/LMRs. Of course, LMRs are not needed and an overhead for transports that provide true immediate data. No registration on the speed path. It is Consumer responsibility to provide Recv Buffer of the right size. Yes for IB
RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal
comments on Arlin and Caitlin's emails inline. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Caitlin Bestler [mailto:[EMAIL PROTECTED] Sent: Monday, January 30, 2006 7:16 PM To: Arlin Davis; Kanevsky, Arkady Cc: Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal Arlin Davis wrote: Kanevsky, Arkady wrote: Arlin, I am not convinced we need a new recv for immediate data. But what is needed is change in normative text in many places. Recv, RDMA Write, DTO completion events, error behavior. Sure you can define immed data in extension but it still effects behavior of the normative part of the spec. How does it effect the normative part of the spec outside of the DTO event extension? The post_recv behaves exactly the same. We will need a paragraph that size of the recv buffer shall accommodate immediate data if the recv may be matched with rdma_write_immed. There we can reference Provider attribute for how immed data is returned. Then in Advice to Consumer state how to generate transport independent recv and how it can be optimized based on Provider attr. This is why my preference is to put it into the main spec. ok, with no new recv_immed call we do get a little closer. The xfer_size is minor thing. We just need to define it meaning with respect to immed_data. Defining it either way is fine. Handling extra space on CQ can be handled by Provider. We can add a new EVD attribute for the use for handling RDMA_write with immed data and Provider can automatically add extra space on CQ. Provider is already responsible to handing user a single completion. SO it will only be used for error handling. sounds good. Error handling takes maost of the new write up anyhow. Regardless where it is done in the spec or in extension. 
Question on do we want to support Send with immed_data have to be decided. Ditto remote RMR invalidation with new post(s) for immed_data. Just because IB supports all possible correlation under one Send post does not mean that uDAPL should follow that too. I would agree, strike them all except rdma_write_immed. The only one which need to be discussed it Remote invalidate with rdma_write_immed and Local invalidate with rdma_write_immed. Can you give some idea how you would write up the normative text for the transport independent receive that would accept immediate data? thanks, -arlin The data source: posts an rdma write with immediate DTO, supplying the RDMA Write data source and an immediate value. This is translated into one work request (if the device supports write with immediate), or into a RDMA Write followed by a RDMA Send (if it does not). This should be Model Implication section. While successful completion of the RDMA Write will be suppressed, the Consumer must still allow for the extra space on the SendQ and the CQ. An IA attribute will document how many work requests a write_with_immediate will translate into. This belongs to Model implication also and in Usage section. The data sink: post a recv (to EP or SRQ) with a four byte buffer. When it reaps the completion it needs to be ready to see the data either in an immediate field in the work completion, or in the buffer originally specified in the recv DTO. This is in the Usage section. A Provider MAY indicate that it supports immediate receives, but on iWARP or any transport where this is not the default optimized receive processing MUST be enabled by the user. Otherwise, RFC compliance would require that a four byte untagged message matched to a zero byte buffer was an error. Essentially the user is posting a receive operation that names the four bytes in the Work completion as the buffer. Ditto. Also it should reference Provider Attribute and not transport. 
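The data-sink rule described above (the immediate value may show up either in the work completion or in the posted four-byte buffer) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct recv_completion`, its fields, and `get_immediate()` are hypothetical names and not DAT or verbs definitions, and the sketch assumes the value lands in the buffer in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a receive completion; NOT a DAT type. */
struct recv_completion {
    int         immed_in_event; /* nonzero: provider put the value in the event */
    uint32_t    immed_data;     /* valid only when immed_in_event is nonzero */
    const void *buffer;         /* buffer that was posted with the recv */
    size_t      xfer_length;    /* bytes the transport placed in buffer */
};

/*
 * Fetch the 4-byte immediate value wherever the provider delivered it.
 * Returns 0 and fills *out on success, -1 if no immediate value is present.
 */
int get_immediate(const struct recv_completion *c, uint32_t *out)
{
    assert(c != NULL && out != NULL);

    if (c->immed_in_event) {            /* native delivery, e.g. IB */
        *out = c->immed_data;
        return 0;
    }
    if (c->buffer != NULL && c->xfer_length >= sizeof(uint32_t)) {
        /* emulated delivery: value arrived as a short Send payload */
        memcpy(out, c->buffer, sizeof(uint32_t));
        return 0;
    }
    return -1;
}
```

The single branch on `immed_in_event` is the "one binary comparison" cost mentioned in the thread; a Consumer that already knows its Provider always delivers in the event could skip the buffer path entirely.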
RE: [dat-discussions] RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal
Caitlin, Agreed that Send with immed is too hard to handle. I have not heard from any ULP that needs it. So we can take an informal vote and close that issue. The sizing of the EVD to handle 2 completions in the error case for a post of RDMA_write_with_immed can be handled by the Provider adding extra space if the EVD will be used for posting RDMA_write_with_immed. It does not allow the Consumer to optimize queue size based on the exact number of outstanding RDMA_write_with_immed ops, but it is simpler to program to. Of course a ULP can be adaptive and choose the code path based on a Provider attr, if we add an attr for whether extra queue size is needed. That is separate from how immed data is returned. We could combine the 2 under one Provider attr, but conceptually it is wrong to combine the two. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -----Original Message----- From: Caitlin Bestler [mailto:[EMAIL PROTECTED] Sent: Friday, January 27, 2006 12:23 PM To: [EMAIL PROTECTED]; Kanevsky, Arkady; Arlin Davis Cc: Lentini, James; openib-general@openib.org Subject: RE: [dat-discussions] RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal [EMAIL PROTECTED] wrote: But this penalizes the user, who needs to deal with 2 ways of handling post calls and completions. I do not think we are too far from consensus. A transport-independent App will allocate 4 extra bytes for buffers that can match immediate data. Completion data will report where the immediate data is returned (the Consumer cannot request it on posting), with 4 bytes for immediate data in the completion event. The rest is ironing out details for a complete specification. This is no different than for any other new functionality proposed. And except for wasting 4 bytes per buffer or completion I do not see how it penalizes IB. Moreover, if the App knows that the Provider returns immediate data in the completion event, it can avoid any penalty.
There is no penalty to the user if you just provide native features via extensions. Your extension will provide the best possible interface for your native capabilities. I think we are further from consensus then we first thought: Right now we have a new post recv, different delivery mechanisms, and a requirement to allocate an extra 4 bytes of user data. The only requirement to support immediate data on IB, is a new post send and write immediate data calls and a new event data construct. The normal post_recv can be used unchanged and can already process normal and immediate data. No requirement on the user to allocate and manage an extra 4 bytes in the receive buffer. In fact, you can post receive with no buffer. In order to support immediate data via iWARP, you now have a requirement to use a special new receive post, new user buffer constructs to place the data, and new delivery method that has to be checked via provider attributes or at event time. Is there anyway to get this closer? If not, I would recommend going back to an extension interface for immediate data. I think the trick to finding out if there is something useful that can be made transport neutral is to work in the opposite direction. Start with the message sequence that the application would use *without* immediates, and then ask if there is a way to allow an InfiniBand Provider to compress that message sequence. That is possible for RDMA Write with Immediate. With careful definition of a composite message it can be viewed as a transport specific replacement for an RDMA Write followed by a 4-byte RDMA Send. There are only two special considerations required: 1) A single post has to submit the combination (otherwise it is too difficult for the Provider to detect the optimization). 2) The receive completion may report the received data in the user supplied buffer OR in an immediate data field in the completion. 
I do not think it is feasible to define a transport neutral equivalent of a RDMA Send with Immediate. How is the extra data transmitted via iWARP? An extra send? Pre-pend the four bytes? Or 4 bytes at the end? Delivery of the immediate data is transport dependent? Adding an immediate data field to the completion doesn't cost much, and it would allow IB DAT Provider to interact with IB-specific fields. But I can't see adding a send with immediate method in any way that would create an expectation in developers that it would work in a transport neutral fashion. Write with immediate is possible. It carries the complexity that a single DTO request might result in two flushed work completions. The current consensus is that this was too complex relative to the benefit. But that's really a call for application developers
RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal
Sean,

Immediate data can be handled in a transport-independent way. The API for it certainly is. I am more concerned that different vendors will come up with their own extensions for the same features.

The size of immediate data is no big deal. The real issue is that the App will need to be changed to handle more data. So DAT can just increase the size of the immed_data field in the event and in the posted buffer: no API functionality change, just an API header change and a recompile of the app. But these kinds of changes will face the same problem whether they are part of DAT or part of a DAT extension.

Let's talk more about it on the DAT call tomorrow.

Arkady

Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 24, 2006 7:17 PM
To: Kanevsky, Arkady
Cc: Arlin Davis; Caitlin Bestler; Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
Subject: Re: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

Kanevsky, Arkady wrote:
> But this penalizes the user, who needs to deal with 2 ways of handling post calls and completions.

Yes, any app that wants to take advantage of transport specific features, which immediate data is, is no longer transport neutral. How do you plan to handle the next RDMA transport that comes along with 64-bytes of immediate data?

- Sean
RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal
Arlin, I am not convinced we need a new recv for immediate data. But what is needed is change in normative text in many places. Recv, RDMA Write, DTO completion events, error behavior. Sure you can define immed data in extension but it still effects behavior of the normative part of the spec. This is why my preference is to put it into the main spec. The xfer_size is minor thing. We just need to define it meaning with respect to immed_data. Defining it either way is fine. Handling extra space on CQ can be handled by Provider. We can add a new EVD attribute for the use for handling RDMA_write with immed data and Provider can automatically add extra space on CQ. Provider is already responsible to handing user a single completion. SO it will only be used for error handling. Error handling takes maost of the new write up anyhow. Regardless where it is done in the spec or in extension. Question on do we want to support Send with immed_data have to be decided. Ditto remote RMR invalidation with new post(s) for immed_data. Just because IB supports all possible correlation under one Send post does not mean that uDAPL should follow that too. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Arlin Davis [mailto:[EMAIL PROTECTED] Sent: Thursday, January 26, 2006 3:02 PM To: Kanevsky, Arkady; Arlin Davis; Caitlin Bestler Cc: Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal But this penalizes user which need to deal with 2 way to deal with post calls and completions. I do not think we are not to far from consensus. Transport independent App will allocate 4 bytes extra for buffers that can match immediate data. 
Completion data will return where the immediate data is return (Consumer can not request it on posting), and 4 bytes for immediate data in completion event. The rest are ironing details for complete specification. This is no different than for any other new functionality proposed. And except for wasting 4 bytes per buffer or completion I do not see how it penalizes IB. Moreover if Apps knows that Provider returns immediate data in completion event it can avoid any penalty. There is no penalty to the user if you just provide native features via extensions. Your extension will provide the best possible interface for your native capabilities. I think we are further from consensus then we first thought: Right now we have a new post recv, different delivery mechanisms, and a requirement to allocate an extra 4 bytes of user data. The only requirement to support immediate data on IB, is a new post send and write immediate data calls and a new event data construct. The normal post_recv can be used unchanged and can already process normal and immediate data. No requirement on the user to allocate and manage an extra 4 bytes in the receive buffer. In fact, you can post receive with no buffer. In order to support immediate data via iWARP, you now have a requirement to use a special new receive post, new user buffer constructs to place the data, and new delivery method that has to be checked via provider attributes or at event time. Is there anyway to get this closer? If not, I would recommend going back to an extension interface for immediate data. -arlin ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
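As a side note on the queue-sizing point in this exchange (extra send queue and CQ space when a write with immediate is emulated), the arithmetic can be sketched in C as below. This is a hedged illustration only: `send_queue_slots()` and the `wr_per_write_immed` parameter are hypothetical names standing in for whatever Provider or IA attribute the spec ends up defining; the thread only says such an attribute would report how many work requests one write-with-immediate turns into.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of send-queue slots a Consumer would need to reserve.
 *
 * plain_ops          - outstanding ordinary sends / RDMA writes
 * write_immed_ops    - outstanding RDMA-write-with-immediate posts
 * wr_per_write_immed - hypothetical provider attribute: 1 when the transport
 *                      has native write-with-immediate, 2 when it is emulated
 *                      as an RDMA Write followed by a short Send
 */
size_t send_queue_slots(size_t plain_ops,
                        size_t write_immed_ops,
                        size_t wr_per_write_immed)
{
    assert(wr_per_write_immed >= 1);
    return plain_ops + write_immed_ops * wr_per_write_immed;
}
```

With 16 ordinary operations and 4 writes with immediate, this gives 20 slots on a native transport and 24 on an emulating one, rather than the 40 that blindly doubling the queue would reserve.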
RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal
But this penalizes user which need to deal with 2 way to deal with post calls and completions. I do not think we are not to far from consensus. Transport independent App will allocate 4 bytes extra for buffers that can match immediate data. Completion data will return where the immediate data is return (Consumer can not request it on posting), and 4 bytes for immediate data in completion event. The rest are ironing details for complete specification. This is no different than for any other new functionality proposed. And except for wasting 4 bytes per buffer or completion I do not see how it penalizes IB. Moreover if Apps knows that Provider returns immediate data in completion event it can avoid any penalty. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -Original Message- From: Arlin Davis [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 24, 2006 5:42 PM To: Caitlin Bestler Cc: Davis, Arlin R; Kanevsky, Arkady; Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org Subject: Re: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal ok, maybe we should backup and start over This is exactly why immediate data was initially proposed as an extension instead of general API. We start to penalize native IB features based on the requirements of other RDMA interfaces that have to emulate the feature anyway. What prevents the next RDMA interface that comes along from requiring other variations of the interface due to implementation implications? This is an IB specific feature that does not map well on iWARP so lets just call it what it is and let IB providers supply immediate data capabilities via the extension interface. -arlin Caitlin Bestler wrote: Maybe we need to just go back to one model and always deliver via the event? 
With the post_recv_immed requirements, other transports have a mechanism to emulate and create the necessary resources on the recv side to place idata and copy to event when operation is completed. Would this work for iWARP? Two different models for receiving idata should be avoided if at all possible. Always delivering by the event is not feasible for an iWARP vendor. If you are working over RDMAC verbs then the work completion is no longer accessible by the time the Work Completion is reaped. So copying from the receive buffer to the event does not work since the location of the receive buffer is now known only to the application. The same problem exists in the opposite direction for InfiniBand HCAs using standard verbs. They cannot copy from the CQE to the receive buffer. So the user is stuck checking a flag or the event type to know where their data is. This is not terribly user friendly, but it is the best that can be offered if we want to enable this optimization. The need to check the flag does reduce the value of the optimization though. 6. Is dto_completion_data xfer_length include immediate_data size or not? no Then how does the receiver know how much data there is? Even if an iWarp Provider attempts to optimize immediate placement into the CQ, it will end up setting the xfer_length whenever the packet is received out of order. So it is far simpler for the application to simply know that the data will be in the buffer, and that the xfer_length will be set. It doesn't need to worry about whether they were set by the cq_poll verb or by the hardware. 11. Need to cleanup operation description to make it clear that Send|RDMA_write and immediate data part is a single atomic operation. The current followed by language is misleading. Make it explicit that there is a single local DTO completion and single remote DTO completion. 
Ok, I will clean that up.

The best mapping available over RDMAC-compliant firmware for an iWARP NIC would be to post two operations (RDMA Write followed by a short Send). That would require additional space in the send and completion queues, since a completion for the write can only be suppressed for a successful completion. Whether these extra slots were required would be an IA attribute. And the requirement is that nothing for that QP can come between the iWARP Write and the Send. How the provider does that is up to it. Options include locking over both posts and a composite work request. Anyone working over existing RDMAC-compliant verbs will have to use the first approach.

12. Is your intention that post_recv_immed can ONLY accept immediate data and is not capable of receiving any message?

No, the intention is to extend the post_recv to handle 32bit idata which may arrive with or without other send or rdma_write data
[openib-general] RE: [RFC] DAT 2.0 immediate data proposal
Arlin, comments inline.

Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300

From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
Sent: Monday, January 23, 2006 7:15 PM
To: Kanevsky, Arkady; Lentini, James
Cc: openib-general@openib.org; [EMAIL PROTECTED]
Subject: RE: [RFC] DAT 2.0 immediate data proposal

Arkady, Response inline

From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 17, 2006 7:16 AM
To: Davis, Arlin R; Lentini, James
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [RFC] DAT 2.0 immediate data proposal

Arlin, a few things need to be addressed.

1. Correlation with local and remote invalidate. This potentially affects both DAT_DTOs and post operations.

How does this differ from normal sends or writes?

[AK] We had added a new Send_with_Invalidate. The completion also states whether an RMR was invalidated and which one. But the text for the interaction is added throughout the completion and post operations. See the latest draft of the uDAPL and kDAPL 1.3 specs on the DAT reflector.

2. Need a precise definition of CONFIRM_FLAG in a transport-independent fashion. What guarantees does the DAT Provider "provide" on successful local completion? A remote-end guarantee? My understanding is that you are trying to create 2 models, one for IB and one for iWARP. So for IB, Consumers will use CONFIRM_FLAG, and for iWARP, IMMED_FLAG. The Provider will indicate in Provider_attr which model it supports. The issue I have with it is that I do not see a model that a Consumer can use to create transport-independent code. It looks like Immed_flag can be made transport independent. But with the "sender" specifying the behavior, a protocol extension is needed for IB. IB will always deliver Immediate data in the header, not in the payload, and the remote Provider can control how it is delivered to a Consumer. But this means that there is no need for DTO_flags on the Send side. Instead they can be used on the Recv side or controlled purely by the Provider.

Maybe we need to just go back to one model and always deliver via the event? With the post_recv_immed requirements, other transports have a mechanism to emulate and create the necessary resources on the recv side to place idata and copy to event when operation is completed. Would this work for iWARP? Two different models for receiving idata should be avoided if at all possible.

[AK] Caitlin already responded to this.

3. Need to define error behavior for new operations, async errors, EP behavior.

I will work on updating the draft. post_send_immed will look much like post_send and post_rdma_write_immed will look a lot like post_rdma_write with some additional errors based on the post receive buffer requirement.

[AK] Also consider whether you want to add remote invalidate to the new operation.

4. Need to define DAT_Provider attributes for immediate data and dto_flags behavior.

5. Is the Solicited_wait completion_flag value now applicable to RDMA_write with immediate data?

yes, applicable to send, send_immed, and write_immed

6. Does dto_completion_data xfer_length include the immediate_data size or not?

no

[AK] It can work both ways. Either we include 4 extra bytes for immediate data or not. The Consumer just has to know. The real data always starts at a 4-byte boundary into the buffer if immediate data is returned inline. We need to state how immediate data is positioned if it is smaller than 4 bytes.

7. What memory privileges are needed for a recv buffer for immediate data?

Based on the operation write_immed would require write privileges and send_immed would require recv privileges.

8. SRQ interaction?

Good question. all post_recv_immed or all post_recv?

[AK] Will this work for the user model? That means not supporting immediate recv and regular recv with potential immediate data on one SRQ.

9. What happens if a buffer for a recv operation that is NOT recv_immed is matched against an incoming recv/rdma_write op?

The rules should be: Can receive a send, send_immed, or write_immed with recv_immed. Cannot receive send_immed or write_immed on a recv. However, I am not sure how you would enforce this on IB (DTO error on the receiving side?) since the idata is delivered via CQ and does not require a special receive post descriptor.

[AK] We can make this a Provider attribute. Or we can state that if immed data is returned in the event then there is no error for recv.

10. Change dat_ep_post_write_immed to dat_ep_post_rdma_write_immed to be consis
[openib-general] RE: [RFC] DAT 2.0 immediate data proposal
Arlin, a few things need to be addressed.

1. Correlation with local and remote invalidate. This potentially affects both DAT_DTOs and post operations.

2. Need a precise definition of CONFIRM_FLAG in a transport-independent fashion. What guarantees does the DAT Provider "provide" on successful local completion? A remote-end guarantee? My understanding is that you are trying to create 2 models, one for IB and one for iWARP. So for IB, Consumers will use CONFIRM_FLAG, and for iWARP, IMMED_FLAG. The Provider will indicate in Provider_attr which model it supports. The issue I have with it is that I do not see a model that a Consumer can use to create transport-independent code. It looks like Immed_flag can be made transport independent. But with the "sender" specifying the behavior, a protocol extension is needed for IB. IB will always deliver Immediate data in the header, not in the payload, and the remote Provider can control how it is delivered to a Consumer. But this means that there is no need for DTO_flags on the Send side. Instead they can be used on the Recv side or controlled purely by the Provider.

3. Need to define error behavior for new operations, async errors, EP behavior.

4. Need to define DAT_Provider attributes for immediate data and dto_flags behavior.

5. Is the Solicited_wait completion_flag value now applicable to RDMA_write with immediate data?

6. Does dto_completion_data xfer_length include the immediate_data size or not?

7. What memory privileges are needed for a recv buffer for immediate data?

8. SRQ interaction?

9. What happens if a buffer for a recv operation that is NOT recv_immed is matched against an incoming recv/rdma_write op?

10. Change dat_ep_post_write_immed to dat_ep_post_rdma_write_immed to be consistent with current terminology.

11. Need to clean up the operation description to make it clear that Send|RDMA_write and the immediate data part form a single atomic operation. The current "followed by" language is misleading. Make it explicit that there is a single local DTO completion and a single remote DTO completion.

12. Is your intention that post_recv_immed can ONLY accept immediate data and is not capable of receiving any message?

13. size should be num_segments for dat_ep_post_recv_immed()

Arkady

Arkady Kanevsky  email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.  Fax: 781-895-1195
Waltham, MA 02451  central phone: 781-768-5300

From: Arlin Davis [mailto:[EMAIL PROTECTED]
Sent: Monday, January 16, 2006 5:55 PM
To: Kanevsky, Arkady; Lentini, James
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [RFC] DAT 2.0 immediate data proposal

Arkady,

The attached proposal adds immediate data options as standard APIs instead of extensions for the following calls.

dat_ep_post_send_immed()
dat_ep_post_recv_immed()
dat_ep_post_write_immed()

The patch should be ready by tomorrow.

Thanks,
-arlin
[openib-general] RE: [RFC] DAT 2.0 extension proposal
Arlin,

1. Does it mean that existing DAT providers will have to be modified so they report DAT_NOT_IMPLEMENTED for each extension?

2. Why is there DAT_INVALID in DAT_DTOS?

3. Do you want to use DAT_EXTENSION_DATA or DAT_EXT_DATA?

4. The proposed operations are operations on an EP and they are DTOs. Why not define DAT_DTO_EXT_OP instead of DAT_EXT_OP? My concern is that if these are not DTOs then we have a new event stream type for "extensions", and we need to define rules for this event stream, including ordering rules and interactions with other event streams, provider attributes for stream mixing, and so on. If we restrict extensions to DTO operation extensions we avoid all these issues and simplify the APIs. On the negative side, these extensions are restrictive.

5. Memory protection extension for atomic operations.

6. Error returns for extensions?

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300

From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
Sent: Monday, January 16, 2006 5:55 PM
To: Kanevsky, Arkady; Lentini, James
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [RFC] DAT 2.0 extension proposal

Arkady, The attached proposal adds generic DTO extensions and provider specific atomic operations as follows.

dat_ep_post_cmp_and_swap()
dat_ep_post_fetch_and_add()

The patch should be ready by tomorrow. Thanks, -arlin
RE: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's
comments inline.

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300

-----Original Message-----
From: Arlin Davis [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 05, 2006 6:35 PM
To: Kanevsky, Arkady
Cc: Arlin Davis; Lentini, James; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

Kanevsky, Arkady wrote:

Arlin, nice proposal, thanks. I have one high level question and a few specific technical ones.

1. Why do you want to provide this functionality via extension instead of as part of a new DAT spec, say 2.0? This will allow Consumers to use all events, operations, and Provider/IA functionality uniformly instead of via 2 separate layers. This will also ensure that this basic functionality can be provided by all DAPL Providers the same way on the DAPL and DAT layers. DAPL 2.0 is not done yet so we have time to incorporate that. DAPL 2.0 already introduced new functionality which is easy to beef up for your proposal. See DAT_DTOS for example. DAT_EVENT is also modified to handle remote invalidation, so a small addition for Immediate data and Atomic ops is a sensible addition. This should simplify the proposal significantly, as you will not need to introduce any new EXT structures.

As mentioned on the con-call, there are two separate items to consider while looking at the proposal. The first is the ability to extend DAT for specific provider value-add and the second is to validate the need for general atomic and immediate data functionality in the basic set of API's for all providers. I included atomics and immediate data as examples since it is specific to one provider (IB), it includes operations that require new ops, events, and event data types, and it also provides a working model to validate the extension model from request to completion events. I would like to concentrate on getting consensus on the extension proposal first if possible. Just try to think of the actual operations as some opaque dat_ext_foobar_op().

The thing that bothers me is that we already have several APIs that are transport specific. While some are possible to implement on other transports, others, like Socket CM, cannot be. So I view both of your specific extensions as transport specific and hence prefer to add them as normal APIs, not extensions.

The secondary goal is that a Provider can add extensions without requiring changes to DAT. These fall into 3 categories.

1. New memory types, including privileges and protection attributes. We can add an extension entry to these structures. We need to check if this is sufficient. Think of shared memory for example. I am assuming no changes to PZ.

2. New DTOs. The main issue is not DTOs but their completions and async errors. This is why Immediate data is better handled by incorporating it into the DAT spec, while atomics can be handled by extensions. That is, the completion will return an extension and the Consumer will do the secondary switch on the extension type. Extensions should not impact backwards compatibility. We have not looked at errors. But assume a simple model in which async errors break the connection, and we can return an extension error with extensions defining a new reason. Again, details need to be polished.

3. New connection types or CM models. New connections seem to have little impact on the existing API, assuming that the EP type can be extended. The new connections can even restrict which DTOs they can handle. The CM model is more problematic.

Arlin, it would be nice to consider some of your other extensions that are not transport specific to see how they will fit before we make the final decision. This should give us an idea of how extensible the DAT extension model is.

In general, the extension route was intended for RNIC|HCA providers to expose HW capabilities beyond the IBTA, iWARP and VIA standards. The standard RDMA functionality is best handled via spec addition. DAT 2.0 does it for FMR, remote and local memory invalidation, as well as others.

True, but the extension route is not fully defined, documented, nor implemented. This is what I would like to work on getting completed in time for 2.0 if possible. BTW: The existing implementation actually uses dapl_provider-extension to store the hca_ptr but the specification states that it is reserved for the provider's private use (8.2.1 in DAPL1.2 spec). This is why I had to define another extension_func in the patch.

I posted a complete list of changes/additions to DAT 2.0 about a month ago. But we have not yet discussed the version change from 1.3 to 2.0, nor how much backwards compatibility the spec will provide.

2. What is IMMED_EVENT
[openib-general] FW: [swg] 12/6 meeting minutes (2nd half)
The SWG has approved the IP address proposal (v5).

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195
Waltham, MA 02451 central phone: 781-768-5300

-----Original Message-----
From: Mike Ko [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 06, 2005 6:41 PM
To: [EMAIL PROTECTED]
Subject: [swg] 12/6 meeting minutes (2nd half)

We had a brief discussion on the revised slide deck from Arkady on the RDMA-Aware SID and CM REQ Message Extension and there were no disagreements on the direction.

Arkady Kanevsky from NetApp made the following motion: Create a new Annex for RDMA aware ULPs that includes:
a. port mapping between IETF protocol ports and IB SIDs
b. CM REQ message private data format extensions
c. CM usage for RDMA aware ULPs

Ted Kim from Sun seconded the motion. Vote count: Against: 0 Abstain: 0. Motion passed.

We continued with a discussion on the slide deck from Mike Ko on supporting iSER on InfiniBand. There were disagreements on the need for Connection Preference bits. We decided to move forward with the rest of the suggestions from Mike and postpone the decision on the CP bits until the next meeting.

Mike Ko from IBM made the following motion: Create a new annex to support iSER on InfiniBand release 1.1 and 1.2 as represented in Mike Ko's slide deck dated December 1, but not including the support for Connection Preference bits, and also making ARI a must requirement for CM REJ.

Yaron Haviv from Voltaire seconded the motion. Vote count: Against: 0 Abstain: 0. Motion passed.

The meeting was adjourned after the vote.

Mike
RE: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4
agreed. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 275 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 30, 2005 12:59 PM To: Yaron Haviv Cc: Kanevsky, Arkady; Ted H. Kim; [EMAIL PROTECTED]; openib-general@openib.org Subject: Re: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4 Yaron Haviv wrote: How about using ARP to get from IP to DGID+Partition Followed by an SIDR to map DGID+PKey+Service to QKey QP It is the same concept as CMA that first uses IP stack (ARP etc') to get to the remote end-point (in that case GID+PKey combination) followed by SA-PR and CM REQ, we just substitute the CM REQ with a SIDR REQ It may not solve all the cases but probably most of the practical ones This was my thought as well. Anyway the packets will need to carry some header (since it's not a connected model), you can add more stuff in that header (e.g. can use IPoIB header as is which contains already the src/dst IP) I was assuming that each packet would need to carry some sort of header. At this point, we may want to defer defining anything for UDP until there's a better understanding of what an application would want. My guess is that such an application will need new APIs for posting sends based on UDP addressing. - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] socket based connection model for IB - round 5
Here is the fifth and I hope the final version of the proposal. The changes from the previous version:

1. IBTA bit numbering scheme (reverse order)
2. Protocol version is split into major and minor with 4 bits each.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

Attachment: IP Address Support by InfiniBand CM_v5.pdf
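For illustration only, the major/minor split described above can be sketched as a single version byte holding two 4-bit fields. This is a minimal sketch, not the layout from the attached proposal: the function names are hypothetical, and placing the major version in the high nibble is an assumption (the message mentions the IBTA bit numbering scheme but does not spell out the field positions).

```python
def pack_version(major, minor):
    """Pack 4-bit major and minor versions into one byte.

    Assumption: major occupies the high nibble, minor the low nibble.
    """
    for name, value in (("major", major), ("minor", minor)):
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} version must fit in 4 bits, got {value}")
    return (major << 4) | minor


def unpack_version(byte):
    """Split a version byte back into a (major, minor) tuple."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"version field must fit in 8 bits, got {byte}")
    return byte >> 4, byte & 0xF
```

With this layout a responder could compare only the major nibble to decide whether it can parse the rest of the private data, and treat the minor nibble as informational.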
RE: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4
Sean, the SWG discussed today the proposal to extend the private data format to SIDR_REQ. The group does not see the need for it since the ULP is not RDMA aware. That is, the ULP does not use RDMA operations. Do you have some specific ULP in mind for this functionality? For UDP a different IP address can be used for each message. There is no persistent connection.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 23, 2005 3:41 PM
To: Ted H. Kim
Cc: Kanevsky, Arkady; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4

Ted H. Kim wrote:

I know we originally set out to compress everything down to the minimum to preserve as much ULP specific private data as possible. But it seems to me in the current proposal we have reserved space now which could be used to re-expand the version to major 4-bits and minor-4 bits without harming anything else.

I don't see any benefit to having 2 4-bit version numbers over a single 8-bit number. A single 4-bit version number should suffice. If all version numbers are ever consumed, then version 15 can define an extended version field. IMO, multiple version fields simply complicate the implementation. I would rather see the reserved space used to define the size of carried user-private data.

- Sean
RE: [openib-general] socket based connection model for IB proposal -round 4
Yes. The private data format is not RC or UC specific. I will add this comment that format covers both EE and C. Is this sufficient? Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance Inc. phone: 781-768-5395 275 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Thursday, November 17, 2005 12:40 PM To: Kanevsky, Arkady; [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: RE: [openib-general] socket based connection model for IB proposal -round 4 If the proposal will include UDP, should the definition extend beyond connections to include UD QPs as well (i.e. SIDR REQ)? - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4
This is fine with me. I will update the proposal with this for the next version.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Ted H. Kim [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 23, 2005 3:29 PM
To: Kanevsky, Arkady
Cc: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
Subject: Re: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4

Arkady, I know we originally set out to compress everything down to the minimum to preserve as much ULP specific private data as possible. But it seems to me in the current proposal we have reserved space now which could be used to re-expand the version to major 4-bits and minor-4 bits without harming anything else. Can we entertain that as an option? My rationale is to err on the side of perhaps a little too much version room than too little. This will put it in line with the precedent of SDP.

-ted

Kanevsky, Arkady wrote:

pdf version of the proposal.

From: Kanevsky, Arkady
Sent: Wednesday, November 16, 2005 11:59 AM
To: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
Subject: [openib-general] socket based connectionmodel for IB proposal -round 4

This version incorporates the feedback from the 3 reflectors and yesterday's SWG meeting. Major changes from the previous version are: no REQ bit to identify private data formatting (a SID range is used instead); port mapping uses IBTA space and the IETF protocol # is encoded in the SID; the protocol version is 4 bits.

Arkady

--
Ted H. Kim Sun Microsystems, Inc. [EMAIL PROTECTED] 222 North Sepulveda Blvd., 10th Floor (310) 341-1116 El Segundo, CA 90245 (310) 341-1120 FAX
RE: [openib-general] socket based connectionmodel for IB proposal -round 4
pdf version of the proposal.

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

From: Kanevsky, Arkady
Sent: Wednesday, November 16, 2005 11:59 AM
To: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
Subject: [openib-general] socket based connectionmodel for IB proposal -round 4

This version incorporates the feedback from the 3 reflectors and yesterday's SWG meeting. Major changes from the previous version are: no REQ bit to identify private data formatting (a SID range is used instead); port mapping uses IBTA space and the IETF protocol # is encoded in the SID; the protocol version is 4 bits.

Arkady

Attachment: IP Address Support by InfiniBand CM_v4.pdf
RE: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3
The goal of this proposal is to provide an underpinning for a common RDMA transport CM. Thus, the API that ULPs (both user space and kernel space) use is socket addressing. For ULP addressing this means a 5-tuple: protocol, src IP addr, src port, dst IP addr, and dst port. A port is a 16-bit entity. The proposal just provides a mechanism for exchanging this 5-tuple between the two sides.

Which entity is responsible for using the proposed protocol is an interesting question. I was assuming that this will be the CM. After all, the proposed protocol is a CM extension protocol. But it can be another module between the CM and the ULP. Its job will be taking the 5-tuple, populating private data, and converting the dst port to a SID. Since OpenIB addr.c already deals with IP to IB address translations, it is a logical candidate for it. On the remote side it extracts the info from private data, populates socket info for the Consumer, and passes the Consumer a pointer to the Consumer private data.

Another interesting place to deal with is the listening point. Since it is a common RDMA API, a 16-bit port should be used for it also. This means that the same module should locally convert the port to an IB SID before passing it to the CM. The CM just ensures that an incoming connection request matches the listening SID. While it is possible to do wildcarding on the whole SID, I have not seen it used selectively on individual bits of a SID or a port. While SDP does the conversion to IB SID from Ethernet port, this proposal shifts the responsibility for port and IP address conversion from the ULP down.

Now let's look at each field proposed to be moved from protocol private data to the SID.

Protocol version. This means that in the future, if the protocol version is bumped up, we will have to change the SID that the Consumer listens on and that requests are sent to. Not sure how to do that without changing the ULP. Does not look like a good idea.

IP version. This can be incorporated into the SID. But if an HCA has multiple IP addresses assigned to it, the listening point needs to specify its IP address(es). The current verbs and/or API will have to be changed to support it. But if a socket is passed to listen on, it does have all the needed info. Looks fine.

Ethernet Protocol. The same as the one above.

Src port. Very questionable. For that, the listening SID must have a wildcard for the portion of the SID where the SRC port is incorporated. Since the ULP is not aware of it and never sees it, it is possible. But this pushes the definition of SID beyond its current IBTA spec statement of being similar to a TCP port number. The query of a listen point should also hide the wildcarded SID in this case.

DAPL APIs (uDAPL and kDAPL) do not expose the local IP address for a listen point. An additional API can be added to support passing a local socket to listen on instead of a Connection Qualifier. Since it is an addition, there are no backwards compatibility issues. The current ULPs/Apps will still use the default API address and the protocol-assigned SID as connection qualifier. The new API ensures that the SID conversion takes place locally. The use of a protocol-defined range of SIDs ensures that the remote side knows to parse private data according to the proposed protocol format.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Friday, November 11, 2005 12:43 PM
To: Kanevsky, Arkady
Cc: Sean Hefty; [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3

Kanevsky, Arkady wrote:

So what you are proposing is that Listener will specify IETF port (2 bytes). CM will generate an IB SID to listen on. That SID will have wildcarding for 24 bits. The requestor will specify: version, IP version, SRC port and DST port. Based on that CM will generate the SID to send request to.

No, the listener or requester generate the SID, not the IB CM - the same way SDP works today.

It will also encode IP addresses into Private data based on IP version. This makes IP addresses, SIDs and private data format interdependent and not orthogonal which it is now. It also changes the meaning of SID which currently has a meaning of TCP port.

I'm not proposing this. I'm merely stating that it is a valid option to consider. The private data format and SIDs are not orthogonal anyway. The port number's embedded in the SID, and the SID indicates the format of the private data. They are interdependent by definition. If it's okay to put the destination port number in the SID, why not the protocol type, or IP version?

It also does not allow to use the private data formatting for other SIDs.

Private data is private. It should not be owned, set, interpreted, modified, or touched
RE: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3
So what you are proposing is that the Listener will specify an IETF port (2 bytes). The CM will generate an IB SID to listen on. That SID will have wildcarding for 24 bits. The requestor will specify: version, IP version, SRC port and DST port. Based on that the CM will generate the SID to send the request to. It will also encode IP addresses into private data based on IP version. This makes IP addresses, SIDs and private data format interdependent rather than orthogonal as they are now. It also changes the meaning of SID, which currently has the meaning of a TCP port. It also does not allow the private data formatting to be used for other SIDs. It looks like a big hack. Is it worth it for an extra 4 bytes of private data for Consumers?

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 10, 2005 6:53 PM
To: Kanevsky, Arkady; Sean Hefty
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3

If you want to maximize consumer usable private data, then you can move the version, IP version, protocol, source and destination ports into the service ID.

Not at the expense of redefining what Service ID is. How do you propose to move all these fields into Service ID without violating IBTA spec Annex A3.2.? Remember Service ID is what the responder advertises and the requestor sends communication requests to. It may be possible for the server to advertise multiple service IDs to cover version and IP version variations, but it will not be symmetrical to iWARP. Port is port (service ID) and address is address. Port does not encode IP version.

The service ID could be formatted as:

Set ID: 24
Version: 4
IP version: 4
Src port: 16
Dst port: 16

I don't see how this violates the spec. Beyond the set ID, the rest is defined as any. It's not necessary, but it does save 4 bytes of private data for the user.

Separately, if there's any defined mapping to a service ID or set of service IDs, then the service ID indicates the format of the private data. No additional information is needed in the CM REQ, such as using a reserve bit.

That is a good point. But this restricts the usage of IP addressing only to these ports.

It doesn't restrict the usage at all. It defines a portion of the private data for a specific range of service IDs, the same way it is done for SDP. There's no restriction that other service IDs not use the same format. Even with the proposal to use a reserved bit in the CM, a particular service could format its private data this way, not set the bit, and still be spec compliant.

The question is what is easier to check: 1 bit or Service ID. Of course, service ID will have to be checked anyhow to direct the request.

Exactly. If the service ID is checked anyway, why set the bit?

While this overloads the semantic meaning of Service ID it is a viable method.

How is this not viable? There's a _working_ implementation today for both userspace and kernel mode clients to connect using IP addressing that didn't require any modifications to the IB CM.

To be clear, the CM REQ _carries_ the IP address. There should be no requirement that the CM performs the mapping, and I see no reason why it should even care.

Can you elaborate on this? Does this address who populates the formatted portion of the private data?

I'm referring to who formats the private data and performs the mapping to the service IDs (slide 13)

- Sean
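For illustration only, the 64-bit service ID layout suggested above (Set ID: 24, Version: 4, IP version: 4, Src port: 16, Dst port: 16) can be sketched as plain bit packing. This is a rough sketch under stated assumptions: the fields are taken to run from most significant to least significant in the order listed, the function names are hypothetical, and the listener mask simply models the "wildcarding for 24 bits" reading from the reply (version, IP version and source port ignored on match). None of this comes from the IBTA spec or the OpenIB code.

```python
# Field names and widths in bits, most significant first (assumed ordering).
FIELDS = (
    ("set_id", 24),
    ("version", 4),
    ("ip_version", 4),
    ("src_port", 16),
    ("dst_port", 16),
)

# Assumed listener wildcard: compare only the set ID and destination port,
# ignoring the middle 24 bits (version, IP version, source port).
LISTEN_MASK = 0xFFFFFF000000FFFF


def pack_service_id(**values):
    """Pack the five fields into one 64-bit service ID."""
    sid = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} must fit in {width} bits, got {value}")
        sid = (sid << width) | value
    return sid


def unpack_service_id(sid):
    """Split a 64-bit service ID back into a dict of its fields."""
    if not 0 <= sid < (1 << 64):
        raise ValueError("service ID must fit in 64 bits")
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = sid & ((1 << width) - 1)
        sid >>= width
    return fields


def matches_listener(request_sid, listen_sid):
    """Return True if a request SID matches a listening SID under the mask."""
    return (request_sid & LISTEN_MASK) == (listen_sid & LISTEN_MASK)
```

The widths add up to exactly 64 bits, which is why this layout leaves no room for the IP addresses themselves; those would still travel in the private data.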
[openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
Fixed the bit value for the formatting indicator.

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300

Attachment: IP Address Support by InfiniBand CM_v3.pdf
[openib-general] ping over IPoIB does not work between 2 cards on the same host
I have a host with 2 HCAs (dual port each, but I only connected one port per machine) connected to a switch. With IPoIB configured, pinging a card's own IP address works. I can ping other machines that have their HCA cards configured with IPoIB fine. And I can ping both local IP addresses from remote machine(s). What does not work is pinging from one local card to the other.

Details:

ifconfig ib1 192.168.0.1 netmask 255.255.0.0
ifconfig ib3 192.168.0.3 netmask 255.255.0.0

On remote machine:

ifconfig ib0 192.168.1.0 netmask 255.255.0.0

Locally:

ping -I ib3 192.168.0.3
PING 192.168.0.3 (192.168.97.3) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=0 ttl=64 time=0.028 ms

ping -I ib1 192.168.0.1
PING 192.168.0.1 (192.168.97.1) from 192.168.0.1 ib1: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.028 ms

# ping -I ib3 192.168.1.0
PING 192.168.1.0 (192.168.1.0) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.1.0: icmp_seq=0 ttl=64 time=1.81 ms

From remote host:

# ping -I ib0 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

# ping -I ib0 192.168.0.3
PING 192.168.0.3 (192.168.0.3) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

Locally between 2 cards:

# ping -I ib3 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.0.3 ib3: 56(84) bytes of data.
From 192.168.0.3 icmp_seq=1 Destination Host Unreachable
From 192.168.0.3 icmp_seq=2 Destination Host Unreachable
From 192.168.0.3 icmp_seq=3 Destination Host Unreachable

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc. phone: 781-768-5395
275 Totten Pond Rd. Fax: 781-895-1195
Waltham, MA 02451-2010 central phone: 781-768-5300
RE: [swg] RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model
Of course, you can encode versions into service Id. But that will mix concepts. And I do not believe that is worse it to provide a couple more bytes of Consumer private data. This encoding will not be enough to give Consumer 64 bytes of private data. The port numbers are mapped differently for different protocol numbers (families). If we only concern with TCP port mapping this will not be needed. But ULP right now make its decision by standard socket 5-tuple which does include it. I prefer that we do not require any changes in ULP to run over IB. We can do that in the API if there is no need to support more than just TCP. IN this case API can always return the protocol number for TCP to a Consumer. One concern I have is that some existing ULPs (say SDP) rely on the existing format of the private data. Thus, it would not want to use this CM encoding. I do not want to force it to change. Thus, a bit in CM which indicate whether encoding is present looks like a right approach. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Yaron Haviv [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 26, 2005 12:21 PM To: Kanevsky, Arkady; Sean Hefty Cc: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: [swg] RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model -Original Message- From: [EMAIL PROTECTED] [mailto:openib-general- [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady Sent: Tuesday, October 25, 2005 1:26 PM To: Sean Hefty Cc: [EMAIL PROTECTED]; openib-general@openib.org; dat- [EMAIL PROTECTED] Subject: RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model Think of a single API that supports iWARP and IB (transport independent API). To a connection listener it provides the IP 5-tuple + private data. 
For IB it means that the CM parses the REQ and extracts the IP 5-tuple as separate fields from the private data. The listener does not parse the private data encoding of the proposal. So the CM needs to know whether it needs to encode the IP 5-tuple on the requestor side and whether it needs to parse it on the responder side. Arkady Arkady, I agree with Sean: you can encode the destination port in the ServiceID, and if you really want to verify that it is using that format you can look at the upper 48 bits of the ServiceID. We may need to distinguish between explicit RDMA protocols (iSER, NFS-RDMA, RDP, etc.) and implicit RDMA (SDP, where the socket application doesn't know it is using RDMA). This can be done in 3 ways: a. a port mapper, b. a different ServiceID prefix, or c. a bit in the CM REQ header. Also, I'm not sure why we need the protocol (UDP, TCP, SCTP, ...): since we emulate RDMA we shouldn't care whether it is TCP or SCTP, and UDP is unconnected and can't drive RDMA anyway. Yaron Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 1:08 PM To: Kanevsky, Arkady Cc: Caitlin Bestler; [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Correct. But this does bring up the question of how the responder CM knows that it needs to parse the private data. I suspect this will be done via a new version of the CM. But use of some of the CM REQ reserved fields is also possible. In other words, the current CM version assumes that the CM only supports one version and there is no need to support more than one version. The responder knows how to parse the private data based on the service ID that they're listening on. This is how it's done today, and how it will still need to be done. What is the motivation to change it?
What data is beyond the addressing? How does the responder know how to interpret that? - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
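Yaron's suggestion above (carry the destination port in the ServiceID and check the upper 48 bits to recognize the format) can be sketched as follows. This is only an illustration: the prefix value `HYPOTHETICAL_PREFIX` and both function names are invented here and do not come from the proposal or from any specification.

```python
# Hypothetical 64-bit service ID layout: a fixed 48-bit prefix in the upper
# bits, followed by the 16-bit destination port in the low bits.
# The prefix value is made up for illustration (assumption, not from a spec).
HYPOTHETICAL_PREFIX = 0x0000_1234_5678


def port_to_service_id(port: int) -> int:
    """Build a 64-bit service ID from a 16-bit destination port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (HYPOTHETICAL_PREFIX << 16) | port


def service_id_to_port(service_id: int):
    """Return the port if the upper 48 bits match the prefix, else None."""
    if service_id >> 16 != HYPOTHETICAL_PREFIX:
        return None  # not in the port-mapped format
    return service_id & 0xFFFF
```

With a layout like this the responder can tell from the service ID alone whether the request uses the port-mapped format, which is the property Sean relies on when he says the responder knows how to parse the private data from the service ID it listens on.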
RE: [openib-general] round 2 - proposal for socket based connection model
This is the whole purpose of the protocol. It is OS independent and ensures interoperability. Nobody will change their OS protocol implementation so it can communicate with Linux (or any other OS or vendor) that invented its own protocol... It is not the job of an OS (Linux is no exception) to invent protocols. But I think this argument has been beaten enough already. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Woodruff, Robert J [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 26, 2005 11:33 AM To: Kanevsky, Arkady; Sean Hefty Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [openib-general] round 2 - proposal for socket based connection model Arkady wrote, This is what we are trying to avoid. ULP should not change regardless whether or not it is running on IB, iWARP, VIA or any other RDMA transport. The whole point of the CMA is that the ULP can code to an API that is independent of RDMA interconnect. The CMA wire protocol can be documented to allow non-Linux hosts to connect to a Linux box using the same protocol. There is no need to change the existing IB CM protocol to accomplish this. All that is needed is to document that CMA protocol (contained in the private data field of the IB CM requests). woody
RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model
Caitlin, how does it change the proposed protocol? Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Caitlin Bestler [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 12:36 PM To: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model On an IP network, a non-privileged user is generally not capable of forging a source IP address and is typically prevented from using certain source ports. I would propose that the CM [MAY|SHOULD|MUST] enforce that a non-privileged user can only use a Source IP Address and Port that they would have been able to use following the normal stack path (or what it would have been in the case that there is no conventional IP stack associated with this path). So if IPoIB is installed, you would not be able to use any address that you would have been blocked from using over IPoIB. Or at least you would not be guaranteed that you could. I think that MUST is the correct level of enforcement, but it needs to be clear that the CM and OS *MAY* do this checking and that a userspace IB application cannot use the IB stack to perform IP spoofing. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady Sent: Tuesday, October 25, 2005 9:00 AM To: openib-general@openib.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: [dat-discussions] round 2 - proposal for socket based connection model Dear OpenIB, SWG and DAT members, enclosed is the second version of the proposal. There are really 2 proposals that are related. The first one is encoding the IP 5-tuple into REQ private data with small additional info for versioning and IB capabilities. The second is just a couple of ideas, not a real proposal, on mapping of IP ports to IB Service IDs.
Thanks everybody for tons of feedback and deep discussions. I apologize if I have missed something. Happy reading, Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300
RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model
Correct. But this does bring up the question of how the responder CM knows that it needs to parse the private data. I suspect this will be done via a new version of the CM. But use of some of the CM REQ reserved fields is also possible. In other words, the current CM version assumes that the CM only supports one version and there is no need to support more than one version. This proposal may change this assumption. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 12:56 PM To: Caitlin Bestler Cc: Kanevsky, Arkady; [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model Caitlin Bestler wrote: I believe it requires a CM protocol version change, or an IP Address Header present bit. Basically, userspace consumers can supply *any* 72 bytes of private data currently. To maintain backwards compatibility you need an authenticator that says this IP header data is vouched for by privileged components on this end, and that authenticator cannot be within the private data. I believe that the solution is to keep the CM protocol as is. The CM private data should be completely controlled by the service. The IB CM does not care if an IP address is in the private data or not. My reading of the proposal is that it defines a private data format that a particular service may or may not use. - Sean
RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model
Think of a single API that supports iWARP and IB (transport independent API). To a connection listener it provides the IP 5-tuple + private data. For IB it means that the CM parses the REQ and extracts the IP 5-tuple as separate fields from the private data. The listener does not parse the private data encoding of the proposal. So the CM needs to know whether it needs to encode the IP 5-tuple on the requestor side and whether it needs to parse it on the responder side. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 1:08 PM To: Kanevsky, Arkady Cc: Caitlin Bestler; [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Correct. But this does bring up the question of how the responder CM knows that it needs to parse the private data. I suspect this will be done via a new version of the CM. But use of some of the CM REQ reserved fields is also possible. In other words, the current CM version assumes that the CM only supports one version and there is no need to support more than one version. The responder knows how to parse the private data based on the service ID that they're listening on. This is how it's done today, and how it will still need to be done. What is the motivation to change it? What data is beyond the addressing? How does the responder know how to interpret that? - Sean
RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model
Sean, The reason IBTA is interested in addressing the IP address issue is that multiple ULPs and APIs want to support a socket based connection model. Sure, each one of them can define its own protocol (for private data). But this will not ensure interoperability. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 1:34 PM To: Kanevsky, Arkady Cc: Caitlin Bestler; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Think of a single API that supports iWARP and IB (transport independent API). The CMA implements this today and did not require any changes to the IB CM. To a connection listener it provides the IP 5-tuple + private data. For IB it means that CM parses REQ and extracts IP 5-tuple as separate fields from private data. Why push this down into the CM? The CM should operate on IB addresses, not IP addresses. The mapping of IP addresses to IB addresses is done at a higher level. Listener does not parse the private data encoding of the proposal. The listener is the one who cares about the IP addressing. - Sean
RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model
It is APIs, not ULPs, that are the concern. Each ULP can define its own protocol, but APIs cannot. Defining a protocol for each ULP is also bad. This proposal defines it for all ULPs. If a ULP uses the API, the API does the parsing. If a ULP uses verbs, it can do the parsing and encoding itself. But in the latter case it will have to have a different ULP CM for each transport. Bad idea. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 1:52 PM To: Kanevsky, Arkady Cc: Caitlin Bestler; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Sean, The reason IBTA is interested in addressing the IP address issue is that multiple ULPs and APIs want to support a socket based connection model. Sure, each one of them can define its own protocol (for private data). But this will not ensure interoperability. There's no interoperability between different ULPs anyway. Each does define its own protocol. Trying to standardize part of the CM REQ private data doesn't help in this regard. - Sean
RE: [openib-general] round 2 - proposal for socket based connection model
Sean, answers in-line. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 1:05 PM To: Kanevsky, Arkady; openib-general@openib.org; [EMAIL PROTECTED] Subject: RE: [openib-general] round 2 - proposal for socket based connection model Dear OpenIB, SWG and DAT members, enclosed is the second version of the proposal. There are really 2 proposals that are related. The first one is encoding the IP 5-tuple into REQ private data with small additional info for versioning and IB capabilities. The second is just a couple of ideas, not a real proposal, on mapping of IP ports to IB Service IDs. Comments on the private data format: Combine major/minor version into a single field. There's no advantage to having two fields, so keep it simple. [AK] Agree. Remove ZB and SI bits. These are unrelated to socket addressing. [AK] It is true that these are unrelated to socket addressing. But since several ULPs over IB need this info, it can be added to the generic CM extensions for IB. I will rename the proposal to deal with it. I prefer a single private data formatting proposal rather than several layered on top of each other. If IBTA thinks this is generic enough and wants to redefine some reserved fields for it, good. This is captured in the discussion slides. If the destination port number is encoded in a service ID, then it can be removed from the private data. [AK] This depends on how port mapping to Service ID is done. But if SDP incorporates this into its hello-world protocol, this may still be needed. With the 64-byte Consumer private data requirement relaxed, saving 2 bytes will not make much difference. The transport protocol number could also be encoded in the service ID and removed from the private data.
Actually, the version, IP version, and source port could all be encoded in the service ID, limiting the private data to just 32 bytes of IP addresses. [AK] Encoding the IP version into the Service ID sounds strange. The Service ID is a port equivalent. Sure, it is much larger than an IP port, but why should CM extensions encode more than the port into it? Even with this, Consumer private data is still only 60 bytes (not the old 64-byte requirement). - Sean
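The arithmetic in this exchange (32 bytes of IP addresses, leaving 60 bytes for the Consumer) can be made concrete with a small sketch. It assumes the 92-byte IB CM REQ private data field, which matches the 32 + 60 figures above; the field order, the right-aligned IPv4 placement, and the names `encode` and `decode` are hypothetical and are not the layout of the actual proposal.

```python
import ipaddress

REQ_PRIVATE_DATA_LEN = 92  # assumed size of the IB CM REQ private data field
ADDR_HEADER_LEN = 32       # two 16-byte address slots: source, destination
CONSUMER_LEN = REQ_PRIVATE_DATA_LEN - ADDR_HEADER_LEN  # 60 bytes remain


def _as_16_bytes(addr: str) -> bytes:
    """Pack an IPv4 or IPv6 address into a 16-byte slot."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # Hypothetical choice: IPv4 right-aligned, upper 12 bytes zero.
        return bytes(12) + ip.packed
    return ip.packed


def encode(src: str, dst: str, consumer_data: bytes = b"") -> bytes:
    """Build the 92-byte private data: 32 bytes of addresses + consumer data."""
    if len(consumer_data) > CONSUMER_LEN:
        raise ValueError(f"consumer data limited to {CONSUMER_LEN} bytes")
    payload = _as_16_bytes(src) + _as_16_bytes(dst) + consumer_data
    return payload.ljust(REQ_PRIVATE_DATA_LEN, b"\0")  # zero-pad to full size


def decode(private_data: bytes):
    """Split private data into raw source, raw destination, consumer bytes."""
    if len(private_data) != REQ_PRIVATE_DATA_LEN:
        raise ValueError("unexpected private data length")
    return private_data[:16], private_data[16:32], private_data[32:]
```

Because the IP version is not carried in this header (Sean suggests putting it in the service ID), `decode` returns the raw 16-byte slots rather than guessing whether they hold IPv4 or IPv6 addresses.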
RE: [openib-general] round 2 - proposal for socket based connection model
What are you trying to achieve? I am trying to define an IB REQ protocol extension that supports IP connection 5-tuple exchange between connection requestor and responder, and to define the mapping between the IP 5-tuple and IB entities. That way a ULP which was written to TCP/IP, UDP/IP, SCTP/IP (and so on) can use an RDMA transport without change. To modify a ULP to know that it runs on top of IB vs. iWARP vs. (any other RDMA transport) is a bad idea. It is one thing to choose the proper port to connect to. It is completely different to ask a ULP to parse private data in a transport specific way. The same protocol must support both user level ULPs and kernel level ULPs. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 3:22 PM To: Kanevsky, Arkady Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Sean, answers in-line. Arkady At this point, I'm just going to disagree with this approach and move on with the current implementation of the CMA. What's needed is a service that provides IB connections using TCP/IP addressing. I don't believe this proposal meets this goal. To meet the requirement of connecting over IB using TCP/IP addressing, I believe that we need a service with a reserved service identifier or range of identifiers, a mechanism for mapping between IP and IB addresses, and a mechanism for reversing the mapping. I don't see where the proposal addresses the bulk of the work that's required, nor do I think that it will present an API to the user that does not expose IB related addressing (such as service IDs).
- Sean
RE: [swg] Re: [openib-general] TCP/IP connection service over IB
DAPL also strips this private data header and presents IP addresses and ports to the Consumer as items separate from the Consumer private data. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Tom Tucker [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 5:52 PM To: Ted H. Kim Cc: [EMAIL PROTECTED]; openib-general Subject: Re: [swg] Re: [openib-general] TCP/IP connection service over IB On Tue, 2005-10-25 at 13:16 -0700, Ted H. Kim wrote: Tom, Some comments inline ... Tom Tucker wrote: I think it's relevant, so let's make sure my assumptions are correct: - The ITAPI will be a ULP on OpenIB ITAPI is like uDAPL, so if uDAPL is a ULP then the answer is yes. The point is that for uDAPL you have the actual app running over uDAPL. So I guess it's a matter of terminology whether uDAPL is a ULP or is it some sort of middleware with the app being the ULP. Yeah, you're right the terminology is probably a little goofy. The reason for the goofosity is that some of the ULPs really are protocols (iSER, IPoIB), and some are APIs (DAPL, MPI). All use the same interface to register with OpenIB. But that said, yes, ITAPI is like uDAPL. - The ITAPI will create the IRD/ORD headers in its private data and submit this as part of its connection establishment. - The ITAPI consumer at the remote peer will use this data to configure its local QP before accepting the connection Over IB, the IRD/ORD private data will be prepended with a private data header that contains the source and destination IP addresses, source port, etc... The remote peer will not see this data as part of the private data, but rather will see it in the CMA event in the upcall. Over IB, the IRD/ORD data is already built in to the standard CM stuff (i.e. the responder resources and initiator depth fields of REQ and REP).
So no additional demands are made on private data for IB in ITAPI for the IOH purpose. Of course the ITAPI app (like a uDAPL app) can also use private data for app specific/ULP reasons. ok -- bad example. Sorry. This is a weird one. On iWARP, you need the private data header to pass this stuff along and on IB, you don't. What I was trying to say is that whatever the private data, on IB it will get a private data header prepended and on iWARP, it won't. Over iWARP/MPA, there will be nothing else in the private data except what was provided by the consumer (ITAPI in this case). The reason being that this extra information (IP addressing info) is in the protocol header proper. Just to restate for clarity, ITAPI for iWARP will use the first 16 bytes of MPA private data for the IOH (IRD/ORD header). The rest is usable for app/ULP reasons. Yessir. And in fact, the ITAPI CM will strip this stuff before presenting it to the app. I should point out that there was once a proposal of doing a RDDP IETF draft which would have sub-divided the MPA private data into a middleware section and an app section. The idea was to be sure that the app/ULP and middleware (e.g. the IOH) uses of private data would not step on each other. I think this idea did not progress, mostly because the author (John Carrier, formerly of Adaptec) changed jobs and was no longer working on iWARP stuff. While not directly proposed, this idea could have been carried over to IB. Some of the ideas on this thread are already implicitly doing this middleware (for IP addressing purpose) vs ULP/app split. I think we are grappling with a lot of these layering issues now. We are also grappling with protocol vs. implementation issues. Keep it coming, because this is exactly the kind of feedback I think we need.
-ted
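The stripping step Ted and Tom describe (the first 16 bytes of MPA private data carry the IOH, and the CM removes them before the app sees the rest) amounts to a fixed-offset split. A minimal sketch, in which only the 16-byte length comes from the thread; the function name is hypothetical and the internal layout of the IOH is deliberately left opaque:

```python
IOH_LEN = 16  # per the thread: first 16 bytes of MPA private data hold the IOH


def split_mpa_private_data(private_data: bytes):
    """Separate the IRD/ORD header (IOH) from the app/ULP private data.

    Returns (ioh, app_data). The IOH is kept as opaque bytes because its
    field layout is not described in this thread.
    """
    if len(private_data) < IOH_LEN:
        raise ValueError("private data too short to contain an IOH")
    return private_data[:IOH_LEN], private_data[IOH_LEN:]
```

On IB the same caller-visible result is reached differently, since IRD/ORD travel in the responder resources and initiator depth fields of REQ and REP rather than in private data.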
RE: [openib-general] round 2 - proposal for socket based connection model
No. iWARP does not have to pass this info. The info is needed for IB because ZB and SI were introduced in the IBTA 1.2 specs as optional functionality. So if a ULP wants to use that functionality, it needs to find out whether the remote side can support it. This is needed for backwards compatibility. For example, the iSER protocol defines the use of remote invalidate, but obviously that cannot be done if the remote side cannot support it. I do not recall right now whether iWARP defined that functionality as required or optional. Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Tom Tucker [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 5:56 PM To: Kanevsky, Arkady Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: RE: [openib-general] round 2 - proposal for socket based connection model Arkady: I may actually have a constructive comment about the protocol (private data format). One thing I noticed is that *almost* everything in the private data header is available in the native iWARP protocol header except the ZB and SI bits. If these bits become part of the canonical private data header, then does that require an iWARP transport to use the header too even though only two bits are useful? Sorry if this is a dumb question, Tom On Tue, 2005-10-25 at 16:40 -0500, Tom Tucker wrote: Arkady: I don't think anyone disagrees with your goals. Unfortunately additional requirements on the implementation were coupled with the specification of the private data format (protocol). This peripheral discussion derailed any attempt to discuss the protocol. Attempts to separate the protocol discussion from the implementation failed. And so here we are... On Tue, 2005-10-25 at 15:38 -0400, Kanevsky, Arkady wrote: What are you trying to achieve?
I am trying to define an IB REQ protocol extension that supports IP connection 5-tuple exchange between connection requestor and responder, and to define the mapping between the IP 5-tuple and IB entities. That way a ULP which was written to TCP/IP, UDP/IP, SCTP/IP (and so on) can use an RDMA transport without change. To modify a ULP to know that it runs on top of IB vs. iWARP vs. (any other RDMA transport) is a bad idea. It is one thing to choose the proper port to connect to. It is completely different to ask a ULP to parse private data in a transport specific way. The same protocol must support both user level ULPs and kernel level ULPs. Arkady Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 3:22 PM To: Kanevsky, Arkady Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: Sean, answers in-line. Arkady At this point, I'm just going to disagree with this approach and move on with the current implementation of the CMA. What's needed is a service that provides IB connections using TCP/IP addressing. I don't believe this proposal meets this goal. To meet the requirement of connecting over IB using TCP/IP addressing, I believe that we need a service with a reserved service identifier or range of identifiers, a mechanism for mapping between IP and IB addresses, and a mechanism for reversing the mapping. I don't see where the proposal addresses the bulk of the work that's required, nor do I think that it will present an API to the user that does not expose IB related addressing (such as service IDs).
- Sean
RE: [openib-general] round 2 - proposal for socket based connection model
Sean Hefty wrote: -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 6:44 PM To: Kanevsky, Arkady Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED] Subject: Re: [openib-general] round 2 - proposal for socket based connection model Kanevsky, Arkady wrote: What are you trying to achieve? I'm trying to define a connection *service* for Infiniband that uses TCP/IP addresses as its user interface. That service will have its own protocol, in much the same way that SDP, SRP, etc. do today. I am trying to define an IB REQ protocol extension that supports IP connection 5-tuple exchange between connection requestor and responder. Why? What need is there for a protocol extension to the IB CM? To me, this is similar to setting a bit in the CM REQ to indicate that the private data format looks like SDP's private data. The format of the _private_ data shouldn't be known to the CM; that's why it's private data. There is no requirement that the remote side uses the same Linux CM. So in order to achieve interoperability you need a protocol. The SDP hello-world protocol is defined for SDP. We are defining an equivalent that is ULP independent. If the CM is not involved, then it is the ULP that populates the 5-tuple info on the requestor side and parses it on the remote side, which makes the ULP CM IB specific. This is what we are trying to avoid. A ULP should not change regardless of whether it is running on IB, iWARP, VIA or any other RDMA transport. iWARP does not need private data to pass the 5-tuple. And define mapping between IP 5-tuple and IB entities. No mapping between IP and IB addresses was defined in the proposal. Defining this mapping is required to make this work. Right now, the mapping is the responsibility of every user. That way a ULP which was written to TCP/IP, UDP/IP, SCTP/IP (and so on) can use an RDMA transport without change. A ULP written to TCP/IP can use an RDMA transport without change. They use SDP.
However, an application that wants to take advantage of QP semantics must change. (And if they want to take full advantage of RDMA, they'll likely need to be re-architected as well.) The goal in that case becomes to permit them to establish connections using TCP/IP addresses. To meet this goal, we need to define how to map IP address to and from IB addresses. That mapping is part of the protocol, and is missing from the proposal. And if the application isn't going to know that they're running on Infiniband, then the mapping must also include mapping to a destination service ID. To modify a ULP to know that it runs on top of IB vs. iWARP vs. (any other RDMA transport) is a bad idea. It is one thing to choose the proper port to connect to. It is completely different to ask a ULP to parse private data in a transport specific way. The same protocol must support both user level ULPs and kernel level ULPs. Defining an interface that allows a ULP to use either iWarp, IB, or some other random RDMA transport is an implementation issue. However, it requires something that maps IP to IB addresses (including service IDs). To be more concrete, you've gone from having source and destination TCP/IP addresses to including them in a CM REQ. What translated the source and destination IP addresses into GIDs and a PKey? Who converted those into IB routing information? How was the destination of the CM REQ determined? What service ID was selected? IPoIB defines IP -> GID. Port -> IB Service ID is part of this proposal. The PKey is configuration setup done by the administrator. Ditto for VLAN. - Sean
[openib-general] configuring ipoib
How do you configure ipoib? I used "ifconfig ib0 ip_address" which works fine. But if I have several ports on an HCA, how do I specify which port ip_address should be associated with? Ditto if you have multiple cards. Thanks, Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300
RE: [openib-general] FW upgrade for TopSpin cards
Roland, sorry to bug you on that but... I have a Cisco HCA (PCI-X): hca_type MTS23108, hw_rev a1, fw_ver 1.18.0. hca_type and hw_rev are clearly Mellanox nomenclature. I suspect that fw_ver is a Cisco FW version number. But all OpenIB documentation is with respect to Mellanox nomenclature. For example, from http://www.openib.org/docs/ipoib_faq.txt: 1. Verify the firmware version via cat /sys/class/infiniband/mthca0/fw_ver For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version 4.5.3 is recommended. Is there analogous documentation for Cisco FW? Where is that FW (this is a Cougar card)? Are Cisco FW and Mellanox FW the same? If yes, what is the correspondence between the 2 numbering schemes? While this specific question is for the Cougar card, the answer should be generic and cover all HCAs. Can the documentation be updated to cover all supported HW regardless of the vendor? Thanks, Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -Original Message- From: Roland Dreier [mailto:[EMAIL PROTECTED] Sent: Thursday, October 20, 2005 1:48 PM To: Kanevsky, Arkady Cc: openib-general@openib.org Subject: Re: [openib-general] FW upgrade for TopSpin cards Arkady I get a bunch of warnings (see below). All of the warnings look benign (although you might want to synchronize the clock between your build system and your file server). Arkady Can I use OpenIB tvflash to upgrade FW on a TopSpin card? Yes. Arkady Can I use OpenIB mstflint for it? Yes. Arkady Which version of the utilities should I use? I would use the latest subversion revision. Arkady Why warning when I build it? Because gcc 4.0 added a bunch of semi-bogus pointer sign warnings, and your clocks are out of sync. - R.
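The FAQ check quoted above (read `fw_ver` from sysfs and compare it with the recommended version) can be scripted roughly as below. The sysfs path and the two recommended versions are taken from the quoted FAQ text; the function names and the `RECOMMENDED` table are illustrative. The comparison is only meaningful if the card reports versions in the same numbering scheme as the FAQ, which is exactly the open question for the Cisco firmware.

```python
from pathlib import Path

# Recommended versions as quoted from the IPoIB FAQ in this thread.
RECOMMENDED = {"pci-x": (3, 2, 0), "pcie": (4, 5, 3)}


def parse_fw_ver(text: str) -> tuple:
    """Turn a dotted version string such as '1.18.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def read_fw_ver(device: str = "mthca0",
                root: str = "/sys/class/infiniband") -> tuple:
    """Read and parse <root>/<device>/fw_ver (the path given in the FAQ)."""
    return parse_fw_ver((Path(root) / device / "fw_ver").read_text())


def meets_recommendation(fw_ver: tuple, bus: str) -> bool:
    """Compare numerically, so that 1.18.0 sorts above 1.9.0."""
    return fw_ver >= RECOMMENDED[bus]
```

For the card described here, `meets_recommendation(parse_fw_ver("1.18.0"), "pci-x")` is false, but that result says nothing useful until the correspondence between the Cisco and Mellanox version numbers is known.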
RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
OK. I will update the proposal for IBTA based on this feedback and all other feedback posted. I will still keep the private data usage proposal separate from the port mapping one. If your apps depend on 64 bytes of private data, please raise your voice now.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Richard Frank [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 7:19 PM
To: Richard Frank; Lentini, James; Roland Dreier
Cc: [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

It's probably fine to go ahead and reduce the IPC private data - I think we (Oracle) can work around this.

----- Original Message -----
From: Richard Frank [EMAIL PROTECTED]
To: James Lentini [EMAIL PROTECTED]; Roland Dreier [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R [EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 7:12 PM
Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

Oracle's uDAPL ipc implementation currently uses 64 bytes of private connection data. Some of this is the result of having 64 bytes to use at the start, so we designed around this. We can probably reduce this somewhat. And of course if we want to rewrite our connection handling for uDAPL (add our own wire protocol) we can probably skip using the uDAPL connection data altogether.

For RDS we use our own connection data sent via datagrams, which has always been part of the Oracle UDP ipc implementation.
----- Original Message -----
From: Roland Dreier [EMAIL PROTECTED]
To: James Lentini [EMAIL PROTECTED]
Cc: Richard Frank [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R [EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 5:56 PM
Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

James> The D is somewhat misleading. It refers to the
James> functionality provider to the consumer application.

Right, that's what we're talking about. The RDS implementation only needs a few bytes of private data on top of the IP address info. So the RDS implementation itself is clearly OK with any of the proposals being discussed here.

However, Rick mentioned that Oracle needs 64 bytes of private data in both directions for connections. My question was how Oracle works on top of RDS, which does not provide any private data to consumers.

- R.
RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
The updated proposal will have IP addresses and TCP ports of src and dst in private data. How TCP ports are mapped to IB service IDs is a separate proposal.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 20, 2005 11:51 AM
To: Kanevsky, Arkady
Cc: Richard Frank; Lentini, James; Roland Dreier; [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

Kanevsky, Arkady wrote:
> I will update the proposal for IBTA based on this feedback and all other feedback posted. I will still separate private data usage proposal and port mapping one.

Again, I think that these should be in the same proposal. The CM REQ carries the IB transport layer address. The goal here is to map another transport layer address to the IB one.

The source port is included in the private data. By not including the destination port, there's an assumption that it's provided somewhere else in the CM REQ. We should either make this explicit, or put the destination port in the private data as well.

- Sean
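To make the "src and dst addresses plus ports in private data" idea concrete, here is a minimal sketch of packing and unpacking such a header for IPv4. The field order and the absence of any version or flag bytes are assumptions made for illustration; this is not the layout of the IBTA proposal, which is not reproduced in this thread.

```python
import socket
import struct

# Hypothetical layout (network byte order, no padding):
#   src port (2) | dst port (2) | src IPv4 (4) | dst IPv4 (4)  = 12 bytes
HEADER = struct.Struct("!HH4s4s")


def pack_header(src_ip, src_port, dst_ip, dst_port):
    """Build the addressing header that would precede the ULP's private data."""
    return HEADER.pack(
        src_port,
        dst_port,
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )


def unpack_header(private_data):
    """Split private data into (src_ip, src_port, dst_ip, dst_port) and the ULP payload."""
    src_port, dst_port, src, dst = HEADER.unpack_from(private_data)
    addressing = (socket.inet_ntoa(src), src_port, socket.inet_ntoa(dst), dst_port)
    return addressing, private_data[HEADER.size:]
```

With both ports carried explicitly, the listener does not have to infer the destination port from the service ID, which is the implicit assumption Sean points out above.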
RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
With both SRC and DST IP addresses and TCP ports, all these models will be supported.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 20, 2005 12:26 PM
To: Sean Hefty; Kanevsky, Arkady
Cc: [EMAIL PROTECTED]; openib-general@openib.org; Lentini, James; Davis, Arlin R
Subject: RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sean Hefty
> Sent: Thursday, October 20, 2005 8:51 AM
> To: Kanevsky, Arkady
> Cc: [EMAIL PROTECTED]; openib-general@openib.org; Lentini, James; Davis, Arlin R
> Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
>
> Kanevsky, Arkady wrote:
> > I will update the proposal for IBTA based on this feedback and all other feedback posted. I will still separate private data usage proposal and port mapping one.
>
> Again, I think that these should be in the same proposal. The CM REQ carries the IB transport layer address. The goal here is to map another transport layer address to the IB one.
>
> The source port is included in the private data. By not including the destination port, there's an assumption that it's provided somewhere else in the CM REQ. We should either make this explicit, or put the destination port in the private data as well.

Under the general programming model for an IP-centric daemon, the listener can assume that connection requests will be for the TCP port that the listen was issued upon. However, the daemon typically listens on *all* addresses that the system supports. It is not uncommon for the application to note which destination address was actually requested and to vary the service provided based upon that. This is what makes it possible for single machines to host vast numbers of web sites.
It is less common, but still requiring support, for the daemon to differentiate service based upon the source address. It is more common to simply refuse service based upon the source address, which can be handled by the CM or firewall itself rather than by the application, but there are exceptions. Some web sites have intranet versus internet versions. Some file servers control access lists based upon source address. It is actually quite effective when combined with network authentication of source addresses.
[openib-general] FW upgrade for TopSpin cards
I want to upgrade the FW on several TopSpin cards I have. There is a tvflash utility in gen2/trunk/src/userspace/tvflash. I tried to build tvflash on a 2.6.13.3 system I have. I get a bunch of warnings (see below). The gcc version is gcc version 4.0.0 20050519 (Red Hat 4.0.0-8).

What's the story?
Can I use OpenIB tvflash to upgrade FW on a TopSpin card?
Can I use OpenIB mstflint for it?
Which version of the utilities should I use?
Why do I get warnings when I build it?

Arkady

**

# make
make: Warning: File `.deps/src_tvflash-tvflash.Po' has modification time 1.8e+04 s in the future
make all-am
make[1]: Entering directory `/u/arkady/openib/gen2/trunk/src/userspace/tvflash'
make[1]: Warning: File `.deps/src_tvflash-tvflash.Po' has modification time 1.8e+04 s in the future
if gcc -DHAVE_CONFIG_H -I. -I. -I. -Wall -g -O2 -MT src_tvflash-tvflash.o -MD -MP -MF ".deps/src_tvflash-tvflash.Tpo" -c -o src_tvflash-tvflash.o `test -f 'src/tvflash.c' || echo './'`src/tvflash.c; \
then mv -f ".deps/src_tvflash-tvflash.Tpo" ".deps/src_tvflash-tvflash.Po"; else rm -f ".deps/src_tvflash-tvflash.Tpo"; exit 1; fi
src/tvflash.c: In function 'parse_guid':
src/tvflash.c:112: warning: pointer targets in passing argument 1 of '__builtin_strchr' differ in signedness
src/tvflash.c:117: warning: pointer targets in passing argument 1 of 'strrchr' differ in signedness
src/tvflash.c:117: warning: pointer targets in assignment differ in signedness
src/tvflash.c:135: warning: pointer targets in passing argument 1 of 'strrchr' differ in signedness
src/tvflash.c:135: warning: pointer targets in assignment differ in signedness
src/tvflash.c:205: warning: pointer targets in passing argument 1 of 'strtol' differ in signedness
src/tvflash.c: In function 'identify_board':
src/tvflash.c:702: warning: pointer targets in passing argument 1 of 'strncasecmp' differ in signedness
src/tvflash.c: In function 'flash_image_read_from_file':
src/tvflash.c:828: warning: pointer targets in assignment differ in signedness
src/tvflash.c:830: warning: pointer targets in assignment differ in signedness
src/tvflash.c:832: warning: pointer targets in assignment differ in signedness
src/tvflash.c:844: warning: pointer targets in assignment differ in signedness
src/tvflash.c: In function 'flash_check_failsafe':
src/tvflash.c:905: warning: pointer targets in passing argument 2 of 'validate_image' differ in signedness
src/tvflash.c:911: warning: pointer targets in passing argument 2 of 'validate_image' differ in signedness
src/tvflash.c: In function 'create_ver_str':
src/tvflash.c:1033: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1039: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1044: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1046: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c: In function 'identify_hca':
src/tvflash.c:1278: warning: pointer targets in passing argument 1 of 'sscanf' differ in signedness
src/tvflash.c: In function 'identify_firmware':
src/tvflash.c:1399: warning: pointer targets in passing argument 1 of 'sscanf' differ in signedness
src/tvflash.c: In function 'upload_firmware':
src/tvflash.c:1813: warning: pointer targets in passing argument 1 of 'parse_guid' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer
RE: [openib-general] FW upgrade for TopSpin cards
Thanks Roland. I was worried about the pointer sign warnings. The clock is not an issue. Do you plan to fix the sources so that the gcc 4.0 warnings are no longer generated?

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 20, 2005 1:48 PM
To: Kanevsky, Arkady
Cc: openib-general@openib.org
Subject: Re: [openib-general] FW upgrade for TopSpin cards

Arkady> I get a bunch of warnings (see below).

All of the warnings look benign (although you might want to synchronize the clock between your build system and your file server).

Arkady> Can I use OpenIB tvflash to upgrade FW on a TopSpin card?

Yes.

Arkady> Can I use OpenIB mstflint for it?

Yes.

Arkady> Which version of the utilities should I use?

I would use the latest subversion revision.

Arkady> Why warning when I build it?

Because gcc 4.0 added a bunch of semi-bogus pointer sign warnings, and your clocks are out of synch.

- R.
RE: [swg] RE: [openib-general] Re: [swg] Re: private data...
But that requires changes to the CM APIs, versus a module on top of the CM that parses and populates the private data field.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 20, 2005 5:23 PM
To: 'Fab Tillier'; 'Sean Hefty'
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [swg] RE: [openib-general] Re: [swg] Re: private data...

> The same can be said of the starting local QPN, responder resource, initiator depth, starting PSN, MTU, and so forth. The CM doesn't care about these - the application does, as these settings affect how it configures its QP and what features of its protocol it can use.

Not exactly the same. The connection cares about these, and must be included as part of the connection protocol.

> There are a number of fields that are not used by the CM state machine that are included in these MADs already. These fields are defined in the CM protocol not because they impact MAD processing in the CM, but because they represent minimum information needed to configure a QP and client.

Exactly. The IP address does not configure the QP.

What you're advocating is that a service ID can support two private data formats depending on if a bit in the CM REQ is set or not. (If only a single format is supported, then the bit is not needed.) This is the wrong place to store this information. The format of the data beyond the addressing information is not conveyed by this bit, so additional information about the private data format is still needed. You can grab several reserved bits from the REQ and define it as a private data version, but then apps that care about this could just as easily record the version in the private data itself.
- Sean
RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
Arlin, just to clarify: Intel MPI will not have problems with using less than 64 bytes of private data. If a solution provides you with 48 bytes of private data, will that be sufficient?

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 11:30 AM
To: [EMAIL PROTECTED]; Grant Grundler
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

Arkady,

Intel MPI (real consumer of uDAPL) has no problem with this change.

-arlin

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
Sent: Wednesday, October 19, 2005 6:40 AM
To: Grant Grundler; Caitlin Bestler
Cc: Roland Dreier; [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

Grant,

The developers of the application(s) in question are aware of the discussion. I will leave it to them to respond. I bring the discussion points to the weekly DAT Collaborative meeting, which we have every Wednesday.

I apologize that the DAT Collaborative charter does not allow contributions to be submitted without joining the DAT Collaborative. But this is no different from Linux not accepting any contributions without a proper license. But rest assured that as Chair I bring the concerns and suggestions stated in the email discussion to the DAT meetings.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Grant Grundler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 18, 2005 8:02 PM
To: Caitlin Bestler
Cc: Grant Grundler; Roland Dreier; Kanevsky, Arkady; [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [openib-general] Re: iWARP emulation protocol

On Tue, Oct 18, 2005 at 04:40:54PM -0700, Caitlin Bestler wrote:
> > Roland (and the rest of us) would like to see someone name a real consumer of the proposed interface. ie who depends on this change? Then the dependency for that use/user can be discussed and appropriate tradeoffs made. Make sense?
>
> Unfortunately not every application that is under development, or even deployed, can be discussed in a google-searchable public forum. That especially applies to user-mode development.

Well, this is open source. While I don't want to preclude closed source development, it's usually necessary to have an open source consumer that any open source developer can test with.

> So I could have actually tested such applications and still not be free to cite them here.

Understood. I'm not asking *you* to cite one unless you happen to own one of the consumers. With any luck some of them are following the discussion and will jump in on their own.

> Unfortunately, since they are developing to uDAPL they are unlikely to be following this discussion.

It doesn't help that the DAT yahoo-groups.com mailing list is rejecting my replies. It would be helpful if someone following this forum could share Roland's question with the DAT mailing list if it didn't make it there already, and possibly explain why naming a consumer is necessary.

hth,
grant
RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol
Sean, if you look at the proposal, it shows 2 ways to address this.

1. Have 2 protocols. One just sends the SRC IP address and port, and provides 64 bytes to the ULP. The other sends both SRC and DEST info and leaves 48 (+-) bytes of private data for the ULP.
2. Have 2 protocols. Split the IPv4 and IPv6 methods. For IPv4, send SRC and DST addressing and 64 bytes of ULP private data. For IPv6 we have several options:
   a. GID = IPv6 address
   b. use a second CM frame to carry the ULP private data
   c. others

But having multiple versions supported is not pleasant. It loses the simple backwards compatibility of the current protocol, which just formats the CM private data field.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 1:00 PM
To: Richard Frank
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

Richard Frank wrote:
> Oracle currently depends on 64 bytes of private data for connect and accept.

Is any of that data used to exchange address information? It's impossible to provide both the source and destination address in the CM REQ private data and still give the user 64 bytes. The source address is needed for the reverse GID-IP lookup. Can we make do without the destination address?

- Sean
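The space trade-off argued in this message can be checked with simple arithmetic. The sketch below assumes the 92-byte private data field of an IB CM REQ (the size given in the IB specification; the thread itself never states it) and counts only raw addresses and 2-byte ports, ignoring any version or flag bytes a real header would add. The function and constant names are made up for the example.

```python
# Assumption: the CM REQ private data field is 92 bytes (IB specification value).
REQ_PRIVATE_DATA = 92
# The amount existing consumers are said to rely on in this thread.
ULP_NEEDED = 64

IPV4_LEN = 4   # bytes per IPv4 address
IPV6_LEN = 16  # bytes per IPv6 address
PORT_LEN = 2   # bytes per TCP port


def ulp_bytes_left(addr_len, n_addrs, n_ports):
    """Private data left for the ULP after the addressing fields are carved out."""
    return REQ_PRIVATE_DATA - (addr_len * n_addrs + PORT_LEN * n_ports)


options = {
    "IPv6 src + dst, both ports": ulp_bytes_left(IPV6_LEN, 2, 2),
    "IPv6 src only, one port": ulp_bytes_left(IPV6_LEN, 1, 1),
    "IPv4 src + dst, both ports": ulp_bytes_left(IPV4_LEN, 2, 2),
}

for name, left in options.items():
    verdict = "fits" if left >= ULP_NEEDED else "does not fit"
    print(f"{name}: {left} bytes left, 64-byte ULP payload {verdict}")
```

Under these assumptions only the case with both IPv6 addresses falls below 64 bytes, which is consistent with Sean's remark and with the "48 (+-)" figure once a few header bytes are added.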
[openib-general] RE: iWARP emulation protocol
uDAPL users.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 18, 2005 2:19 PM
To: Kanevsky, Arkady
Cc: Yaron Haviv; openib-general@openib.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: iWARP emulation protocol

Arkady> The proposed protocol will be used by both kernel and user
Arkady> space Consumers. There are existing Consumers that rely
Arkady> on 64 bytes of private data.

Which consumers are these?

- R.
RE: iWARP emulation protocol (was: [openib-general] RDMA connection and address translation API)
Sean,

> For the REQ to find its way to the destination, the destination address must be known beforehand. We shouldn't need to pass any data in the REP. The CMA passes both the source and destination address information in the REQ, but only uses the destination to validate against a listen request. The source address is passed to the user.

The CM passes the IB addresses of both src and dest in the REQ. How the dest IP address is mapped locally to the dest IB GID|LID is defined by IPoIB. We can request IBTA to define that as well, but the goal is to define the protocol part in IBTA. You are correct that if we rely on the CM storing the IP address of the dest, it does not need to be passed back in the REP, provided we do not need to know that the response came from a different IP address or a different port.

> The slides should also discuss how to map from a TCP/IP address to a service ID, so that a REQ can match up with the correct listener. The approach currently taken by the CMA is to use the openib OUI 48 + TCP port number.

Correct, if we want IBTA to define a full mapping of addresses and ports. But that does not change the protocol; it is a local agreement that must be the same on both sides of the connection. I will include it in the next version.

Thanks,
Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300
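The port-to-service-ID mapping mentioned in this message can be sketched as follows. Two things are assumptions here: the archived text "OUI 48 + TCP port number" is read as "OUI shifted left by 48 bits, plus the port" (the operator appears to have been lost in the archive), and the OUI value used is a placeholder, not the real OpenIB OUI.

```python
# Placeholder OUI for illustration only; NOT the registered OpenIB value.
EXAMPLE_OUI = 0x001234

SERVICE_ID_BITS = 64


def service_id_from_port(oui, tcp_port):
    """Map a TCP port to a 64-bit IB service ID as (oui << 48) + port.

    Assumes the shift-by-48 reading of the scheme described in the thread.
    """
    if not 0 <= tcp_port <= 0xFFFF:
        raise ValueError("TCP port must fit in 16 bits")
    sid = (oui << 48) + tcp_port
    if sid >> SERVICE_ID_BITS:
        # A 24-bit OUI with high bits set would overflow 64 bits after the shift.
        raise ValueError("OUI too large for a 64-bit service ID")
    return sid


def port_from_service_id(sid):
    """Recover the TCP port from the low 16 bits of the service ID."""
    return sid & 0xFFFF
```

Because both sides compute the same value from the port alone, the mapping needs no extra bytes on the wire, which matches Arkady's point that it is a local agreement rather than a protocol change.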
RE: iWARP emulation protocol (was: [openib-general] RDMA connection and address translation API)
I think it is better to use some of the CM REQ reserved fields for it, so it will be separate from addressing.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 18, 2005 2:55 PM
To: Kanevsky, Arkady
Cc: Roland Dreier; Yaron Haviv; [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: iWARP emulation protocol (was: [openib-general] RDMA connection and address translation API)

Kanevsky, Arkady wrote:
> Enclosed is the proposal to IBTA to add this functionality to CM protocol.

A couple of other notes. Combine major/minor version into a single version, which is what you essentially have anyway. I have no clue what zero based virtual address exception means, but that and the SI bit seem out of place in a header containing TCP/IP address information. I would say save the two bits and have a cleaner header.

- Sean
RE: [openib-general] Re: iWARP emulation protocol
> An additional space preserving option that Arkady did not mention is limiting the IP alias service to IPv4 addresses. Anyone who really wants IPv6 addresses can get their SM to assign IPv6 compatible GIDs. Of course the flat IPv6 option is far simpler, and probably should be used unless a specific application is identified where those extra 96 bits makes the difference between making the private data be rewritten or left as is.

This can be an extension to proposal 3 of the last page.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 18, 2005 3:16 PM
To: Roland Dreier; Kanevsky, Arkady
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [openib-general] Re: iWARP emulation protocol

> -----Original Message-----
> From: Roland Dreier
> Sent: Tuesday, October 18, 2005 11:41 AM
> To: Kanevsky, Arkady
> Subject: [openib-general] Re: iWARP emulation protocol
>
> Arkady> uDAPL users.
>
> 2) Are there real users or is this a generic uDAPL API thing?

uDAPL vs. kDAPL is irrelevant here. The user or Kernel Consumer making the connection does not know whether their peer is running in user or kernel, nor should they.

Every discussion of reducing the guaranteed private data size in DAPL has produced adverse reactions from application developers. They're either very good actors or were working on actual applications.

An additional space preserving option that Arkady did not mention is limiting the IP alias service to IPv4 addresses. Anyone who really wants IPv6 addresses can get their SM to assign IPv6 compatible GIDs. Of course the flat IPv6 option is far simpler, and probably should be used unless a specific application is identified where those extra 96 bits makes the difference between making the private data be rewritten or left as is.
RE: [openib-general] RE: iWARP emulation protocol
Sean wrote:
> I'm not sure how much we should care about higher level abstractions for this discussion. We should do what's right for IB. Abstractions that want to use IP addresses can either use the standard protocol defined by the IBTA or define their own private data.

Correct. But we should define a standard protocol suited for most apps, to avoid the creation of multiple app-specific protocols.

> To me, it seems that the most flexible solution is to pass the source and destination IP address in the CM REQ.

I agree. This is the cleanest and simplest to define. But it impacts some existing apps. That is why DAT has the 64 bytes private data requirement. We do not lose too many users by the time we define the complete solution stack.

> We can then define a standard mapping from TCP port numbers to IB service records, or change the CM version to read into the private data. What's wrong with this approach?

It is the standard mapping which we just spent 1 hour discussing at SWG. What is that standard mapping if it is native IB? IPoIB as an intermediate layer? SDP as an intermediate layer? What is the standard TCP port for iSER (pick your ULP) native over RDMA vs. the same ULP over IPoIB? This has to be defined. But is it part of the proposal for sharing IP address and TCP port info between the 2 sides of the connection, or a separate proposal? I think it is a separate proposal, but both will have to be in place to support iWARP emulation.

Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300
[openib-general] license mismatches
I have reviewed the licenses used by files in https://openib.org/svn/gen2/trunk.

The following .c and .h files do not match the OpenIB licenses:

https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/tvflash.c
https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/firmware.h
https://openib.org/svn/gen2/trunk/src/userspace/examples/aio/ttcp.aio.c
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/complib/Makefile.mlx
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/opensm/osm_indent

All files in these directories:

https://openib.org/svn/gen2/trunk/src/userspace/mstflint/
https://openib.org/svn/gen2/trunk/src/userspace/mpi/

Files in the directory https://openib.org/svn/gen2/trunk/src/userspace/libsdp/src/ have the right licenses, but the copyright message does not match the OpenIB copyright.

Several files do not have any license, like Makefile, configure and map files. For example:

https://openib.org/svn/gen2/trunk/src/userspace/libibcm/src/libibcm.map
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/Makefile.am

I think this is OK.

I suspect that all of these are oversights and that all the files should be available under both BSD and GPL2 licenses.

Thanks,
Arkady

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300
RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz
The files are available to all. Posting to the reflector is for members only. If you still have problems, the files can be made available at http://www.datcollaborative.org/. The 1.2 headers are already available there.

Arkady Kanevsky email: [EMAIL PROTECTED] Network Appliance phone: 781-768-5395 375 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300

-----Original Message-----
From: Tom Duffy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 03, 2005 2:41 PM
To: Ian Jiang
Cc: openib-general@openib.org; [EMAIL PROTECTED]; Kanevsky, Arkady
Subject: Re: [openib-general] [iSER]How to get the dat_headers_1_1.tgz

On Wed, 2005-08-03 at 17:14 +0800, Ian Jiang wrote:
> It's known to all that the kDAPL 1.1 is needed to build the iSER. I failed to get http://groups.yahoo.com/group/dat-discussions/files/dat_headers_1_1.tgz because only the members of the group could access this file

This is dumb. Can Arkady just open the files up to anyone? If not, groups.yahoo.com should not be used for an open source project.

-tduffy
RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz
Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance        phone: 781-768-5395
375 Totten Pond Rd.      Fax: 781-895-1195
Waltham, MA 02451-2010   central phone: 781-768-5300

-----Original Message-----
From: Tom Duffy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 03, 2005 3:37 PM
To: Kanevsky, Arkady
Cc: Ian Jiang; [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz

On Wed, 2005-08-03 at 15:18 -0400, Kanevsky, Arkady wrote:
> The files are available to all.

http://groups.yahoo.com/group/dat-discussions/files/dat_headers_1_1.tgz

  To access Yahoo! Groups... you need a Yahoo! ID.
  Don't have a Yahoo! ID? Signing up is easy.

that is *not* available to all...

-tduffy

Attachment: dat_headers_1_1.tgz
[openib-general] comments on DAT registry in OpenIB
Dear DAT and OpenIB members,

There is a debate going on on the OpenIB and DAT reflectors about the kDAT registry for Linux. I would like to review the requirements we had agreed on at the DAT Collaborative and captured in the kDAT and uDAT specs, and review the DAT registry in OpenIB from that perspective. I would like to make it clear that I do NOT speak on behalf of the DAT Collaborative, but as just one of its members.

- Ability of Consumers to open an IA based on its name
    - OpenIB supports it
- Support for Consumers to get a list of available IAs to open
    - OpenIB kDAT and uDAT registries provide this
    - OpenIB kDAT registry no longer provides Provider attributes as stated above
    - Preserves dat_registry_list_providers for kDAPL
    - OpenIB changed the dat_provider_info format, so binary compatibility is not preserved, but source compatibility is preserved.
    - dat_provider_info differs between uDAPL and kDAPL in OpenIB
- Ability to enumerate available IAs and their attributes
    - OpenIB supports that for uDAPL, unchanged from the DAPL SF RI
    - For kDAPL, OpenIB supports a single type of thread safety, defined by the Linux kernel, and the Linux kernel version defines the kDAPL APIs that kernel version supports.
    - For kDAPL, the query will not return the DAT version and thread-safety Provider attributes.
- Map IA_name to Provider library (kDAPL or uDAPL)
    - OpenIB kDAPL and uDAPL support this
- Ability for DAT Providers to dynamically register and deregister a DAPL Provider
    - OpenIB supports that for kDAPL and uDAPL
    - All existing registry APIs of the DAPL SF RI are preserved
- Single static DAT registry, platform specific
    - The kDAT and uDAT specs explicitly state that the DAT registry is defined by the platform. The DAT Collaborative provided an example registry for Linux and Windows and agreed that the DAT-provided registry should be used by all Providers. This ensures that the DAT registry will support all Providers and that a DAT registry from one vendor does not block other Providers.
    - The OpenIB kDAT registry is the Linux platform DAT registry, which achieves the goal of supporting all kDAPL Providers.
    - It also provides the additional benefit that the Linux core, rather than the DAT Collaborative, maintains the kDAT registry.
    - The OpenIB uDAT registry remains the DAT Collaborative one, unchanged.
    - We can discuss whether or not we want to bring the uDAT registry closer to the OpenIB kDAT one.
    - The DAT registries for kDAPL and uDAPL are different in the DAPL SF RI, and OpenIB maintains that.
    - Some changes may be needed for kDAPL registry hot-plug support in OpenIB. How may it impact the uDAPL registry?
- ia_name is under system admin control
    - remains the same
    - following a platform convention
- An IA can represent
    1. a single port
    2. several ports
    3. several HBAs or RNICs
    4. multiple IAs representing the same port
    - OpenIB kDAPL currently implements #1. Members can submit code patches to support other choices.
    - OpenIB uDAPL remains the same, with the current implementation providing #1 under Provider control.
- Support for Consumers to get a list of available IAs to open
    - OpenIB kDAT and uDAT registries provide this
    - OpenIB kDAT registry no longer provides Provider attributes as stated above
- DAT registry supports loading multiple DAPL Providers into the same address space.
    - A Provider library is loaded into an address space once
    - A Provider library is unloaded only when all open instances of its IAs are closed
    - The same Provider library can be loaded into multiple address spaces
    - OpenIB uDAPL continues to provide it
    - OpenIB kDAPL supports it
- DAT registry shall support polymorphism (Provider independence)
    - Consumers call DAT functions through the DAT handle, independently of which Provider is used
    - DAT registry provides redirection
    - dat_ia_open is Provider specific and sets up a redirection table per address space per Provider
    - The first open ensures that the redirection table for a Provider is set up
    - OpenIB kDAT and uDAT registries provide that
    - OpenIB kDAT registry preserves the DAT redirection table as defined by the DAT Collaborative
    - OpenIB kDAT registry preserves the DAT_provider structure
    - Need to file an erratum with DAT to move dat_ia_close after dat_ia_query, to match the DAPL SF RI and the OpenIB one for kDAPL and uDAPL
- The first field of the DAT_handle structure provides a pointer for redirection
    - OpenIB kDAT and uDAT registries support this
- DAT registry
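
For readers following the registry requirements above, here is a minimal C sketch of three of them: mapping an IA name to a Provider, refusing to drop a Provider while any of its IAs is open, and redirecting Consumer calls through the pointer in the first field of the handle. Every name in it (sketch_*, demo_*) is hypothetical and is not the DAT API. As a simplification, and as an assumption of this sketch only, the Provider hands its redirection table to the registry at registration time rather than on the first dat_ia_open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT the DAT API. */
struct sketch_provider;

/* Per-Provider redirection table. */
struct sketch_ops {
    void *(*ia_open)(struct sketch_provider *self);
    int (*ia_query)(void *ia);
    int (*ia_close)(void *ia);
};

struct sketch_provider {
    const char *ia_name;            /* name chosen by the system admin */
    const struct sketch_ops *ops;   /* redirection table */
    int open_count;                 /* number of open IA instances */
    int in_use;                     /* registry slot occupied */
};

#define SKETCH_MAX_PROVIDERS 8
static struct sketch_provider registry[SKETCH_MAX_PROVIDERS];

static struct sketch_provider *sketch_find(const char *ia_name)
{
    for (size_t i = 0; i < SKETCH_MAX_PROVIDERS; i++)
        if (registry[i].in_use && strcmp(registry[i].ia_name, ia_name) == 0)
            return &registry[i];
    return NULL;
}

/* Dynamic registration: fails on a duplicate name or a full registry. */
int sketch_registry_add(const char *ia_name, const struct sketch_ops *ops)
{
    assert(ia_name != NULL && ops != NULL);
    if (sketch_find(ia_name) != NULL)
        return -1;
    for (size_t i = 0; i < SKETCH_MAX_PROVIDERS; i++) {
        if (!registry[i].in_use) {
            registry[i] = (struct sketch_provider){
                .ia_name = ia_name, .ops = ops, .open_count = 0, .in_use = 1 };
            return 0;
        }
    }
    return -1;
}

/* Deregistration is refused while any IA of the Provider is still open. */
int sketch_registry_remove(const char *ia_name)
{
    struct sketch_provider *p = sketch_find(ia_name);
    if (p == NULL || p->open_count > 0)
        return -1;
    p->in_use = 0;
    return 0;
}

/* Open by name: the registry maps ia_name to a Provider. */
void *sketch_ia_open(const char *ia_name)
{
    struct sketch_provider *p = sketch_find(ia_name);
    void *ia = (p != NULL) ? p->ops->ia_open(p) : NULL;
    if (ia != NULL)
        p->open_count++;
    return ia;
}

/* The first field of every handle points at its Provider. */
static struct sketch_provider *provider_of(void *handle)
{
    return *(struct sketch_provider **)handle;
}

int sketch_ia_query(void *ia)
{
    return provider_of(ia)->ops->ia_query(ia);
}

int sketch_ia_close(void *ia)
{
    struct sketch_provider *p = provider_of(ia);
    int rc = p->ops->ia_close(ia);
    if (rc == 0)
        p->open_count--;
    return rc;
}

/* Toy Provider with a single static IA, only to exercise the sketch. */
struct demo_ia {
    struct sketch_provider *provider;   /* must stay the first field */
    int port;
};
static struct demo_ia demo_instance;

static void *demo_open(struct sketch_provider *self)
{
    demo_instance.provider = self;
    demo_instance.port = 1;
    return &demo_instance;
}
static int demo_query(void *ia) { return ((struct demo_ia *)ia)->port; }
static int demo_close(void *ia) { (void)ia; return 0; }

const struct sketch_ops demo_ops = { demo_open, demo_query, demo_close };
```

A Consumer in this sketch would call sketch_registry_add("demo0", &demo_ops) once, then sketch_ia_open("demo0"), and from then on use only the returned handle. The Consumer-facing calls never name the Provider, which is the polymorphism the list asks for, and it works only because every Provider keeps the Provider pointer as the first field of its IA structure.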
RE: [openib-general] IB Address Translation service
Some historical perspective: ATS was defined prior to IPoIB.

The requirements. DAT has two needs:
1. forward translation: given an IP address, return the IB GID/LID.
2. reverse translation: given an IB GID/LID, return the IP address of the requestor.

ULPs: NFS, DAFS. SDP encoded IP addresses into its headers, but DAT is an API and cannot define a protocol for it.

Abstract address translation is a good idea. For IB we can use ATS or IPoIB. For iWARP it will be a no-op. We must ensure that the DAPL that we submit to Linux can be layered on top of all RDMA transports. Since IPoIB has not had a plugfest/connectathon or some other interop event that demonstrates ARP and RARP, I suggest we have both ATS and IPoIB support. ATS has been fully and successfully tested at the DAPL Plugfest. In DAPL we have not assessed the implications of the HA requirements for address translation, which is currently under discussion.

Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance        phone: 781-768-5395
375 Totten Pond Rd.      Fax: 781-895-1195
Waltham, MA 02451-2010   central phone: 781-768-5300

-----Original Message-----
From: Tom Duffy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 01, 2005 6:02 PM
To: Yaron Haviv
Cc: openib-general@openib.org
Subject: RE: [openib-general] IB Address Translation service

[ putting back on list ]

On Wed, 2005-03-02 at 00:29 +0200, Yaron Haviv wrote:
> Did you try RARP with IPoIB ?

I have not.

> I thought that there is some issue that it doesn't work

Currently, the rarpd only works with ethernet, but I don't see why this couldn't be fixed.

> Also I hope you can comment on the other ib_at capabilities which are more important than ATS

I don't mind the idea of abstracting out address translation. I think maybe this is a premature optimization and we should see how each ULP uses/does it first, then abstract out common code. Otherwise, I feel neither strongly for or against your proposal.
-tduffy
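
To make the forward and reverse translation needs from the message above concrete, here is a minimal C sketch of an abstract address-translation interface with two backends. It is only an illustration under stated assumptions: the sketch_* names, the packed 18-byte IB address layout, and the table entries are all made up, and the static table merely stands in for whatever ATS or IPoIB would actually resolve. None of this is the ib_at or ATS interface. The iWARP backend is the no-op case: the transport address is just the IP address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport-neutral address; not a real ib_at or ATS type. */
struct sketch_xport_addr {
    uint8_t bytes[18];  /* IB: 16-byte GID + 2-byte LID; iWARP: 4-byte IPv4 */
    size_t  len;
};

/* Abstract translation: one pair of operations per transport. */
struct sketch_at_ops {
    int (*forward)(uint32_t ip, struct sketch_xport_addr *out);      /* IP -> transport */
    int (*reverse)(const struct sketch_xport_addr *in, uint32_t *ip); /* transport -> IP */
};

/* IB backend: a static table with made-up values standing in for ATS or IPoIB. */
struct sketch_ib_entry { uint32_t ip; uint8_t gid[16]; uint16_t lid; };

static const struct sketch_ib_entry ib_table[] = {
    { 0x0A000001u, { 0xfe, 0x80, [15] = 0x01 }, 0x0011 },
    { 0x0A000002u, { 0xfe, 0x80, [15] = 0x02 }, 0x0012 },
};
#define IB_TABLE_LEN (sizeof ib_table / sizeof ib_table[0])

/* Pack GID followed by the LID, high byte first. */
static void ib_pack(const struct sketch_ib_entry *e, struct sketch_xport_addr *out)
{
    memcpy(out->bytes, e->gid, 16);
    out->bytes[16] = (uint8_t)(e->lid >> 8);
    out->bytes[17] = (uint8_t)(e->lid & 0xff);
    out->len = 18;
}

static int ib_forward(uint32_t ip, struct sketch_xport_addr *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < IB_TABLE_LEN; i++) {
        if (ib_table[i].ip == ip) {
            ib_pack(&ib_table[i], out);
            return 0;
        }
    }
    return -1;  /* unknown IP address */
}

static int ib_reverse(const struct sketch_xport_addr *in, uint32_t *ip)
{
    struct sketch_xport_addr tmp;

    assert(in != NULL && ip != NULL);
    if (in->len != 18)
        return -1;
    for (size_t i = 0; i < IB_TABLE_LEN; i++) {
        ib_pack(&ib_table[i], &tmp);
        if (memcmp(tmp.bytes, in->bytes, 18) == 0) {
            *ip = ib_table[i].ip;
            return 0;
        }
    }
    return -1;  /* unknown GID/LID */
}

/* iWARP backend: translation is a no-op, the address is the IP itself. */
static int iwarp_forward(uint32_t ip, struct sketch_xport_addr *out)
{
    memcpy(out->bytes, &ip, sizeof ip);
    out->len = sizeof ip;
    return 0;
}

static int iwarp_reverse(const struct sketch_xport_addr *in, uint32_t *ip)
{
    if (in->len != sizeof *ip)
        return -1;
    memcpy(ip, in->bytes, sizeof *ip);
    return 0;
}

const struct sketch_at_ops sketch_ib_at    = { ib_forward, ib_reverse };
const struct sketch_at_ops sketch_iwarp_at = { iwarp_forward, iwarp_reverse };
```

A DAPL-like layer written against struct sketch_at_ops would not care which backend it was given, which is the layering property asked for above. Supporting both ATS and IPoIB, as suggested, would in this sketch mean two IB backends behind the same pair of function pointers.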
RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop
woody,

If Open IB wants to go with the LGPL license, that is fine with us. We will also need to take a vote on it at the DAT Collaborative. But from what I see on the reflector, the LGPL license is outside the current bylaws of Open IB, and there is a discussion going on about it. So until this issue is resolved at Open IB, we will add the GPL license to uDAPL (and kDAPL) and will work on getting it ready for submission to Open IB. Once the GPL license is approved by DAT, we can move the dev work on Open IB SF. If Arlin is available he can start the work now. Let's first identify what areas require changes.

Arkady

Arkady Kanevsky          email: [EMAIL PROTECTED]
Network Appliance        phone: 781-768-5395
375 Totten Pond Rd.      Fax: 781-895-1195
Waltham, MA 02451-2010   central phone: 781-768-5300

-----Original Message-----
From: Woodruff, Robert J [mailto:[EMAIL PROTECTED]
Sent: Monday, February 14, 2005 7:48 PM
To: Kanevsky, Arkady
Cc: openib-general@openib.org; Davis, Arlin R; Matt Leininger
Subject: RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

Hi Arkady,

As I mentioned in the BOF, I have a person (Arlin Davis) that can help with developing a uDAPL provider for the openib.org verbs. After discussing it more with folks here, it seems to us that perhaps for the uDAPL user-mode library, it be provided to openib.org under a dual BSD + LGPL library rather than a BSD + GPL since people normally want to use LGPL for libraries.

Also, I think that we can go ahead and start porting using the BSD license available from sourceforge today. Once the port is complete, we can submit it to openib.org under the dual license and submit any changes back to the main sourceforge project under the BSD license, or you can simply accept the changes under the BSD license, thus for uDAPL there is no need to wait for the expanded licensing terms of the sourceforge project.

Once the user-mode verbs, user-mode CM, SA support is available from openib.org, we can get started.
If this is OK with folks, I'll have Arlin start to take a look at this. Sound OK ?

woody