IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Okay we are past the timeout, plus the participants have now
finished the discussion. I am going to close the case as approved.

-ted

Ted H. Kim                    Sun Microsystems, Inc.
ted.kim at sun.com            222 North Sepulveda Blvd., 10th Floor
(310) 341-1116                El Segundo, CA 90245
(310) 341-1120 FAX
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Final version of the case -- including a summary of the last parts
of the discussion, about motivation, versioning and scope.

-ted

Diffs:

11c11,14
< such as memory DR.
---
> such as memory DR. In fact, one of the primary motivations of
> the work is to enable fixing various IB bugs on large systems,
> where we need to be careful about being in/out of the cage and
> the possible impact to memory DR.
37c40,44
< the TI and CI as noted below.
---
> the TI and CI as noted below. Each ABI version is incompatible
> with each other and so this change will require a recompile of
> kernel ULPs, framework and HCA drivers. All IB kernel components
> are delivered by Sun engineering and are delivered together in
> our releases, so this type of ABI change is not noticed by our customers.
38a46,49
> The scope of the project includes changes at the framework and
> HCA driver level, which will be delivered together. The ULPs
> using these new features, however, will be phased according to
> customer based priorities.

A. Background

The DDI distinguishes between different types of memory. Memory from
ddi_dma_mem_alloc(9F) is usable for DMA and takes into account various
factors such as alignment and other device attributes. Memory from
kmem_(z)alloc is not guaranteed to be usable for DMA, though most of
the time it does work for that purpose, because of the capability of
modern platforms. Nevertheless, there is value in maintaining these DDI
distinctions, especially when considering certain platform issues
such as memory DR. In fact, one of the primary motivations of the work
is to enable fixing various IB bugs on large systems, where we need to
be careful about being in/out of the cage and the possible impact to
memory DR.

In the context of InfiniBand, registered memory is the target of DMA
operations. This case introduces new InfiniBand related interfaces
analogous to the ddi_dma_mem_alloc family of functions to IBTF
(InfiniBand Transport Framework, PSARC/2002/132 and follow-on cases).
This addition will help the InfiniBand stack maintain the proper DDI
memory distinctions important for certain types of platforms.

B. Proposal

The proposal is to make additions to the IBTF Channel and Transport
interfaces. The functionality added to the Transport Interface (TI) is
used by the ULPs to allocate memory suitable for DMA and IB memory
registration. In turn, the framework uses new entry points in the
Channel Interface (CI) to request memory allocation from the underlying
HCA driver. These interfaces are basically a wrapper for DDI functions
which on the one hand abstract away HCA device specific details at the
ULP level, but at the same time allow for the HCA driver to adjust the
memory attributes (alignment, etc.) as necessary for efficiency.

These additions include an IBTF ABI change, so this case also marks an
internal flag day, incrementing our interface version numbers for both
the TI and CI as noted below. Each ABI version is incompatible with
each other and so this change will require a recompile of kernel ULPs,
framework and HCA drivers. All IB kernel components are delivered by
Sun engineering and are delivered together in our releases, so this
type of ABI change is not noticed by our customers.

The scope of the project includes changes at the framework and HCA
driver level, which will be delivered together. The ULPs using these
new features, however, will be phased according to customer based
priorities.

All interface additions and changes in this proposal have a
micro/patch binding.

Transport Interface (ON Consolidation Private):
    ibt_alloc_io_mem()  - Allocates DMA memory (at the transport level)
    ibt_free_io_mem()   - Deallocates DMA memory
    IBTI_V3             - TI version change

Channel Interface (ON Consolidation Private):
    ibc_alloc_io_mem()  - Allocates DMA memory (at the HCA driver level)
    ibc_free_io_mem()   - Deallocates DMA memory
    IBCI_V3             - CI version change

C. Summary of Changes by man page

See materials directory for copies of man pages.
Modified man pages have change bars in the left margin.

    ibci.9                  - modified (new CI entry points added)
    ibc_alloc_io_mem.9e     - new (alloc/free CI entry points)
    ibt_alloc_io_mem.9f     - new (alloc/free TI functions)
    ibt_clnt_modinfo_t.9s   - modified (IBTI_V3 version)
    ibc_hca_info_t.9s       - modified (IBCI_V3 version)
    ibc_operations_t.9s     - modified (new CI entry points added)
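
Editor's illustration of the layering described in section B: a ULP calls a transport-level wrapper, which dispatches through an operations vector to a driver-level allocator that picks the alignment. This is a minimal user-space sketch, not IBTF code. Every name in it (mock_status_t, mock_ci_ops_t, ti_alloc_io_mem, and so on) is hypothetical, and C11 aligned_alloc() stands in for the DDI allocation the real driver would perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the real IBTF types. */
typedef enum {
	MOCK_SUCCESS = 0,
	MOCK_INSUFF_RESOURCE,
	MOCK_INVALID_PARAM
} mock_status_t;

typedef struct mock_mem_hdl {
	void	*base;	/* start of the allocation */
	size_t	len;	/* length after rounding up to the alignment */
} mock_mem_hdl_t;

/* CI-like operations vector exported by a mock "HCA driver". */
typedef struct mock_ci_ops {
	size_t align;	/* alignment this device prefers (power of two) */
	mock_status_t (*alloc_io_mem)(const struct mock_ci_ops *, size_t,
	    void **, mock_mem_hdl_t **);
	mock_status_t (*free_io_mem)(const struct mock_ci_ops *,
	    mock_mem_hdl_t *);
} mock_ci_ops_t;

/* Driver level: chooses the memory attributes (here, only alignment). */
static mock_status_t
drv_alloc_io_mem(const mock_ci_ops_t *ops, size_t size, void **kaddrp,
    mock_mem_hdl_t **hdlp)
{
	/* aligned_alloc() wants a size that is a multiple of the alignment */
	size_t len = (size + ops->align - 1) / ops->align * ops->align;
	mock_mem_hdl_t *hdl = malloc(sizeof (*hdl));

	if (hdl == NULL)
		return (MOCK_INSUFF_RESOURCE);
	hdl->base = aligned_alloc(ops->align, len);
	if (hdl->base == NULL) {
		free(hdl);
		return (MOCK_INSUFF_RESOURCE);
	}
	hdl->len = len;
	*kaddrp = hdl->base;
	*hdlp = hdl;
	return (MOCK_SUCCESS);
}

static mock_status_t
drv_free_io_mem(const mock_ci_ops_t *ops, mock_mem_hdl_t *hdl)
{
	(void) ops;
	free(hdl->base);
	free(hdl);
	return (MOCK_SUCCESS);
}

/* Transport level: validates arguments, hides driver details from the ULP. */
static mock_status_t
ti_alloc_io_mem(const mock_ci_ops_t *hca, size_t size, void **kaddrp,
    mock_mem_hdl_t **hdlp)
{
	if (hca == NULL || size == 0 || kaddrp == NULL || hdlp == NULL)
		return (MOCK_INVALID_PARAM);
	return (hca->alloc_io_mem(hca, size, kaddrp, hdlp));
}

static mock_status_t
ti_free_io_mem(const mock_ci_ops_t *hca, mock_mem_hdl_t *hdl)
{
	if (hca == NULL || hdl == NULL)
		return (MOCK_INVALID_PARAM);
	return (hca->free_io_mem(hca, hdl));
}

/* A mock HCA that asks for page-sized (4096-byte) alignment. */
static const mock_ci_ops_t mock_hca = {
	4096, drv_alloc_io_mem, drv_free_io_mem
};
```

The caller never sees the alignment value; it only receives an address and an opaque handle to pass back on free, which is the separation the proposal describes between the ULP level and the HCA driver level.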
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
On 10/15/08 12:07, Ted H. Kim wrote:
> Kais,
>
> Kais Belgaied wrote:
>> - Case boundary question: since this marks a flag day for both TI
>>   and CI, can you list the components that are affected by this
>>   flag day?
>
> Most of the IB modules in ON -
>     framework: IBTL
>     IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
>     HCA Drivers (CI): Tavor, Hermon

at the risk of re-stating the obvious, all changes in the above 3 sets
of components are in-scope of this case, right?

>> - I am not clear on the consumer side of this new interface:
>>   What prompts a ULP to start using this interface? Is it expected
>>   to attempt ibt_alloc_io_mem() until it exhausts all resources?
>>   It would be easier to assess the completeness and the usefulness
>>   of the TI if you either extended the case's scope to include at
>>   least the changes on one transport consumer or gave a real
>>   example thereof.
>
> We are in the process of fixing bugs in certain IB ULPs to be good
> citizens in big SPARC platforms with memory DR where we have to be
> careful about what is in/out of the cage. The current plan for ULP
> usage (i.e. TI usage) is related to this motivation.
>
>> - mi_ibt_version seems to be an enumeration of apparently mutually
>>   exclusive values IBTI_V{1,2,3} yet the definition suggests a
>>   combination of independent (discrete) capabilities (FMR support,
>>   DMA wrapper support, etc.)
>
> The features are examples of what was included at each version
> change. More discussion of the relationship between ABI and features
> below ...
>
>>   . Is there any consumer of this interface that uses DMA wrapper
>>     but not FMR?
>
> Well to be honest FMR in the current form turned out to be a failure.
> So no one uses FMR in ON right now. But I think more generally what
> is going to happen is that the features will be used independently of
> each other, since they are generally not related to each other.
>
>>   . For future evolution, is the mi_ibt_version always intended to
>>     express a monotonically increasing set of capabilities (capabs
>>     of V(n+1) includes all capabs of V(n)) ?
>
> Yes, that is the intent, but it is not guaranteed. However, as you
> might imagine, it would involve a great deal of discussion/agreement
> to remove anything and the ARC would be in the loop.
>
>>   Basically I'm trying to see if information of different nature is
>>   being encoded in the same field. Without slipping in a design
>>   discussion you should consider if two fields are more appropriate:
>>   1 version (number or enum) and one capabs (bitmask).
>
> There are in fact capability bitmasks elsewhere. In IB there are a
> number of optional features. So the bitmasks generally are for saying
> you have these optional features. But the version number is more an
> ABI thing, and it is a mistake to conflate the two, though the reason
> we have to change ABI is that new features demand more fields in the
> structs, etc.

so are you fixing the mistake of conflating the capabs + version ?
I still see the updated material unchanged on that.

Kais.
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Kais Belgaied wrote:
>>> - Case boundary question: since this marks a flag day for both TI
>>>   and CI, can you list the components that are affected by this
>>>   flag day?
>>
>> Most of the IB modules in ON -
>>     framework: IBTL
>>     IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
>>     HCA Drivers (CI): Tavor, Hermon
>
> at the risk of re-stating the obvious, all changes in the above 3
> sets of components are in-scope of this case, right?

Yes

>>> - mi_ibt_version seems to be an enumeration of apparently mutually
>>>   exclusive values IBTI_V{1,2,3} yet the definition suggests a
>>>   combination of independent (discrete) capabilities (FMR support,
>>>   DMA wrapper support, etc.)
>>
>> The features are examples of what was included at each version
>> change. More discussion of the relationship between ABI and
>> features below ...
>>
>>>   . Is there any consumer of this interface that uses DMA wrapper
>>>     but not FMR?
>>
>> Well to be honest FMR in the current form turned out to be a
>> failure. So no one uses FMR in ON right now. But I think more
>> generally what is going to happen is that the features will be used
>> independently of each other, since they are generally not related
>> to each other.
>>
>>>   . For future evolution, is the mi_ibt_version always intended to
>>>     express a monotonically increasing set of capabilities (capabs
>>>     of V(n+1) includes all capabs of V(n)) ?
>>
>> Yes, that is the intent, but it is not guaranteed. However, as you
>> might imagine, it would involve a great deal of discussion/agreement
>> to remove anything and the ARC would be in the loop.
>>
>>>   Basically I'm trying to see if information of different nature is
>>>   being encoded in the same field. Without slipping in a design
>>>   discussion you should consider if two fields are more
>>>   appropriate: 1 version (number or enum) and one capabs (bitmask).
>>
>> There are in fact capability bitmasks elsewhere. In IB there are a
>> number of optional features. So the bitmasks generally are for
>> saying you have these optional features. But the version number is
>> more an ABI thing, and it is a mistake to conflate the two, though
>> the reason we have to change ABI is that new features demand more
>> fields in the structs, etc.
>
> so are you fixing the mistake of conflating the capabs + version ?
> I still see the updated material unchanged on that.

I think it's a mistake in *understanding* to not distinguish API from
ABI. I am trying to clarify that. I don't think it is a mistake in the
design.

I am also not sure where this is leading.
Are you suggesting some specific change to the case?

-ted
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Ted H. Kim wrote:
> Kais Belgaied wrote:
>>>> - Case boundary question: since this marks a flag day for both TI
>>>>   and CI, can you list the components that are affected by this
>>>>   flag day?
>>>
>>> Most of the IB modules in ON -
>>>     framework: IBTL
>>>     IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
>>>     HCA Drivers (CI): Tavor, Hermon
>>
>> at the risk of re-stating the obvious, all changes in the above 3
>> sets of components are in-scope of this case, right?
>
> Yes

Actually, I think I made a mistake to say this.

It's true to the extent that the three sets of components will use
the interfaces here. But I think saying an unqualified yes means we
have to deliver all modified components at once.

And so I am going to say the right answer is YES to only the framework
and HCA drivers.

This allows us to deliver the changes to the framework and HCA drivers
first. And then phase the delivery of the ULP mods. That way if there
are customer priorities, we can deliver the ones they want first
without having to wait to have all of the ULPs modified.

-ted
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
On 10/20/08 15:01, Ted H. Kim wrote:
> I am also not sure where this is leading.
> Are you suggesting some specific change to the case?

I'm not clear on the future compatibility expectations around the
interface introduced by this case:

I asked whether versions are incremental all the time and you answered
yes (capabs of V(n+1) includes all capabs of V(n)).

I asked if some capabs can be used independently, you also said yes,
which suggests capabs of V(n+1) don't necessarily have to include all
capabs of V(n). Capabs are independent.

The former means that an IBT client module written to V(n) is
guaranteed to work unmodified on a framework+HCAs that evolved to
V(n+1) or later.

The latter means modules may break or may continue to work. No backward
compatibility is guaranteed.

Choose one semantic for the interface and clearly document it in the
case.

Kais.

> -ted
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
We will modify the case to say:

    IBTI_V3 is incompatible with other IBTI versions.
    IBCI_V3 is incompatible with other IBCI versions.

When IBTI changes, you have to recompile all the in-kernel ULPs
and the IBTF framework. When IBCI changes, you have to recompile
the IBTF framework and all the HCA drivers.

-ted

Kais Belgaied wrote:
> On 10/20/08 15:01, Ted H. Kim wrote:
>> I am also not sure where this is leading.
>> Are you suggesting some specific change to the case?
>
> I'm not clear on the future compatibility expectations around the
> interface introduced by this case:
>
> I asked whether versions are incremental all the time and you
> answered yes (capabs of V(n+1) includes all capabs of V(n)).
>
> I asked if some capabs can be used independently, you also said yes,
> which suggests capabs of V(n+1) don't necessarily have to include all
> capabs of V(n). Capabs are independent.
>
> The former means that an IBT client module written to V(n) is
> guaranteed to work unmodified on a framework+HCAs that evolved to
> V(n+1) or later.
>
> The latter means modules may break or may continue to work. No
> backward compatibility is guaranteed.
>
> Choose one semantic for the interface and clearly document it in the
> case.
>
> Kais.
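
Editor's illustration of the distinction reached in this exchange: the interface version is an ABI marker that must match exactly, while optional features are advertised separately as capability bits. This is a minimal sketch under stated assumptions. All names (mock_ibti_version_t, mock_modinfo_t, MOCK_CAP_FEATURE_A, and so on) are hypothetical and do not come from the IBTF headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ABI versions; each one is incompatible with the others. */
typedef enum {
	MOCK_IBTI_V1 = 1,
	MOCK_IBTI_V2,
	MOCK_IBTI_V3
} mock_ibti_version_t;

/* The single version this mock framework was compiled against. */
#define	MOCK_FRAMEWORK_VERSION	MOCK_IBTI_V3

/* Hypothetical optional-feature bits, independent of the ABI version. */
#define	MOCK_CAP_FEATURE_A	(1u << 0)
#define	MOCK_CAP_FEATURE_B	(1u << 1)

/* What a client module declares about itself when it attaches. */
typedef struct mock_modinfo {
	mock_ibti_version_t mi_version;
} mock_modinfo_t;

/*
 * ABI check: strict equality. A client built against V2 is rejected by a
 * V3 framework and must be recompiled; there is no "V3 includes V2" rule.
 */
static bool
mock_attach_ok(const mock_modinfo_t *mi)
{
	return (mi != NULL && mi->mi_version == MOCK_FRAMEWORK_VERSION);
}

/* Feature check: a bitmask test, usable whatever the ABI version is. */
static bool
mock_has_caps(uint32_t caps, uint32_t wanted)
{
	return ((caps & wanted) == wanted);
}
```

Keeping the two checks separate means a version bump says only "recompile", and says nothing about which optional features a given HCA offers.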
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Minor nit:

materials/ibc_operations_t.9s
    Add ibc_alloc_io_mem(9E) and ibt_alloc_io_mem(9F) to the See also:

+1

--
Rick Matthews                  email: Rick.Matthews at sun.com
Sun Microsystems, Inc.         phone: +1(651) 554-1518
1270 Eagan Industrial Road     phone (internal): 54418
Suite 160                      fax: +1(651) 554-1540
Eagan, MN 55121-1231 USA       main: +1(651) 554-1500
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
- Case boundary question: since this marks a flag day for both TI and
  CI, can you list the components that are affected by this flag day?

- I am not clear on the consumer side of this new interface:
  What prompts a ULP to start using this interface? Is it expected to
  attempt ibt_alloc_io_mem() until it exhausts all resources?
  It would be easier to assess the completeness and the usefulness of
  the TI if you either extended the case's scope to include at least
  the changes on one transport consumer or gave a real example thereof.

- mi_ibt_version seems to be an enumeration of apparently mutually
  exclusive values IBTI_V{1,2,3} yet the definition suggests a
  combination of independent (discrete) capabilities (FMR support, DMA
  wrapper support, etc.)

  . Is there any consumer of this interface that uses DMA wrapper but
    not FMR?

  . For future evolution, is the mi_ibt_version always intended to
    express a monotonically increasing set of capabilities (capabs of
    V(n+1) includes all capabs of V(n)) ?

  Basically I'm trying to see if information of different nature is
  being encoded in the same field. Without slipping in a design
  discussion you should consider if two fields are more appropriate:
  1 version (number or enum) and one capabs (bitmask).

- Under what condition can the caller of
  ibt_alloc_io_mem()/ibt_free_io_mem() expect the following error to be
  returned?

      IBT_MR_ACCESS_REQ_INVALID  Invalid Access Control Specified.
                                 Remote Write or Remote Atomic access is
                                 requested without specifying Local Write.

- in ibc_alloc_io_mem.9e these two sections are in conflict:

      ibt_status_t prefix_ibc_alloc_io_mem(ibc_hca_hdl_t hca_hdl,
          size_t size, ibt_mr_flags_t mr_flag, caddr_t *kaddrp,
          ibc_mem_alloc_hdl_t *mem_alloc_hdl);

  and

      hca_hdl  IBTF channel Interface (TI) HCA Handle previously obtained
               by calling ibt_open_hca(9F).

Kais.

On 10/10/08 11:04, Ted Kim wrote:
> Template Version: @(#)sac_nextcase %I% %G% SMI
> This information is Copyright 2008 Sun Microsystems
>
> 1. Introduction
>    1.1. Project/Component Working Name:
>         IBTF IO Memory
>    1.2. Name of Document Author/Supplier:
>         Author: Lida HornI
>    1.3  Date of This Document:
>         10 October, 2008
>
> 4. Technical Description
>
> A. Background
>
> The DDI distinguishes between different types of memory. Memory from
> ddi_dma_mem_alloc(9F) is usable for DMA and takes into account
> various factors such as alignment and other device attributes. Memory
> from kmem_(z)alloc is not guaranteed to be usable for DMA, though
> most of the time it does work for that purpose, because of the
> capability of modern platforms. Nevertheless, there is value in
> maintaining these DDI distinctions, especially when considering
> certain platform issues such as memory DR.
>
> In the context of InfiniBand, registered memory is the target of DMA
> operations. This case introduces new InfiniBand related interfaces
> analogous to the ddi_dma_mem_alloc family of functions to IBTF
> (InfiniBand Transport Framework, PSARC/2002/132 and follow-on cases).
> This addition will help the InfiniBand stack maintain the proper DDI
> memory distinctions important for certain types of platforms.
>
> B. Proposal
>
> The proposal is to make additions to the IBTF Channel and Transport
> interfaces. The functionality added to the Transport Interface (TI)
> is used by the ULPs to allocate memory suitable for DMA and IB memory
> registration. In turn, the framework uses new entry points in the
> Channel Interface (CI) to request memory allocation from the
> underlying HCA driver. These interfaces are basically a wrapper for
> DDI functions which on the one hand abstract away HCA device specific
> details at the ULP level, but at the same time allow for the HCA
> driver to adjust the memory attributes (alignment, etc.) as necessary
> for efficiency.
>
> These additions include an IBTF ABI change, so this case also marks
> an internal flag day, incrementing our interface version numbers for
> both the TI and CI as noted below.
>
> All interface additions and changes in this proposal have a
> micro/patch binding.
>
> Transport Interface (ON Consolidation Private):
>     ibt_alloc_io_mem()  - Allocates DMA memory (at the transport level)
>     ibt_free_io_mem()   - Deallocates DMA memory
>     IBTI_V3             - TI version change
>
> Channel Interface (ON Consolidation Private):
>     ibc_alloc_io_mem()  - Allocates DMA memory (at the HCA driver level)
>     ibc_free_io_mem()   - Deallocates DMA memory
>     IBCI_V3             - CI version change
>
> C. Summary of Changes by man page
>
> See materials directory for copies of man pages.
> Modified man pages have change bars in the left margin.
>
>     ibci.9               - modified (new CI entry points added)
>     ibc_alloc_io_mem.9e  - new (alloc/free CI entry points)
>     ibt_alloc_io_mem.9f  - new (alloc/free TI functions)
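
Editor's illustration of the rule stated in the IBT_MR_ACCESS_REQ_INVALID description quoted in the review comments of this message: a request for Remote Write or Remote Atomic access is invalid unless Local Write is also requested. This is a minimal sketch; the flag names and the function are hypothetical and are not the ibt_mr_flags_t values. The thread records elsewhere that this error was removed from the allocation man pages, so the check is shown only to make the quoted condition concrete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access-flag bits; not the real ibt_mr_flags_t values. */
#define	MOCK_MR_LOCAL_WRITE	(1u << 0)
#define	MOCK_MR_REMOTE_READ	(1u << 1)
#define	MOCK_MR_REMOTE_WRITE	(1u << 2)
#define	MOCK_MR_REMOTE_ATOMIC	(1u << 3)

/*
 * Returns true when the requested access combination is acceptable:
 * Remote Write or Remote Atomic is only allowed together with Local Write.
 */
static bool
mock_mr_access_valid(uint32_t flags)
{
	bool wants_remote_modify =
	    (flags & (MOCK_MR_REMOTE_WRITE | MOCK_MR_REMOTE_ATOMIC)) != 0;
	bool has_local_write = (flags & MOCK_MR_LOCAL_WRITE) != 0;

	return (!wants_remote_modify || has_local_write);
}
```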
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Okay, that's a trivial edit, we will do it no problem.

Rick Matthews wrote:
> Minor nit:
>
> materials/ibc_operations_t.9s
>     Add ibc_alloc_io_mem(9E) and ibt_alloc_io_mem(9F) to the See also:
>
> +1
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Folks,

We fixed up the man pages in the materials directory, so there are
new versions of

1. ibt_alloc_io_mem.9f - remove the error about incompatible permissions
2. ibc_alloc_io_mem.9e - fix HCA handle description, plus the same fix
   as #1 here too
3. ibci.9 - add the see also ref (but only the ibc ref as this is
   describing CI things)

I think this takes care of edits brought up so far.

-ted
IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]
Ted H. Kim wrote:
> Folks,
>
> We fixed up the man pages in the materials directory, so there are
> new versions of
>
> 1. ibt_alloc_io_mem.9f - remove the error about incompatible
>    permissions
> 2. ibc_alloc_io_mem.9e - fix HCA handle description, plus the same
>    fix as #1 here too
> 3. ibci.9 - add the see also ref (but only the ibc ref as this is
>    describing CI things)

Sorry I mean ibc_operations_t.9s for this last item.

> I think this takes care of edits brought up so far.
>
> -ted