IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-22 Thread Ted Kim
Okay we are past the timeout, plus the participants
have now finished the discussion.

I am going to close the case as approved.

-ted

Ted H. Kim
Sun Microsystems, Inc.   ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245  (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-21 Thread Ted H. Kim
Final version of the case -- including a summary of
the last parts of the discussion, about motivation,
versioning and scope.

-ted

Diffs:
11c11,14
< such as memory DR.
---
> such as memory DR. In fact, one of the primary motivations of the work
> is to enable fixing various IB bugs on large systems, where we need to
> be careful about being in/out of the cage and the possible impact to
> memory DR.
37c40,44
< the TI and CI as noted below.
---
> the TI and CI as noted below. Each ABI version is incompatible with
> each other and so this change will require a recompile of kernel ULPs,
> framework and HCA drivers. All IB kernel components are delivered by
> Sun engineering and are delivered together in our releases, so this
> type of ABI change is not noticed by our customers.
38a46,49
> The scope of the project includes changes at the framework and HCA
> driver level, which will be delivered together. The ULPs using these
> new features, however, will be phased according to customer based
> priorities.



A. Background

The DDI distinguishes between different types of memory. Memory from
ddi_dma_mem_alloc(9F) is usable for DMA and takes into account various
factors such as alignment and other device attributes. Memory from
kmem_(z)alloc is not guaranteed to be usable for DMA, though most of
the time it does work for that purpose, because of the capability of
modern platforms. Nevertheless, there is value in maintaining these
DDI distinctions, especially when considering certain platform issues
such as memory DR. In fact, one of the primary motivations of the work
is to enable fixing various IB bugs on large systems, where we need to
be careful about being in/out of the cage and the possible impact to
memory DR.

In the context of InfiniBand, registered memory is the target of DMA
operations. This case introduces new InfiniBand related interfaces
analogous to the ddi_dma_mem_alloc family of functions to IBTF
(InfiniBand Transport Framework, PSARC/2002/132 and follow-on
cases). This addition will help the InfiniBand stack maintain the
proper DDI memory distinctions important for certain types of
platforms.


B. Proposal

The proposal is to make additions to the IBTF Channel and Transport
interfaces. The functionality added to the Transport Interface (TI) is
used by the ULPs to allocate memory suitable for DMA and IB memory
registration. In turn, the framework uses new entry points in the
Channel Interface (CI) to request memory allocation from the
underlying HCA driver. These interfaces are basically a wrapper for
DDI functions which on the one hand abstract away HCA device specific
details at the ULP level, but at the same time allow for the HCA
driver to adjust the memory attributes (alignment, etc.) as necessary
for efficiency.

These additions include an IBTF ABI change, so this case also marks an
internal flag day, incrementing our interface version numbers for both
the TI and CI as noted below. Each ABI version is incompatible with
each other and so this change will require a recompile of kernel ULPs,
framework and HCA drivers. All IB kernel components are delivered by
Sun engineering and are delivered together in our releases, so this
type of ABI change is not noticed by our customers.

The scope of the project includes changes at the framework and HCA
driver level, which will be delivered together. The ULPs using these
new features, however, will be phased according to customer based
priorities.

All interface additions and changes in this proposal have a
micro/patch binding.

Transport Interface (ON Consolidation Private):

   ibt_alloc_io_mem() - Allocates DMA memory (at the transport level)
   ibt_free_io_mem() -  Deallocates DMA memory
   IBTI_V3 - TI version change

Channel Interface (ON Consolidation Private):

   ibc_alloc_io_mem() - Allocates DMA memory (at the HCA driver level)
   ibc_free_io_mem() - Deallocates DMA memory
   IBCI_V3 - CI version change
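
The TI-to-CI layering described above can be sketched in user-space C.
This is only an illustration under stated assumptions, not the IBTF
implementation: every mock_* and drv_* name, the operations-vector
layout, and the 4096-byte alignment are hypothetical stand-ins, and
malloc() stands in for the DDI DMA allocation a real HCA driver would
perform. The point it shows is that the ULP-facing call forwards to a
per-HCA entry point, so the driver picks the alignment and the ULP
never sees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real IBTF types live in kernel headers. */
typedef enum {
	MOCK_SUCCESS = 0,
	MOCK_INSUFF_RESOURCE,
	MOCK_INVALID_PARAM
} mock_status_t;

typedef unsigned int mock_mr_flags_t;

/* Opaque allocation handle, returned by alloc and consumed by free. */
typedef struct mock_mem {
	void *raw;	/* pointer actually returned by malloc() */
} *mock_mem_hdl_t;

/* CI side: per-HCA operations vector the framework dispatches through. */
typedef struct mock_ci_ops {
	size_t align;	/* alignment this pretend HCA wants (power of two) */
	mock_status_t (*alloc_io_mem)(const struct mock_ci_ops *, size_t,
	    mock_mr_flags_t, void **, mock_mem_hdl_t *);
	mock_status_t (*free_io_mem)(const struct mock_ci_ops *,
	    mock_mem_hdl_t);
} mock_ci_ops_t;

/* Pretend HCA driver entry point; malloc() replaces the DDI DMA calls. */
static mock_status_t
drv_alloc_io_mem(const mock_ci_ops_t *ops, size_t size, mock_mr_flags_t flags,
    void **kaddrp, mock_mem_hdl_t *hdlp)
{
	mock_mem_hdl_t h;
	uintptr_t p;

	(void) flags;	/* access flags are ignored in this mock */
	assert(ops->align != 0 && (ops->align & (ops->align - 1)) == 0);
	if (size == 0 || kaddrp == NULL || hdlp == NULL)
		return (MOCK_INVALID_PARAM);
	if ((h = malloc(sizeof (*h))) == NULL)
		return (MOCK_INSUFF_RESOURCE);
	/* Over-allocate so the returned address can be rounded up. */
	if ((h->raw = malloc(size + ops->align)) == NULL) {
		free(h);
		return (MOCK_INSUFF_RESOURCE);
	}
	p = ((uintptr_t)h->raw + ops->align - 1) & ~((uintptr_t)ops->align - 1);
	*kaddrp = (void *)p;
	*hdlp = h;
	return (MOCK_SUCCESS);
}

static mock_status_t
drv_free_io_mem(const mock_ci_ops_t *ops, mock_mem_hdl_t h)
{
	(void) ops;
	if (h == NULL)
		return (MOCK_INVALID_PARAM);
	free(h->raw);
	free(h);
	return (MOCK_SUCCESS);
}

/* TI side: what a ULP calls; it only forwards to the HCA's entry point. */
static mock_status_t
mock_ibt_alloc_io_mem(const mock_ci_ops_t *hca, size_t size,
    mock_mr_flags_t flags, void **kaddrp, mock_mem_hdl_t *hdlp)
{
	return (hca->alloc_io_mem(hca, size, flags, kaddrp, hdlp));
}

static mock_status_t
mock_ibt_free_io_mem(const mock_ci_ops_t *hca, mock_mem_hdl_t h)
{
	return (hca->free_io_mem(hca, h));
}

/* ULP-style usage: returns 1 if the buffer came back aligned, else 0. */
static int
io_mem_demo(void)
{
	const mock_ci_ops_t hca = { 4096, drv_alloc_io_mem, drv_free_io_mem };
	void *buf = NULL;
	mock_mem_hdl_t hdl = NULL;
	int aligned;

	if (mock_ibt_alloc_io_mem(&hca, 1000, 0, &buf, &hdl) != MOCK_SUCCESS)
		return (0);
	aligned = (((uintptr_t)buf & (hca.align - 1)) == 0);
	(void) mock_ibt_free_io_mem(&hca, hdl);
	return (aligned);
}
```

Calling io_mem_demo() from a small main() exercises the whole path.
Keeping the alignment in the driver's operations vector rather than in
the caller's arguments mirrors the stated goal of hiding HCA device
specific details from the ULP.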


C. Summary of Changes by man page

See materials directory for copies of man pages. Modified man pages
have change bars in the left margin.

   ibci.9 - modified (new CI entry points added)

   ibc_alloc_io_mem.9e - new (alloc and free CI entry points)

   ibt_alloc_io_mem.9f - new (alloc and free TI functions)

   ibt_clnt_modinfo_t.9s - modified (IBTI_V3 version)
   ibc_hca_info_t.9s - modified (IBCI_V3 version)
   ibc_operations_t.9s - modified (new CI entry points added)

-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-20 Thread Kais Belgaied
On 10/15/08 12:07, Ted H. Kim wrote:
 Kais,


 Kais Belgaied wrote:
 - Case boundary question: since this marks a flag day for both TI and 
 CI, can you list the components
  that are affected by this flag day?

 Most of the IB modules in ON -
 framework: IBTL
 IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
 HCA Drivers (CI): Tavor, Hermon

at the risk of re-stating the obvious,  all changes in the above 3 sets 
of components are in-scope of this case,
right?



 - I am not clear on the consumer side of this new  interface: What 
 prompts a ULP to start using this interface?
  Is it expected to attempt ibt_alloc_io_mem() until it  exhausts all 
 resources?
  It would be easier to assess the completeness and the usefulness of 
 the TI if you either extended the case's scope
  to include at least the changes on one transport consumer or gave a 
 real example thereof.

 We are in the process of fixing bugs in certain IB ULPs to
 be good citizens in big SPARC platforms with memory DR where
 we have to be careful about what is in/out of the cage.
 The current plan for ULP usage (i.e. TI usage) is related
 to this motivation.


 - mi_ibt_version seems to be an enumeration of apparently mutually 
 exclusive values  IBTI_V{1,2,3}
  yet the definition suggests a combination of independent (discrete) 
 capabilities
  (FMR support, DMA wrapper support, etc.)

 The features are examples of what was included at each version change.
 More discussion of the relationship between ABI and features below ...


 . Is there any consumer of this interface that uses the DMA wrapper
 but not FMR?

 Well to be honest FMR in the current form turned out to be a failure.
 So no one uses FMR in ON right now.

 But I think more generally what is going to happen is that
 the features will be used independently of each other,
 since they are generally not related to each other.


   . For future evolution, is the mi_ibt_version always intended to 
 express a monotonically increasing set
of capabilities (capabs of V(n+1) includes all capabs of V(n))?

 Yes, that is the intent, but it is not guaranteed. However,
 as you might imagine, it would involve a great deal of
 discussion/agreement to remove anything and the ARC would be
 in the loop.


   Basically I'm trying to see if information of a different nature is
 being encoded in the same field. Without slipping
  into a design discussion, you should consider whether two fields are more
 appropriate: one version (number or enum) and one capabs (bitmask).

 There are in fact capability bitmasks elsewhere.
 In IB there are a number of optional features. So the bitmasks
 generally are for saying you have these optional features.

 But the version number is more an ABI thing, and it is a mistake
 to conflate the two, though the reason we have to change ABI
 is that new features demand more fields in the structs, etc.


so are you fixing the mistake of conflating the capabs + version ? I 
still see the updated material unchanged
on that.





Kais.



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-20 Thread Ted H. Kim


Kais Belgaied wrote:
 - Case boundary question: since this marks a flag day for both TI and 
 CI, can you list the components
  that are affected by this flag day?

 Most of the IB modules in ON -
 framework: IBTL
 IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
 HCA Drivers (CI): Tavor, Hermon
 
 at the risk of re-stating the obvious,  all changes in the above 3 sets 
 of components are in-scope of this case,
 right?

Yes

 - mi_ibt_version seems to be an enumeration of apparently mutually 
 exclusive values  IBTI_V{1,2,3}
  yet the definition suggests a combination of independent (discrete) 
 capabilities
  (FMR support, DMA wrapper support, etc.)

 The features are examples of what was included at each version change.
 More discussion of the relationship between ABI and features below ...

 . Is there any consumer of this interface that uses the DMA wrapper
 but not FMR?

 Well to be honest FMR in the current form turned out to be a failure.
 So no one uses FMR in ON right now.

 But I think more generally what is going to happen is that
 the features will be used independently of each other,
 since they are generally not related to each other.


   . For future evolution, is the mi_ibt_version always intended to 
 express a monotonically increasing set
of capabilities (capabs of V(n+1) includes all capabs of V(n))?

 Yes, that is the intent, but it is not guaranteed. However,
 as you might imagine, it would involve a great deal of
 discussion/agreement to remove anything and the ARC would be
 in the loop.


   Basically I'm trying to see if information of a different nature is
 being encoded in the same field. Without slipping
  into a design discussion, you should consider whether two fields are more
 appropriate: one version (number or enum) and one capabs (bitmask).

 There are in fact capability bitmasks elsewhere.
 In IB there are a number of optional features. So the bitmasks
 generally are for saying you have these optional features.

 But the version number is more an ABI thing, and it is a mistake
 to conflate the two, though the reason we have to change ABI
 is that new features demand more fields in the structs, etc.
 
 so are you fixing the mistake of conflating the capabs + version ? I 
 still see the updated material unchanged
 on that.

I think it's a mistake in *understanding* to not distinguish API
from ABI. I am trying to clarify that.
I don't think it is a mistake in the design.

I am also not sure where this is leading.
Are you suggesting some specific change to the case?
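
The distinction drawn in this exchange, between a version number that
gates the ABI and bitmasks that advertise optional features, can be
sketched as follows. This is an illustration only: the struct, the
MOCK_CAP_* bits and the helper names are hypothetical and are not taken
from the IBTF headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical optional-feature bits; each can be present independently. */
#define	MOCK_CAP_FEATURE_A	(1u << 0)
#define	MOCK_CAP_FEATURE_B	(1u << 1)

typedef struct {
	int abi_version;	/* struct layout / calling convention in use */
	uint32_t caps;		/* which optional features are available */
} mock_hca_info_t;

/* ABI check: an exact match is required, since layouts differ. */
static int
mock_abi_matches(const mock_hca_info_t *hca, int built_for)
{
	return (hca->abi_version == built_for);
}

/* Feature check: all of the wanted bits must be set. */
static int
mock_has_caps(const mock_hca_info_t *hca, uint32_t wanted)
{
	return ((hca->caps & wanted) == wanted);
}

/* Shows that the two questions are answered independently. */
static void
abi_vs_caps_demo(void)
{
	const mock_hca_info_t hca = { 3, MOCK_CAP_FEATURE_B };

	assert(mock_abi_matches(&hca, 3));		/* same ABI */
	assert(!mock_abi_matches(&hca, 2));		/* older ABI: no */
	assert(mock_has_caps(&hca, MOCK_CAP_FEATURE_B));
	assert(!mock_has_caps(&hca, MOCK_CAP_FEATURE_A));
}
```

In this sketch a consumer first confirms the ABI match and only then
asks about individual features, so adding a feature bit need not imply
a new version, while a struct layout change does.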

-ted

-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-20 Thread Ted H. Kim


Ted H. Kim wrote:
 
 
 Kais Belgaied wrote:
 - Case boundary question: since this marks a flag day for both TI 
 and CI, can you list the components
  that are affected by this flag day?

 Most of the IB modules in ON -
 framework: IBTL
 IB ULPs (TI): IPonIB, SDP, NFS/RDMA, uDAPL
 HCA Drivers (CI): Tavor, Hermon

 at the risk of re-stating the obvious,  all changes in the above 3 
 sets of components are in-scope of this case,
 right?
 
 Yes

Actually, I think I made a mistake to say this.
It's true to the extent that the three sets of components
will use the interfaces here.

But I think saying an unqualified yes means we have to deliver
all modified components at once. And so I am going
to say the right answer is YES to only the framework
and HCA drivers.

This allows us to deliver the changes to the framework
and HCA drivers first. And then phase the delivery
of the ULP mods. That way if there are customer priorities,
we can deliver the ones they want first
without having to wait to have all of the ULPs modified.

-ted

-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-20 Thread Kais Belgaied
On 10/20/08 15:01, Ted H. Kim wrote:

 I am also not sure where this is leading.
 Are you suggesting some specific change to the case?

I'm not clear on the future compatibility expectations around the 
interface introduced by this case:

 I asked whether versions are incremental all the time and you answered yes
(capabs of V(n+1) includes all capabs of V(n))
I asked if some capabs can be used independently, you also said  yes, 
which suggests
(capabs of V(n+1)) don't necessarily have to include all capabs of V(n). 
Capabs are independent.

The former means that an IBT client module written to V(n) is guaranteed 
to work unmodified on a framework+HCAs
that evolved to V(n+1) or later.

The latter means modules may break or may continue to work. No backward 
compatibility is guaranteed.

Choose one semantic for the interface and clearly document it in the case.

Kais.


 -ted





IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-20 Thread Ted H. Kim
We will modify the case to say:

IBTI_V3 is incompatible with other IBTI versions.
IBCI_V3 is incompatible with other IBCI versions.

When IBTI changes, you have to recompile
all the in-kernel ULPs and the IBTF framework.

When IBCI changes, you have to recompile
the IBTF framework and all the HCA drivers.
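
As a rough sketch of what "incompatible" means in practice, a framework
built for one TI version can simply refuse clients built for any other.
Only the field name mi_ibt_version comes from the case materials; the
mock_* types, the attach function and the return values below are
assumptions made for illustration, not the IBTF attach path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the TI version enumeration. */
typedef enum {
	MOCK_IBTI_V1 = 1,
	MOCK_IBTI_V2,
	MOCK_IBTI_V3
} mock_ibti_version_t;

/* Hypothetical client registration record. */
typedef struct {
	mock_ibti_version_t mi_ibt_version;	/* version the ULP was built for */
	const char *mi_name;			/* client name, for diagnostics */
} mock_clnt_modinfo_t;

/* The version this pretend framework was compiled against. */
#define	MOCK_FRAMEWORK_IBTI	MOCK_IBTI_V3

/* Returns 0 when the client may attach, -1 on any version mismatch. */
static int
mock_ibt_attach(const mock_clnt_modinfo_t *mi)
{
	if (mi == NULL)
		return (-1);
	/* Exact match only: older and newer versions are both rejected. */
	return ((mi->mi_ibt_version == MOCK_FRAMEWORK_IBTI) ? 0 : -1);
}

/* A recompiled ULP attaches; a stale V2 binary does not. */
static void
version_check_demo(void)
{
	const mock_clnt_modinfo_t rebuilt = { MOCK_IBTI_V3, "rebuilt_ulp" };
	const mock_clnt_modinfo_t stale = { MOCK_IBTI_V2, "stale_ulp" };

	assert(mock_ibt_attach(&rebuilt) == 0);
	assert(mock_ibt_attach(&stale) == -1);
}
```

The same exact-match rule would apply on the CI side between the
framework and each HCA driver.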

-ted



Kais Belgaied wrote:
 On 10/20/08 15:01, Ted H. Kim wrote:

 I am also not sure where this is leading.
 Are you suggesting some specific change to the case?
 
 I'm not clear on the future compatibility expectations around the 
 interface introduced by this case:
 
 I asked whether versions are incremental all the time and you answered yes
 (capabs of V(n+1) includes all capabs of V(n))
 I asked if some capabs can be used independently, you also said  yes, 
 which suggests
 (capabs of V(n+1)) don't necessarily have to include all capabs of V(n). 
 Capabs are independent.
 
 The former means that an IBT client module written to V(n) is guaranteed 
 to work unmodified on a framework+HCAs
 that evolved to V(n+1) or later.
 
 The latter means modules may break or may continue to work. No backward 
 compatibility is guaranteed.
 
 Choose one semantic for the interface and clearly document it in the case.
 
Kais.


-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-15 Thread Rick Matthews
Minor nit:
materials/ibc_operations_t.9s

Add ibc_alloc_io_mem(9E) and ibt_alloc_io_mem(9F) to the See also:

+1
-- 

-
Rick Matthews   email: Rick.Matthews at sun.com
Sun Microsystems, Inc.  phone:+1(651) 554-1518
1270 Eagan Industrial Road  phone(internal): 54418
Suite 160   fax:  +1(651) 554-1540
Eagan, MN 55121-1231 USA    main: +1(651) 554-1500
-




IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-15 Thread Kais Belgaied
- Case boundary question: since this marks a flag day for both TI and 
CI, can you list the components
  that are affected by this flag day?

- I am not clear on the consumer side of this new  interface: What 
prompts a ULP to start using this interface?
  Is it expected to attempt ibt_alloc_io_mem() until it  exhausts all 
resources?
  It would be easier to assess the completeness and the usefulness of 
the TI if you either extended the case's scope
  to include at least the changes on one transport consumer or gave a 
real example thereof.
 
- mi_ibt_version seems to be an enumeration of apparently mutually 
exclusive values  IBTI_V{1,2,3}
  yet the definition suggests a combination of independent (discrete) 
capabilities
  (FMR support, DMA wrapper support, etc.)
   . Is there any consumer of this interface that uses the DMA wrapper but
     not FMR?
   . For future evolution, is the mi_ibt_version always intended to
     express a monotonically increasing set
     of capabilities (capabs of V(n+1) includes all capabs of V(n))?
   Basically I'm trying to see if information of a different nature is
   being encoded in the same field. Without slipping
   into a design discussion, you should consider whether two fields are more
   appropriate: one version (number or enum) and one capabs (bitmask).

- Under what condition can the caller of 
ibt_alloc_io_mem()/ibt_free_io_mem()  expect the following error
  to be returned?
   IBT_MR_ACCESS_REQ_INVALID   Invalid Access Control Specified.
                               Remote Write or Remote Atomic access is
                               requested without specifying Local Write.

- In ibc_alloc_io_mem.9e, these two sections are in conflict:

   ibt_status_t prefix_ibc_alloc_io_mem(ibc_hca_hdl_t hca_hdl,
       size_t size, ibt_mr_flags_t mr_flag, caddr_t *kaddrp,
       ibc_mem_alloc_hdl_t *mem_alloc_hdl);

  and

   hca_hdl   IBTF channel Interface (TI) HCA Handle previously obtained
             by calling ibt_open_hca(9F).

Kais.

On 10/10/08 11:04, Ted Kim wrote:
 Template Version: @(#)sac_nextcase %I% %G% SMI
 This information is Copyright 2008 Sun Microsystems
 1. Introduction
 1.1. Project/Component Working Name:
IBTF IO Memory
 1.2. Name of Document Author/Supplier:
Author:  Lida HornI
 1.3  Date of This Document:
   10 October, 2008
 4. Technical Description

 A. Background

 The DDI distinguishes between different types of memory. Memory from
 ddi_dma_mem_alloc(9F) is usable for DMA and takes into account various
 factors such as alignment and other device attributes. Memory from
 kmem_(z)alloc is not guaranteed to be usable for DMA, though most of
 the time it does work for that purpose, because of the capability of
 modern platforms. Nevertheless, there is value in maintaining these
 DDI distinctions, especially when considering certain platform issues
 such as memory DR.

 In the context of InfiniBand, registered memory is the target of DMA
 operations. This case introduces new InfiniBand related interfaces
 analogous to the ddi_dma_mem_alloc family of functions to IBTF
 (InfiniBand Transport Framework, PSARC/2002/132 and follow-on
 cases). This addition will help the InfiniBand stack maintain the
 proper DDI memory distinctions important for certain types of
 platforms.


 B. Proposal

 The proposal is to make additions to the IBTF Channel and Transport
 interfaces. The functionality added to the Transport Interface (TI) is
 used by the ULPs to allocate memory suitable for DMA and IB memory
 registration. In turn, the framework uses new entry points in the
 Channel Interface (CI) to request memory allocation from the
 underlying HCA driver. These interfaces are basically a wrapper for
 DDI functions which on the one hand abstract away HCA device specific
 details at the ULP level, but at the same time allow for the HCA
 driver to adjust the memory attributes (alignment, etc.) as necessary
 for efficiency.

 These additions include an IBTF ABI change, so this case also marks an
 internal flag day, incrementing our interface version numbers for both
 the TI and CI as noted below.


 All interface additions and changes in this proposal have a
 micro/patch binding.

 Transport Interface (ON Consolidation Private):

   ibt_alloc_io_mem() - Allocates DMA memory (at the transport level)
   ibt_free_io_mem() -  Deallocates DMA memory 
   IBTI_V3 - TI version change
  
 Channel Interface (ON Consolidation Private):

   ibc_alloc_io_mem() - Allocates DMA memory (at the HCA driver level)
   ibc_free_io_mem() - Deallocates DMA memory 
   IBCI_V3 - CI version change


 C. Summary of Changes by man page 

 See materials directory for copies of man pages. Modified man pages
 have change bars in the left margin.

   ibci.9 - modified (new CI entry points added)

   ibc_alloc_io_mem.9e - new (alloc and free CI entry points)

   ibt_alloc_io_mem.9f - new (alloc and free TI functions)

   

IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-15 Thread Ted H. Kim
Okay, that's a trivial edit, we will do it no problem.

Rick Matthews wrote:
 Minor nit:
materials/ibc_operations_t.9s
 
Add ibc_alloc_io_mem(9E) and ibt_alloc_io_mem(9F) to the See also:
 
 +1

-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-15 Thread Ted H. Kim
Folks,

We fixed up the man pages in the materials directory,
so there are new versions of

1. ibt_alloc_io_mem.9f - remove the error about incompatible
  permissions

2. ibc_alloc_io_mem.9e - fix HCA handle description, plus
  the same fix as #1 here too

3. ibci.9 - add the see also ref (but only the ibc ref
  as this is describing CI things)

I think this takes care of edits brought up so far.

-ted


-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX



IBTF IO Memory [PSARC/2008/630 FastTrack timeout 10/17/2008]

2008-10-15 Thread Ted H. Kim


Ted H. Kim wrote:
 Folks,
 
 We fixed up the man pages in the materials directory,
 so there are new versions of
 
 1. ibt_alloc_io_mem.9f - remove the error about incompatible
  permissions
 
 2. ibc_alloc_io_mem.9e - fix HCA handle description, plus
  the same fix as #1 here too
 
 3. ibci.9 - add the see also ref (but only the ibc ref
  as this is describing CI things)

Sorry, I mean ibc_operations_t.9s for this last item.


 
 I think this takes care of edits brought up so far.
 
 -ted
 
 

-- 
Ted H. Kim
Sun Microsystems, Inc.  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245   (310) 341-1120 FAX