Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         IPoIB Connected Mode
    1.2. Name of Document Author/Supplier:
         Author:  Kevin Ge
    1.3. Date of This Document:
         30 October, 2009
4. Technical Description

A. Overview
-----------

   This case proposes changes to the Solaris kernel to provide support
   for "Connected Mode" in the IPoIB driver ibd(7D) (described in [1]
   and [2]).

   The InfiniBand Architecture [3] defines multiple "transport service
   types", including Unreliable Datagram (UD), Reliable Connected (RC)
   and Unreliable Connected (UC). The current ibd driver (based on
   [4]) runs in "Datagram Mode" over the UD transport service type.
   Connected Mode (described in [5]) can use UC, RC, or both.

   This IPoIB-CM project uses RC in order to inter-operate with
   Linux, which also uses RC. The main advantage of Connected Mode is
   better performance (higher throughput and lower CPU utilization),
   which comes from using very large MTUs (see section B.2). The
   disadvantage of Connected Mode is that it can consume more
   resources, especially when scaling up to a large cluster, because
   it uses an InfiniBand connection to each destination.

   Note that this case covers only the changes necessary to support
   the IPoIB driver running in Connected Mode over RC. Other
   enhancements are outside the scope of this case.

   A micro/patch binding is asserted for this proposal.

B. Connected Mode IPoIB driver
------------------------------

   The revised ibd(7D) driver will support both Connected and Datagram
   mode. The features from the current Datagram mode ibd driver will
   be inherited. The remainder of this section discusses interface
   additions for the Connected mode capable driver.


B.1 Switching between datagram and connected mode

   The existing ibd driver in OpenSolaris and Solaris 10 does not
   ship with a driver .conf file. However, the Connected Mode support
   described in this case introduces a new parameter 'enable_rc' that
   may be set via the ibd driver .conf file.

   This parameter specifies whether each ibd instance defaults to
   using Connected Mode over RC or not.

       # 1: unicast packets will be sent over Reliable Connected Mode
       # 0: unicast packets will be sent over Unreliable Datagram Mode
       #
       # Each element in the list below maps to the corresponding ibd
       # instance; the first element is for ibd instance 0, the second
       # element is for instance 1 and so on.
       #
       enable_rc=1,1,0,0;

   Please note that Connected Mode support in IPoIB is optional as per
   [5]. If Connected Mode is not available for a remote node, the ibd
   driver automatically uses Datagram mode for that destination.
   Consequently, 'enable_rc' only decides whether an instance tries
   Connected Mode first, and whether the instance advertises
   Connected Mode as a supported capability.
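
   The per-destination fallback rule can be sketched as a small
   decision function. This is only an illustration in Python, not the
   driver's code; the names Transport, select_unicast_transport and
   peer_supports_cm are hypothetical and do not appear in ibd.

```python
from enum import Enum


class Transport(Enum):
    UD = "Unreliable Datagram"
    RC = "Reliable Connected"


def select_unicast_transport(enable_rc: bool, peer_supports_cm: bool) -> Transport:
    """Pick the transport for unicast traffic to one destination.

    Hypothetical model of the rule in the text: Connected Mode is
    attempted only when the local instance has enable_rc set, and it is
    used only when the remote node supports it as well. In every other
    case the destination is reached over Datagram mode.
    """
    if enable_rc and peer_supports_cm:
        return Transport.RC
    return Transport.UD
```

   Note that the function is evaluated per destination, so a single
   instance with enable_rc set may use RC for some peers and UD for
   others.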

   The default value of 'enable_rc' for each instance is 0. Hence,
   without an ibd.conf file, Datagram mode will be used. We intend to
   ship a driver .conf file for ibd in ONNV (and hence OpenSolaris)
   with enable_rc set to all ones (enabling Connected Mode by
   default on all instances) for the best performance.
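
   The mapping from the 'enable_rc' list to an instance can be
   modelled as follows. This is a minimal sketch in Python rather
   than the driver's property lookup; the function name is
   hypothetical, and the handling of an instance number beyond the
   end of the list (falling back to the default of 0) is an
   assumption, since the text only states the per-instance default.

```python
def enable_rc_for_instance(enable_rc_list, instance):
    """Return the enable_rc setting (0 or 1) for one ibd instance.

    enable_rc_list models the comma-separated values from ibd.conf:
    element 0 is for ibd instance 0, element 1 for instance 1, and so on.
    An empty list models a system that ships no ibd.conf file.
    """
    if 0 <= instance < len(enable_rc_list):
        return enable_rc_list[instance]
    # Assumption: no entry for this instance means the default of 0,
    # that is, Datagram mode.
    return 0
```

   With the example setting enable_rc=1,1,0,0 this yields Connected
   Mode for instances 0 and 1 and Datagram mode for instances 2 and 3.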

   However, for Solaris 10, we have received business guidance to have
   an "opt-in" approach due to a desire for greater stability in
   established enterprise environments. We will do this by not
   shipping the .conf file. Therefore, Solaris 10 will use Datagram
   mode by default. It will take an explicit administrator action
   (creating an ibd.conf file that sets enable_rc) to cause Solaris 10
   to use Connected Mode.
  
   OFED (Linux IB) originally made Connected Mode opt-in too. However,
   later OFED made it the default. We don't intend to change it later
   to be the default in Solaris 10. However, Solaris Next, being
   descended from ONNV, will have it as default.

   An edited ibd(7D) manpage documenting this change is in the
   materials directory.

B.2 Change of default MTU size

   Connected Mode by virtue of using the RC transport service type
   offers link MTUs of up to 2^31-4 octets in length. Thus, the use of
   Connected Mode can offer benefits by supporting very large MTUs.
   Datagram Mode using UD is limited to 4092 (4K-4) octets, though
   commonly only 2044 (2K-4) is offered.

   Because an IP datagram is limited to 65535 (64K-1) bytes, it makes
   sense to offer an MTU of at most that size. OFED (i.e. Linux IB)
   uses a 65520 (64K-16) byte MTU for alignment reasons. To
   inter-operate with OFED at the best performance, we also adopt
   65520 as the default MTU of Connected Mode.
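
   The MTU figures quoted in this section can be checked with a few
   lines of arithmetic. The sketch below is illustrative Python; the
   constant and function names are hypothetical. Reading "alignment"
   as 16-byte alignment is an assumption, and the packet count ignores
   IP and TCP header overhead.

```python
K = 1024

UD_MTU_MAX = 4 * K - 4         # largest Datagram Mode MTU (4K-4)
UD_MTU_COMMON = 2 * K - 4      # commonly offered Datagram Mode MTU (2K-4)
IP_DATAGRAM_MAX = 64 * K - 1   # largest IP datagram (64K-1)
CM_MTU_DEFAULT = 64 * K - 16   # Connected Mode default, matching OFED (64K-16)


def largest_aligned_mtu(limit: int, alignment: int) -> int:
    """Largest multiple of `alignment` that does not exceed `limit`."""
    return (limit // alignment) * alignment


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Packets required to carry payload_bytes, ignoring header overhead."""
    return -(-payload_bytes // mtu)  # ceiling division
```

   For example, largest_aligned_mtu(IP_DATAGRAM_MAX, 16) equals
   CM_MTU_DEFAULT, and packets_needed() shows how many fewer packets a
   transfer of a given size needs at the Connected Mode MTU than at
   the common Datagram Mode MTU.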


C. Interfaces
-------------
+-------------------------------------------------------------------+
|                     Interfaces Exported                           |
+---------------------------+------------------+--------------------+
|    Interface Name         |  Classification  |      Comment       |
+---------------------------+------------------+--------------------+
|/kernel/drv/ibd.conf*      |   Uncommitted    | Configuration file |
+---------------------------+------------------+--------------------+
 * = only for OpenSolaris


D. References
-------------

   [1] IP over InfiniBand, PSARC/2001/289

   [2] IPoIB Conversion to GLDv3, PSARC/2007/636

   [3] InfiniBand Architecture Specification Volume 1, Release 1.2.1,
       InfiniBand Trade Association, 2007.
       http://www.infinibandta.org/content/pages.php?pg=technology_download

   [4] Transmission of IP over InfiniBand (IPoIB), RFC 4391, IETF,
       http://www.ietf.org/rfc/rfc4391.txt

   [5] IP over InfiniBand: Connected Mode, RFC 4755, IETF,
       http://www.ietf.org/rfc/rfc4755.txt

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
