okay, volatile it is. -ted
Garrett D'Amore wrote:
> Ted H. Kim wrote:
>> There is an extensive revision of the dladm
>> support for IPoIB coming where Brussels
>> support and a change in the administrative
>> model will be dealt with. But that may be
>> a ways off (est. 2010.Q2?) and in the
>> meantime, people are screaming for
>> the performance that Connected Mode gives,
>> so we don't want to wait for that.
>
> So, let's make the .conf setting Volatile, since we expect to change it
> in less than a year to a Brussels setting. This will allow people to
> use it, but with an admonition not to get too fond of the driver.conf
> setting.
>
> Personally, I think Brussels support is so easy to implement that I'm
> not sure I understand why this can't be done almost immediately.
>
> - Garrett
>
>>
>> -ted
>>
>> Garrett D'Amore wrote:
>>> I feel very strongly that I'd prefer to avoid the use of a
>>> driver.conf for this, and instead handle it as a Brussels property,
>>> at least on Solaris Nevada. (This will support administration via
>>> dladm, and ultimately also ndd, though we don't like to say that. ;-)
>>>
>>> If you need to use a driver.conf for Solaris 10, that's OK I suppose
>>> (although an ndd tunable would be better there too, since it doesn't
>>> require the driver to be unloaded and reloaded to change the setting
>>> -- which can be very challenging for administrators to figure out.)
>>>
>>> I feel TCR-strong on this -- if it were a full case I'd insist that
>>> this be part of the spec before I'd vote to approve.
>>>
>>> Is the project team amenable to making this change, or do they have
>>> some other reason why driver.conf values need to be used instead?
>>>
>>> Also, I'd like the mtu to be set via Brussels as well, if it isn't
>>> already handled that way.
>>>
>>> - Garrett
>>>
>>> Ted Kim wrote:
>>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>>> This information is Copyright 2009 Sun Microsystems
>>>> 1. Introduction
>>>> 1.1. Project/Component Working Name:
>>>>      IPoIB Connected Mode
>>>> 1.2. Name of Document Author/Supplier:
>>>>      Author: Kevin Ge
>>>> 1.3. Date of This Document:
>>>>      30 October, 2009
>>>> 4. Technical Description
>>>>
>>>> A. Overview
>>>> -----------
>>>>
>>>> This case proposes changes to the Solaris kernel to provide support
>>>> for "Connected Mode" in the IPoIB driver ibd(7D) (described in [1]
>>>> and [2]).
>>>>
>>>> The InfiniBand Architecture [3] defines multiple "transport service
>>>> types", including Unreliable Datagram (UD), Reliable Connected (RC)
>>>> and Unreliable Connected (UC). The current ibd driver (based on [4])
>>>> runs in "Datagram Mode" over the UD transport service type. Connected
>>>> Mode (described in [5]) can run over UC, RC, or both.
>>>>
>>>> This IPoIB-CM project uses RC because of the desire to interoperate
>>>> with Linux, which also uses RC. The main advantage of Connected Mode
>>>> is better performance (higher throughput and lower CPU utilization)
>>>> from using very large MTUs (see B.2 below). Connected Mode can,
>>>> however, consume more resources, especially when scaling up to a
>>>> large cluster, because it uses a separate InfiniBand connection to
>>>> each destination.
>>>>
>>>> Note that this case covers only the changes necessary to support the
>>>> IPoIB driver running in Connected Mode over RC. Other enhancements
>>>> are outside the scope of this case.
>>>>
>>>> A micro/patch binding is asserted for this proposal.
>>>>
>>>> B. Connected Mode IPoIB driver
>>>> ------------------------------
>>>>
>>>> The revised ibd(7D) driver will support both Connected and Datagram
>>>> Mode. The features of the current Datagram Mode ibd driver will be
>>>> inherited. The remainder of this section discusses interface
>>>> additions for the Connected Mode-capable driver.
>>>>
>>>> B.1 Switching between datagram and connected mode
>>>>
>>>> The existing ibd driver in OpenSolaris and Solaris 10 does not
>>>> ship with a driver .conf file. However, the Connected Mode support
>>>> described in this case introduces a new parameter, 'enable_rc', that
>>>> may be set in the ibd driver .conf file.
>>>>
>>>> This parameter specifies whether each ibd instance defaults to
>>>> using Connected Mode over RC.
>>>>
>>>>     # 1: unicast packets will be sent over Reliable Connected Mode
>>>>     # 0: unicast packets will be sent over Unreliable Datagram Mode
>>>>     #
>>>>     # Each element in the list below maps to the corresponding ibd
>>>>     # instance; the first element is for ibd instance 0, the second
>>>>     # element is for instance 1 and so on.
>>>>     #
>>>>     enable_rc=1,1,0,0;
>>>>
>>>> Please note that Connected Mode support in IPoIB is optional as per
>>>> [5]. If Connected Mode is not available for a remote node, the ibd
>>>> driver will automatically use Datagram Mode for that destination.
>>>> Therefore, 'enable_rc' only decides whether to try Connected Mode
>>>> first, and whether to advertise Connected Mode as a capability
>>>> supported by this instance.
>>>>
>>>> The default value of 'enable_rc' for each instance is 0. Hence,
>>>> without an ibd.conf file, Datagram Mode will be used. We intend to
>>>> ship a driver .conf file for ibd in ONNV (and hence OpenSolaris)
>>>> with enable_rc set to all ones (enabling Connected Mode by
>>>> default on all instances) for the best performance.
>>>>
>>>> However, for Solaris 10, we have received business guidance to take
>>>> an "opt-in" approach due to a desire for greater stability in
>>>> established enterprise environments. We will do this by not
>>>> shipping the .conf file. Therefore, Solaris 10 will use Datagram
>>>> Mode by default. It will take an explicit administrator action
>>>> (setting enable_rc) to make Solaris 10 use Connected Mode.
>>>> OFED (Linux IB) originally made Connected Mode opt-in too, but
>>>> later OFED releases made it the default. We do not intend to make it
>>>> the default in Solaris 10 later. Solaris Next, however, being
>>>> descended from ONNV, will have it enabled by default.
>>>>
>>>> An edited ibd(7D) manpage documenting this change is in the
>>>> materials directory.
>>>>
>>>> B.2 Change of default MTU size
>>>>
>>>> By virtue of using the RC transport service type, Connected Mode
>>>> offers link MTUs of up to 2^31-4 octets. The use of Connected Mode
>>>> can therefore offer benefits by supporting very large MTUs.
>>>> Datagram Mode using UD is limited to 4092 (4K-4) octets, though
>>>> commonly only 2044 (2K-4) is offered.
>>>>
>>>> Due to the limits of the TCP/IP protocol (the 16-bit IP packet
>>>> length field), it makes sense to offer only up to 65535 (64K-1)
>>>> bytes. OFED (i.e. Linux IB) uses a 65520 (64K-16) byte MTU for
>>>> alignment reasons. To interoperate with OFED at the best
>>>> performance, we also adopt 65520 as the default MTU for Connected
>>>> Mode.
>>>>
>>>>
>>>> C. Interfaces
>>>> -------------
>>>> +-------------------------------------------------------------------+
>>>> |                        Interfaces Exported                        |
>>>> +---------------------------+------------------+--------------------+
>>>> | Interface Name            | Classification   | Comment            |
>>>> +---------------------------+------------------+--------------------+
>>>> | /kernel/drv/ibd.conf*     | Uncommitted      | Configuration file |
>>>> +---------------------------+------------------+--------------------+
>>>>  * = only for OpenSolaris
>>>>
>>>>
>>>> D. References
>>>> -------------
>>>>
>>>> [1] IP over InfiniBand, PSARC/2001/289
>>>>
>>>> [2] IPoIB Conversion to GLDv3, PSARC/2007/636
>>>>
>>>> [3] InfiniBand Architecture Specification Volume 1, Release 1.2.1,
>>>>     InfiniBand Trade Association, 2007.
>>>>     http://www.infinibandta.org/content/pages.php?pg=technology_download
>>>>
>>>> [4] Transmission of IP over InfiniBand (IPoIB), RFC 4391, IETF,
>>>>     http://www.ietf.org/rfc/rfc4391.txt
>>>>
>>>> [5] IP over InfiniBand: Connected Mode, RFC 4755, IETF,
>>>>     http://www.ietf.org/rfc/rfc4755.txt
>>>>
>>>> 6. Resources and Schedule
>>>> 6.4. Steering Committee requested information
>>>> 6.4.1. Consolidation C-team Name:
>>>>        ON
>>>> 6.5. ARC review type: FastTrack
>>>> 6.6. ARC Exposure: open

-- 
Ted H. Kim
Sun Microsystems, Inc.                  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA 90245                    (310) 341-1120 FAX