I am sponsoring this fasttrack for Peter Cudhea. Requested binding is Patch, timeout is 02/25/2009. There is an IO controller profile attributes table, described in section 4.2.3, in the case materials directory.

- John

This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         COMSTAR Infiniband SRP Target
    1.2. Name of Document Author/Supplier:
         Peter.Cudhea at sun.com
    1.3. Date of This Document:
         02/12/09

4. Technical Description

COMSTAR Infiniband SRP Target
-----------------------------

4.1. Problem

OpenSolaris currently lacks a target driver for the SCSI RDMA Protocol (SRP). SRP accelerates the SCSI protocol by mapping the data transfer phases of SCSI commands to RDMA operations. As a result, an SRP initiator should be able to read and write data from a COMSTAR SRP target at high data rates with relatively low CPU utilization.

SRP is an alternative to iSER (PSARC 2008/395) for accessing SCSI storage over an Infiniband fabric. Both protocols are seeing demand in the market. In particular, we need an SRP target in COMSTAR (PSARC 2007/523) to enable VMware connectivity to OpenSolaris-based open storage. VMware ESX, for example, supports only SRP (not iSER) for block-based storage connectivity over Infiniband.

4.2. Proposal

The project will deliver a target implementation of the SCSI RDMA Protocol, represented as a COMSTAR STMF port provider.

We include a minimal implementation of the Infiniband Device Management Agent as a consumer of IBTF (PSARC 2002/132). This agent allows initiator systems to query the capabilities of the target. The SRP port provider will register its targets with the IB DM Agent to allow the targets to be discovered by SRP initiators.

4.2.1. COMSTAR SRP Target (srpt)

When the SRP target service is enabled, it will register as a COMSTAR port provider using STMF. This port provider will use the IB transport framework (IBTF) to enumerate all the HCAs on the system by GUID. Each IB HCA will be reflected to STMF as a COMSTAR target named 'eui.<HCA-GUID>'. For example, for an IB HCA with an HCA GUID of 0003BA0001002E49, the STMF target name will be 'eui.0003BA0001002E49'.
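
The naming rule above can be illustrated with a small user-space sketch. It is not taken from the srpt driver: the helper names hca_guid_to_target_name and target_name_is_eui, the buffer-length macro, and the choice of upper-case hex digits (matching the example above) are assumptions made for illustration only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "eui." prefix + 16 hex digits + terminating NUL */
#define	SRPT_EUI_NAME_LEN	(4 + 16 + 1)

/*
 * Hypothetical helper: format a 64-bit HCA GUID as the STMF target
 * name 'eui.<HCA-GUID>'.  Returns 0 on success, or -1 if the buffer
 * is too small to hold the full name.
 */
static int
hca_guid_to_target_name(uint64_t hca_guid, char *buf, size_t buflen)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "eui.%016" PRIX64, hca_guid);
	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}

/*
 * Hypothetical helper: check that a name has the 'eui.' prefix and
 * exactly 16 characters after it.  It does not validate the digits.
 */
static int
target_name_is_eui(const char *name)
{
	return (strncmp(name, "eui.", 4) == 0 &&
	    strlen(name) == SRPT_EUI_NAME_LEN - 1);
}
```

Zero-padding the GUID to a fixed 16 digits keeps the target name the same length for every HCA, so leading zeros such as those in 0003BA0001002E49 are preserved.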

STMF commands may then be used to assign these targets to host groups and to create views that determine which backing stores are accessible to which targets. STMF commands may also be used to mark each target as either offline or online.

All of the physical IB ports on an HCA will be treated as part of the same STMF target. In IB target terms, each Host Channel Adapter (HCA) is treated as a Target Channel Adapter (TCA) with a single IO Unit containing a single IO Controller. Multiple physical ports on the HCA are not exposed as separate virtual target-side resources.

When the SRP service is enabled and the STMF target for a particular HCA is marked online, each port on that HCA will be configured to listen for incoming connections to the SRP service. A virtual I/O Controller representing the target will also be registered with the minimal IB DM Agent as described in section 4.2.2.

The SRP target capability will be represented as an SMF service using the FMRI svc:/system/ibsrp/target:default. This service will be disabled by default. No new 'rights profiles' will be defined; each administrative method for this service (e.g. start and stop) will use a credential with user=root, group=root, and privileges=basic,sys_devices.

The ibsrp/target service will be dependent on STMF (svc:/system/stmf:default) and on the IB Device Management Agent described in section 4.2.2 (svc:/system/ibdma:default).

The SRP target will be implemented as a pseudo device driver 'srpt', which is a child under the Infiniband 'ib' nexus.

4.2.2. Minimal Infiniband Device-Management Agent (ibdma)

In order for IB initiators to enumerate the available storage, this project also includes a minimal "device management agent" for Infiniband, as described in section 16.3 of the Infiniband Architecture Spec. (Section 16.3 corresponds to version 1 of the IB Device Management protocol, which is distinct from the version 2 Device Management protocol as defined in Annex A8.)

Infiniband "device management" services are used by IB initiator systems to enumerate the IO Units (IOUs) and IO Controllers (IOCs) on the target system, and to query the services supported on each IO Controller. For this case, SRP initiators in particular use DM services to enumerate all the IOCs that are SRP-capable, and to enumerate the SRP services that are available through each IOC.

SRP is the only target-side service using IB DM services for discovery that the Infiniband group expects OpenSolaris to support. If other target-side services that require a more fully functioning DM Agent are added in the future, then this minimal agent can be expanded and extended as required. While the API we introduce in section 4.2.2.3 to register new services as IO Controllers is somewhat general, and could in principle be used by other target-side services, we are keeping the new API as Project Private to provide maximum flexibility to enhance or extend the details in the future.

We implement only the query-oriented subset of the DM protocol that is necessary for SRP discovery. The full DM protocol, among other uses, allows initiator systems to request target-side device management services such as testing devices and retrieving device diagnostic codes. The SRP spec in section B.5 spells out its requirements:

    The IB I/O unit shall include an IB device management agent to provide the IOUnitInfo, IOControllerProfile, and ServiceEntries attributes.

The query-oriented subset we implement falls short of full compliance with the IB DM protocol as specified in the IB Arch spec. This subset is consistent with how initiators actually use DM Agent services for discovery. We return appropriate error codes (either "method not supported" or "combination of method and attribute not supported") for all other DM requests.

The IB Device Management agent will be represented as an SMF service using the FMRI svc:/system/ibdma:default.
This service will be disabled by default. No new 'rights profiles' will be defined; each administrative method for this service (e.g. start and stop) will use a credential with user=root, group=root, and privileges=basic,sys_devices.

The IB DM agent will be implemented as a pseudo device driver 'ibdma' under /kernel/misc.

4.2.2.1 Listening for DM agent requests

To support this case, the IB group will add a new value IBT_DMA to the enumeration ibt_clnt_modinfo_t that is used as an argument to ibt_attach. The IBT_DMA value is reserved for Sun Internal use only, and is used to directly support the IB Device Management Agent.

The minimal IB DM agent, as specified in section B.5 of the SRP spec (SCSI Architecture Mapping), will respond to DevMgtGet requests with the following Attribute IDs:

    Attribute ID  Attribute Name       OpenSolaris Comments
    ------------  -------------------  ----------------------------------
    0x01          ClassPortInfo        "HELLO" message to determine
                                       protocol class match
    0x02          IOUnitInfo           Enumerate the "virtual IOCs"
                                       available on an IO Unit. Each HCA
                                       is represented as an IO Unit.
    0x10          IOControllerProfile  Retrieve IO Controller Profile
                                       information for a specific IOC
    0x12          ServiceEntries       Enumerate Service Names and
                                       Service IDs for a given IOC

The manual page changes for adding IBT_DMA are:

    ------- ibt_attach.9f -------
    75a76,77
    > IBT_DMA    For Sun Internal use only.

    ------- ibt_clnt_modinfo_t.9s -------
    55a56,57
    > IBT_DMA    For Sun Internal use only.

4.2.2.2 Enumeration of HCAs as I/O Units

Infiniband target services are made available through I/O Controllers that are associated with I/O Units. See, for example, Figure B.3 in Annex B of the SRP Specification for an SRP-specific picture of this architecture.

In the general IB architecture, a particular I/O Unit could be visible through several different Target CAs and thus via the different IB ports on those CAs. The IO Controllers and services available through an IOU are generally consistent no matter which port is used to make a connection to that IOU.
In simple terms, once a connection has reached an IO Unit, it can make use of the services provided by that I/O Unit.

For the minimal IB DM Agent, we choose to represent each HCA as a separate I/O Unit. The key to the minimal IB DM Agent is that it requires no administration. By using the HCA GUID as the I/O Unit GUID, we avoid the need to administer I/O Units as separate entities.

Alternative no-administration models would be to treat the entire target system as a single I/O Unit or to treat each individual port on each HCA as an I/O Unit. The current HCA-as-an-IOU model was chosen because it is familiar to current users of IB target-side services on other systems such as Linux. This model works the way users of IB services expect it to.

As soon as the ibdma service is enabled, the target system will:

    o Modify the "port profile" for each HCA port to indicate that "device management is supported" on that port. This is part of the "port capabilities mask".

    o Listen for IB Device Management MADs and respond with errors to those requests that are not supported.

    o Enumerate each HCA on the system as an available IO Unit.

    o Until specific target-side services are registered using the API defined in section 4.2.2.3, report that the IO Units have no contained IO Controllers. Virtual IOCs are tied to a specific service and do not exist until that service is enabled.

4.2.2.3 Registering a Virtual I/O Controller for a Supported Service

In IB, each IOC is associated with an "IO Controller profile" that defines the specific services that are available via that IOC. For example, as described above, the Infiniband Annex to the SRP specification defines both the IO Controller profile and the service name to use for SRP.

The Project Private API for managing virtual IO Controllers consists of:

    ibdma_ioc_register      Register a "Virtual IO Controller".
    ibdma_ioc_unregister    Unregister a "Virtual IO Controller".
    ibdma_ioc_update        Modify the characteristics of an existing
                            virtual IO Controller.
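
The behaviour described above, where an IO Unit reports no IO Controllers until a service registers one, can be modelled in user space alongside the three entry points. This is a simplified sketch, not the Project Private ibdma interface, whose signatures are not given in this document. Every model_* name, the fixed-size slot table, and the reduction of an IOC to a GUID plus a single service name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	MODEL_MAX_IOCS	4	/* arbitrary limit for this sketch */

/* Hypothetical stand-in for one virtual IO Controller. */
typedef struct model_ioc {
	int		in_use;
	uint64_t	ioc_guid;
	char		service_name[40];
} model_ioc_t;

/* Hypothetical stand-in for one IO Unit (one HCA in this proposal). */
typedef struct model_iou {
	uint64_t	iou_guid;	/* IOU GUID = HCA GUID */
	model_ioc_t	iocs[MODEL_MAX_IOCS];
} model_iou_t;

/* A freshly enumerated IO Unit contains no IO Controllers. */
static void
model_iou_init(model_iou_t *iou, uint64_t hca_guid)
{
	assert(iou != NULL);
	(void) memset(iou, 0, sizeof (*iou));
	iou->iou_guid = hca_guid;
}

/* Number of IOCs the IO Unit would currently report. */
static int
model_ioc_count(const model_iou_t *iou)
{
	int i, n = 0;

	for (i = 0; i < MODEL_MAX_IOCS; i++)
		n += iou->iocs[i].in_use;
	return (n);
}

static int
model_slot_valid(const model_iou_t *iou, int slot)
{
	return (slot >= 0 && slot < MODEL_MAX_IOCS && iou->iocs[slot].in_use);
}

/* Register: claim the first free slot; returns the slot or -1 if full. */
static int
model_ioc_register(model_iou_t *iou, uint64_t ioc_guid, const char *svc)
{
	int i;

	for (i = 0; i < MODEL_MAX_IOCS; i++) {
		model_ioc_t *ioc = &iou->iocs[i];

		if (ioc->in_use)
			continue;
		ioc->in_use = 1;
		ioc->ioc_guid = ioc_guid;
		(void) snprintf(ioc->service_name,
		    sizeof (ioc->service_name), "%s", svc);
		return (i);
	}
	return (-1);
}

/* Update: replace the service name of a registered IOC. */
static int
model_ioc_update(model_iou_t *iou, int slot, const char *svc)
{
	if (!model_slot_valid(iou, slot))
		return (-1);
	(void) snprintf(iou->iocs[slot].service_name,
	    sizeof (iou->iocs[slot].service_name), "%s", svc);
	return (0);
}

/* Unregister: free the slot so the IOC is no longer reported. */
static int
model_ioc_unregister(model_iou_t *iou, int slot)
{
	if (!model_slot_valid(iou, slot))
		return (-1);
	(void) memset(&iou->iocs[slot], 0, sizeof (iou->iocs[slot]));
	return (0);
}
```

In this model the DM agent holds only a table of whatever its callers registered, which mirrors the division of labour in the proposal: the service decides which IOCs exist, and the agent only advertises them.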

To register a new virtual IO Controller, the caller specifies the GUID of the parent IO Unit, an IOC profile describing the virtual service, and a specific list of target-side Service Entries associated with the virtual IOC.

As far as the IB DM Agent is concerned, the set of advertised IOCs is arbitrary. A single IO Unit could support multiple different target-side services, each of which would be represented as a separate virtual IO Controller. Similarly, the target-side service can decide whether or not to create multiple virtual IOCs representing the different physical ports on the HCA. It is up to each service individually to determine the particular IO Controllers to emulate. The IB DM Agent simply advertises these virtual IO Controllers to initiator systems.

4.2.3. The SRP Virtual I/O Controller

For SRP, a single virtual IO Controller is created for each HCA. Each virtual IOC is marked as being SRP-capable by using parameters from the standard SRP IO Controller Profile, which appears in Annex B of the SRP spec. The specific parameters used in this IO profile are available in the materials directory for this case in a document called SRPT-IOC-parameters.txt.

4.3. Risks and Assumptions

The current implementation relies on IB "shared receive queues", which are not available in all IB HCAs or drivers. All Sun-supported HCAs and drivers do support this capability. In particular, the tavor, arbel, and (since snv_107) hermon drivers support it.

4.4. How will you know when you are done?:

Linux and VMware ESX initiators based on the OFED stack can reliably access COMSTAR storage through the SRP target port provider.

4.5. Interfaces:

--------------------------------------------------------------------------
EXPORTED INTERFACES

Interface                   Level                  Comments
--------------------------------------------------------------------------
ibdma_ioc_register          Project Private
ibdma_ioc_unregister        Project Private
ibdma_ioc_update            Project Private
eui.<HCA-GUID>              Committed              Naming convention for
                                                   STMF SRP targets
IOU GUID = HCA GUID         Committed              SRP initiators see
                                                   stable targets
IOC GUID = HCA GUID         Committed              SRP initiators see
                                                   stable targets
SCSI RDMA Protocol          Committed              Defined by T10 standard
                                                   for SRP
Device Management Protocol  Committed              Section 16.3 (query
                                                   subset) of IB
                                                   architecture
IBT_DMA                     Consolidation Private  arg to ibt_attach
--------------------------------------------------------------------------
IMPORTED INTERFACES

Interface                   Level                  Comments
--------------------------------------------------------------------------
IBTF                        Consolidation Private
STMF                        Consolidation Private
IBT_DMA                     Consolidation Private  Exported from IBTF
                                                   Imported to SRP
--------------------------------------------------------------------------

4.6. Doc Impact:

Man pages for srpt and ibdma device drivers.

IBTF internal man page updates (as described in 4.2.2.1) for ibt_attach(9f) and ibt_clnt_modinfo_t(9s).

We will need to add a section in the OpenSolaris COMSTAR Administration Guide that describes how to provision different kinds of COMSTAR storage targets. Currently, this guide describes only how to provision Fibre Channel storage targets. The new documentation should be coordinated among all the new COMSTAR port providers such as iSCSI, iSER, FCoE, and SAS.

4.7. Admin/Config Impact:

None

4.10. Packaging & Delivery:

Packages

SUNWsrptr
    x86:
        /kernel/drv/srpt
        /kernel/drv/amd64/srpt
        /kernel/drv/srpt.conf
        /lib/svc/method/svc-srpt
        /var/svc/manifest/system/ibsrp/target.xml
    sparc:
        /kernel/drv/sparcv9/srpt
        /kernel/drv/srpt.conf
        /lib/svc/method/svc-srpt
        /var/svc/manifest/system/ibsrp/target.xml

SUNWibdmar
    x86:
        /kernel/misc/ibdma
        /kernel/misc/amd64/ibdma
        /lib/svc/method/svc-ibdma
        /var/svc/manifest/system/ibdma.xml
    sparc:
        /kernel/misc/sparcv9/ibdma
        /lib/svc/method/svc-ibdma
        /var/svc/manifest/system/ibdma.xml

5. References

Infiniband Transport Framework (IBTF) [PSARC 2002/132]

COMSTAR SCSI Transport Framework (STMF) [PSARC 2007/523]

SCSI Architecture Model 2 (SAM-2, T10/1157-D) (t10.org)

SCSI-3 Primary Commands (SPC-3, T10/1416-D) (t10.org)

SCSI RDMA Protocol, revision 16a of the SRP T10/1415-D specification of July 3rd, 2002 (http://www.t10.org/ftp/t10/drafts/srp/srp-r16a.pdf)

Infiniband Architecture Specification v1.2.1 (Infiniband Trade Association) (http://www.infinibandta.org/specs/)