I'm sponsoring this fast track for Govinda Tatti and the PCI team.

This project introduces new DDI interfaces, and changes PCITool's command line
syntax in an incompatible way.  However, the change is intended to correct
an incompatibility with respect to CLIP, and the original code has only been
integrated in the last couple of builds of Nevada, so we believe it is
an opportune time to fix this.

The project is seeking Minor Commitment, since the interfaces are primarily
intended for consumption by Crossbow which is not available in Solaris 10.

Man pages, headers, and supporting materials are also located in the
case directory under "materials/"

        - Garrett

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         Interrupt affinity interfaces and PCITool enhancements
    1.2. Name of Document Author/Supplier:
         Author:  Govinda Tatti
    1.3  Date of This Document:
        03 June, 2009
4. Technical Description

Template Version: @(#)sac_nextcase 1.9 06/02/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1 Project/Component Working Name:
        Interrupt Affinity Interfaces and PCITool Enhancements
    1.2 Name of Document Author/Supplier:
        Author:  Govinda Tatti
    1.3 Date of This Document:
        02 June, 2009
4. Technical Description
4.1 Project Summary

    This project provides a mechanism for device drivers, IO frameworks such
    as Crossbow, and users to determine the current CPU binding of their
    interrupts and to fine-tune those bindings for maximum IO performance.

    The first phase of this project delivers the simple DDI interrupt affinity
    interfaces to allow a device driver to retrieve the current interrupt
    target CPU and to express its interrupt target preference. In addition,
    it will deliver a PCITool enhancement to retarget MSI/X interrupts.
    In the next phase, these simple DDI interrupt affinity interfaces will be
    replaced with hint or preference based interfaces. In addition, the DDI
    interrupt framework and platform specific implementations will be
    modified to query the NUMA-IO framework for the optimal interrupt target
    CPU before configuring the platform interrupt targeting hardware logic.

4.2 Problem and Requirements

    Modern IO bus technologies support large numbers of interrupts. A single
    PCI or PCIe device could use up to 32 MSI interrupts, or 2048 MSI-X
    interrupts. The IRM project (PSARC/2008/628) fixed the MSI-X allocation
    limit issue and solved part of an IO performance problem. The other part
    of this problem is how to fine-tune the CPU bindings for these multiple
    MSI-X interrupts to achieve the expected IO performance.

    Currently there is a need for Solaris device drivers such as NIC (10G),
    HBA (Emulex) and IO frameworks such as Crossbow to retrieve and reroute
    the target CPU for their interrupts. For example, Crossbow provides a
    framework by which NIC resources such as Rx and Tx rings are exposed to
    the MAC layer. The MAC layer doles out these resources to VNICs when they
    get created while reserving a fixed amount for the primary NIC. The CPUs
    on which packet processing takes place can be specified at VNIC
    creation time or later.  If they are specified, the interrupts associated
    with the Rx/Tx rings need to be re-targeted to the specified CPUs. A
    mechanism by which we can re-target a specific MSI-X interrupt to a
    different CPU is needed. This is for the virtualization part of Crossbow.

    For optimal performance of regular NICs (as well as VNICs), the poll thread 
    associated with an Rx ring should be bound to the same CPU as the interrupt 
    CPU. So given an interrupt handle and a CPU, we need a mechanism to retarget
    the interrupt to the specified CPU. This has become a major performance
    issue (on Maramba) when multiple 10 Gig NICs are present: the poll
    threads belonging to one NIC can end up running on CPUs that are taking
    interrupts from another NIC.
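
    The requirement above can be sketched with the two affinity interfaces
    proposed in section 4.5.1. This is only an illustration, not the
    Crossbow implementation: the DDI types and function bodies below are
    userland stand-ins so that the example compiles outside the kernel, and
    rx_ring_t and ring_align_intr() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel definitions (assumptions for this sketch). */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;
typedef struct intr_stub { ddi_intr_target_t cpu; } *ddi_intr_handle_t;

/* Stub body; the real function is provided by the DDI framework. */
int
ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p)
{
	assert(h != NULL && tgt_p != NULL);
	*tgt_p = h->cpu;
	return (DDI_SUCCESS);
}

/* Stub body; rejects an invalid (negative) CPU id, accepts anything else. */
int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	if (tgt < 0)
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/* Hypothetical per-ring state: the ring's interrupt and its poll CPU. */
typedef struct rx_ring {
	ddi_intr_handle_t	rr_intr;
	processorid_t		rr_poll_cpu;
} rx_ring_t;

/*
 * Retarget the ring's interrupt to the CPU its poll thread is bound to.
 * Returns DDI_SUCCESS if the interrupt already is, or now is, on that CPU.
 */
int
ring_align_intr(rx_ring_t *ring)
{
	ddi_intr_target_t cur;

	if (ddi_intr_get_affinity(ring->rr_intr, &cur) != DDI_SUCCESS)
		return (DDI_FAILURE);
	if (cur == ring->rr_poll_cpu)
		return (DDI_SUCCESS);	/* already co-located */
	return (ddi_intr_set_affinity(ring->rr_intr, ring->rr_poll_cpu));
}
```

    A caller would treat a DDI_FAILURE return as "leave the current binding
    alone", since the interrupt may not be retargetable on every platform.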

    Presently Crossbow uses the PCITool ioctls (sys/pci_tools.h) to re-target
    fixed interrupts from inside the kernel. The interface provided is not ideal
    for doing this kind of work from inside the kernel. A better interface is
    needed here. Also this mechanism currently does not work for MSI-Xs on
    SPARC platforms. This should be addressed.

    To achieve the above objectives, the following interfaces are required:

    1. Given an interrupt handle (ddi_intr_handle_t) that is associated with an
       Rx/Tx ring, return the CPU (processorid_t) that the interrupt currently
       targets.

    2. Given an interrupt handle (ddi_intr_handle_t) that is associated with an 
       Rx/Tx ring and a CPU, bind the interrupt to the specified CPU.

4.3 Changes From the Previous Case

    This project is an extension to the approved cases, PSARC/2004/253
    "Advanced DDI Interrupt Interfaces" and PSARC/2008/628 "Interrupt Resource
    Management". The existing DDI interrupt interfaces are not changed;
    instead, new DDI interrupt interfaces are added that extend the
    capabilities of the existing ones.

    The changes include:
    - A new function (ddi_intr_get_affinity(9f)) to return the interrupt
      target CPU for a given DDI interrupt handle h.
    - A new function (ddi_intr_set_affinity(9f)) to set the interrupt target
      CPU for a given DDI interrupt handle h.
    - A modification to ddi_intr_get_cap(9f) so that it returns the new
      capability flag DDI_INTR_FLAG_RETARGETABLE, indicating that all
      interrupts of the interrupt type currently in use are retargetable.
    - A new PCITool option, -m, to retarget MSI/X interrupts.

4.4 Competitive Analysis

    Linux and Microsoft OSs already provide interrupt retargeting interfaces
    of some fashion to their device drivers. It is therefore important to
    provide similar features to Solaris device drivers, both to improve
    individual device performance and overall IO performance on all of Sun's
    platforms, in order to remain competitive in the marketplace.

4.5 Project Description

4.5.1 Interrupt Affinity Interfaces

    The basic strategy is to give device drivers an opportunity to provide
    their input (such as a CPU# or a preference) in selecting the proper
    interrupt target CPU for their interrupts. Device drivers or IO
    frameworks will call the proposed affinity interfaces either during
    initialization or at run time to optimize their IO performance based on
    the available resources, such as DMA channels, rings, allocated
    interrupts and current CPU bindings.

    typedef processorid_t ddi_intr_target_t;

    int ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p);
    int ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt);

    These interfaces are optional for device drivers: drivers that do not
    use them still work on a system that implements this feature, and
    conversely, drivers that do use them still work on a system that does
    not implement the support.
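
    One way a driver could honor this "optional" contract is to check the
    DDI_INTR_FLAG_RETARGETABLE capability first and fall back silently when
    retargeting is unavailable. The sketch below is a userland illustration
    under assumptions: the DDI types and function bodies are stand-ins, the
    numeric flag value is a placeholder, and drv_try_bind_intr() is a
    hypothetical driver helper.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel definitions (assumptions for this sketch). */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
#define	DDI_INTR_FLAG_RETARGETABLE	0x80	/* placeholder value */
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;
typedef struct intr_stub {
	int			caps;	/* capability flags of the handle */
	ddi_intr_target_t	cpu;	/* current target CPU */
} *ddi_intr_handle_t;

/* Stub body; the real ddi_intr_get_cap(9f) is provided by the framework. */
int
ddi_intr_get_cap(ddi_intr_handle_t h, int *flagsp)
{
	assert(h != NULL && flagsp != NULL);
	*flagsp = h->caps;
	return (DDI_SUCCESS);
}

/* Stub body; refuses handles that are not marked retargetable. */
int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	if ((h->caps & DDI_INTR_FLAG_RETARGETABLE) == 0)
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/*
 * Try to place the interrupt on CPU `want'.  Returns 1 if the interrupt
 * was retargeted, or 0 if the driver should simply keep whatever binding
 * the system chose.  Neither outcome is treated as a fatal error.
 */
int
drv_try_bind_intr(ddi_intr_handle_t h, processorid_t want)
{
	int caps = 0;

	if (ddi_intr_get_cap(h, &caps) != DDI_SUCCESS)
		return (0);
	if ((caps & DDI_INTR_FLAG_RETARGETABLE) == 0)
		return (0);	/* e.g. INTx: do not even attempt it */
	return (ddi_intr_set_affinity(h, want) == DDI_SUCCESS);
}
```

    Because the helper never fails the caller, the same driver code path
    works whether or not the platform supports retargeting.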

    This case also includes the contract for Crossbow framework to use these
    interrupt affinity interfaces in place of existing PCITool ioctl interfaces.

    Constraints:
    a) Set affinity limitations for certain interrupt types 
       Fixed or INTx interrupts could be either exclusive or sharable depending 
       on hardware. Because there is no good way to detect that, the current
       implementation will refuse any set affinity requests for INTx interrupts.

       On x86 platforms, multiple MSI interrupts of a single PCI function need
       to be rerouted together, since all of them share the same MSI address,
       which in turn includes the same CPU number. Hence the current x86
       implementation will refuse any set affinity requests for MSI interrupts.
       A future phase of this project may support MSI group retargeting,
       similar to the PCITool method.

    b) CPU offline considerations
       CPUs may be onlined or offlined through administrative interfaces. When
       a CPU is offlined, all of the interrupts targeting it are re-targeted.
       The OS will pick any set of the surviving CPUs for re-targeting. The
       OS is under no obligation to maintain drivers' interrupt affinity
       preferences.

       The first phase of this project will not provide any callback on CPU
       online/offline events. Such callback events need to be defined in the
       future. If a driver or framework is interested in maintaining optimal
       CPU targeting, it should either monitor its interrupt CPU bindings on a
       regular basis using ddi_intr_get_affinity(9f), or register a callback
       for CPU specific events using register_cpu_setup_func(). Userland
       entities, by contrast, should subscribe to CPU DR specific sysevents.
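
    The "monitor on a regular basis" approach in (b) might look roughly like
    the following. This is a userland sketch under assumptions, not code
    from the project: the DDI types and function bodies are stand-ins, and
    intr_pref_t and intr_pref_recheck() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel definitions (assumptions for this sketch). */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;
typedef struct intr_stub { ddi_intr_target_t cpu; } *ddi_intr_handle_t;

/* Stub body; the real function is provided by the DDI framework. */
int
ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p)
{
	assert(h != NULL && tgt_p != NULL);
	*tgt_p = h->cpu;
	return (DDI_SUCCESS);
}

/* Stub body; rejects an invalid (negative) CPU id, accepts anything else. */
int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	if (tgt < 0)
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/* Hypothetical record of the binding a driver would like to keep. */
typedef struct intr_pref {
	ddi_intr_handle_t	ip_h;
	processorid_t		ip_want;
} intr_pref_t;

/*
 * Walk the driver's interrupts and re-assert the preferred CPU for any
 * interrupt that the OS has moved (for example after a CPU offline).
 * Returns the number of interrupts that were moved back.
 */
int
intr_pref_recheck(intr_pref_t *prefs, int n)
{
	int i, moved = 0;

	for (i = 0; i < n; i++) {
		ddi_intr_target_t cur;

		if (ddi_intr_get_affinity(prefs[i].ip_h, &cur) != DDI_SUCCESS)
			continue;
		if (cur == prefs[i].ip_want)
			continue;	/* binding is still as preferred */
		if (ddi_intr_set_affinity(prefs[i].ip_h,
		    prefs[i].ip_want) == DDI_SUCCESS)
			moved++;
	}
	return (moved);
}
```

    A driver could run such a check from a periodic timer or from a callback
    registered with register_cpu_setup_func(). Since the OS is under no
    obligation to keep the preference, a failed re-assert is simply skipped
    and retried on the next pass.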

4.5.2 PCITool Enhancements

    Current syntax:
        pcitool pci@<unit-address> -i ino=ino
        [ -r [ -c ] | -w cpu=CPU [ -g ] ] [ -v ] [ -q ]

    Proposed syntax:
        pcitool pci@<unit-address> -i <ino#> | all
        [ -r [ -c ] | -w <cpu#> [ -g ] ] [ -v ] [ -q ]
  
        pcitool pci@<unit-address> -m <msi#> | all
        [ -r [ -c ] | -w <cpu#> [ -g ] ] [ -v ] [ -q ]

    PCITool is a low-level tool which provides a facility for getting and
    setting interrupt routing information. This project makes some minor
    syntax changes to PCITool, since the current syntax is not compliant with
    existing userland guidelines.

    In addition, this project is adding a new "-m" option to retrieve and
    reroute the interrupt target CPU for MSI/Xs on SPARC platforms.
  
    On SPARC platforms, an INO is mapped to an interrupt mondo, whereas one
    or more MSI/Xs are mapped to an INO. INOs and MSI/Xs are therefore
    individually retargetable. Use the "-i" option to retrieve or reroute a
    given INO, and the "-m" option for MSI/Xs.
   
    On x86 platforms, both INOs and MSI/Xs are mapped to the same interrupt
    vectors. Use the "-i" option to retrieve and reroute any interrupt vector
    (both INO and MSI/Xs). The "-m" option is therefore not required on x86
    platforms and is not supported there.
  
4.6 Interfaces

4.6.1 Exported Interfaces

    Interface                   Stability       Comments
    ----------------------------+---------------+--------------------------
    ddi_intr_target_t           Project         Interrupt target CPU
                                Private
    ddi_intr_get_affinity       Project         Get interrupt target CPU
                                Private
    ddi_intr_set_affinity       Project         Set interrupt target CPU
                                Private
    -----------------------------------------------------------------------

4.6.2 Imported Interfaces

    Interface                   Stability       Comments
    ----------------------------+---------------+--------------------------
    DDI_INTR_FLAG_RETARGETABLE  Project         Return this new flag (RO) to
                                Private         ddi_intr_get_cap() callers if
                                                current interrupt type in use
                                                is retargetable

    pcitool                     Project         Minor syntax changes. Added
                                Private         new -m option for MSI/Xs.
    -----------------------------------------------------------------------

5. References
   [1]  Solaris Interrupt Project Webpage
        http://pciexpress.sfbay/intr

   [2]  Advanced DDI Interrupt Functions - PSARC/2004/253
        http://sac.sfbay.sun.com/PSARC/2004/253

   [3]  Interrupt Resource Management - PSARC/2008/628
        http://sac.sfbay.sun.com/PSARC/2008/628

   [4]  PCITool and its nexus ioctl support - PSARC/2005/232
        http://sac.sfbay.sun.com/PSARC/2005/232

   [5]  PCITool Public Interrupts - PSARC/2009/215
        http://sac.sfbay.sun.com/PSARC/2009/215

6. Resources and Schedule
    6.4 Steering Committee requested information
        6.4.1 Consolidation C-team Name:
                ON
    6.5 ARC review type: FastTrack
    6.6 ARC Exposure: open

