Hi Sylvain,
  Glad to know we are on the same page. I haven't updated the spec with this proposal yet, in case I get more comments :). I will do so today.

Thanks,
Sundar

On 5/30/2018 12:34 AM, Sylvain Bauza wrote:


On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar <sundar.nadat...@intel.com> wrote:

    Hi all,
       The Cyborg/Nova scheduling spec [1] details what traits will be
    applied to the resource providers that represent devices like
    GPUs. Some of the traits referred to vendor names. I got feedback
    that traits must not refer to products or specific models of
    devices. I agree. However, we need some reference to device types
    to enable matching the VM driver with the device.

    TL;DR We need some reference to device types, but we don't need
    product names. I will update the spec [1] to clarify that. Rest of
    this email clarifies why we need device types in traits, and what
    traits we propose to include.

    In general, an accelerator device is operated by two pieces of
    software: a driver in the host kernel (which may discover and handle
    the PF for SR-IOV devices), and a driver/library in the guest
    (which may handle the assigned VF).

    The device assigned to the VM must match the driver/library
    packaged in the VM. For this, the request must explicitly state
    what category of device it needs. For example, if the VM needs a
    GPU, it must say whether it needs an AMD GPU or an Nvidia GPU,
    since it may ship drivers/libraries for only one vendor. It
    may also need to state which version of CUDA it needs, if it is an
    Nvidia GPU. These aspects are necessarily vendor-specific.
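The matching requirement can be sketched as a subset check: a device RP is a candidate only if it carries every trait that the request marks as required. The sketch below is a minimal illustration that assumes traits are plain strings. The helper `provider_matches` is hypothetical and is not a Nova or Placement API, because Placement does this filtering server-side. `CUSTOM_GPU_NVIDIA` is an illustrative vendor trait that follows the naming pattern of `CUSTOM_GPU_AMD`.

```python
from typing import Iterable, Set


def provider_matches(required_traits: Iterable[str], provider_traits: Set[str]) -> bool:
    """Return True if the provider carries every required trait.

    Hypothetical helper: it only mimics the semantics of "required" traits,
    which Placement evaluates server-side.
    """
    return set(required_traits) <= provider_traits


# A guest image that ships only AMD GPU drivers asks for CUSTOM_GPU_AMD.
amd_gpu_rp = {"CUSTOM_GPU_AMD"}
nvidia_gpu_rp = {"CUSTOM_GPU_NVIDIA"}  # illustrative vendor trait

print(provider_matches(["CUSTOM_GPU_AMD"], amd_gpu_rp))     # True
print(provider_matches(["CUSTOM_GPU_AMD"], nvidia_gpu_rp))  # False
```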


FWIW, the vGPU implementation in Nova has the same concern. We want to provide traits to explicitly say "use this vGPU type", but since that is tied to a specific vendor, we can't just say "ask for this frame buffer size, or just for the display heads"; we have to say "we need a vGPU accepting a Quadro vDWS license".

    Further, one driver/library version may handle multiple devices.
    Since a new driver version may be backwards compatible, multiple
    driver versions may manage the same device. The
    development/release of the driver/library inside the VM should be
    independent of the kernel driver for that device.
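One guest driver version can handle several device types, and several backward-compatible versions can handle the same device. That makes the relationship many-to-many, which is why the request should name the device type rather than a driver version. The sketch below models this with a hypothetical compatibility table. The driver version strings and `CUSTOM_FPGA_XILINX_FAMILY_B` are made up for illustration. Only `CUSTOM_FPGA_XILINX_VU9P` comes from the proposal.

```python
from typing import Dict, FrozenSet, List

# Hypothetical compatibility table: guest driver version -> device-type traits
# it can drive. Version strings and FAMILY_B are placeholders, not real data.
DRIVER_SUPPORT: Dict[str, FrozenSet[str]] = {
    "guest-driver-1.0": frozenset({"CUSTOM_FPGA_XILINX_VU9P"}),
    # A newer, backward-compatible release that also drives another family.
    "guest-driver-2.0": frozenset(
        {"CUSTOM_FPGA_XILINX_VU9P", "CUSTOM_FPGA_XILINX_FAMILY_B"}
    ),
}


def drivers_for_device(device_trait: str) -> List[str]:
    """Return every guest driver version that can manage the given device type."""
    return sorted(
        version
        for version, devices in DRIVER_SUPPORT.items()
        if device_trait in devices
    )


print(drivers_for_device("CUSTOM_FPGA_XILINX_VU9P"))
# ['guest-driver-1.0', 'guest-driver-2.0']
print(drivers_for_device("CUSTOM_FPGA_XILINX_FAMILY_B"))
# ['guest-driver-2.0']
```

Because the table is keyed by device type, a new driver release only adds rows. The traits on the resource providers stay the same.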


I agree.

    For FPGAs, there is an additional twist: the VM may need
    specific bitstream(s), which match only specific device/region
    types. The bitstream for a device from a vendor will not fit any
    other device from the same vendor, let alone other vendors. IOW,
    the region type is specific not just to a vendor but to a device
    type within the vendor. So, it is essential to identify the device
    type.
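A bitstream fits only the region type it was built for, so the compatibility check comes down to comparing the bitstream's region type ID with the region type trait on the RP. The sketch below assumes the region type ID is a UUID. Placement only accepts custom trait names made of uppercase letters, digits and underscores after the `CUSTOM_` prefix, so the sketch converts the UUID by uppercasing it and replacing hyphens with underscores. That encoding is a hypothetical choice, not something the proposal defines. `CUSTOM_FPGA_INTEL` and the UUIDs are illustrative values.

```python
import re
import uuid
from typing import Set

# Placement only accepts custom trait names that match this pattern.
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def region_trait(region_type_id: str) -> str:
    """Turn a region type UUID into a region-type trait name.

    Hypothetical encoding: canonicalize the UUID, uppercase it and replace
    '-' with '_' so the result is a legal custom trait name.
    """
    encoded = str(uuid.UUID(region_type_id)).upper().replace("-", "_")
    name = "CUSTOM_FPGA_INTEL_REGION_" + encoded
    if not CUSTOM_TRAIT_RE.match(name):
        raise ValueError(f"not a valid custom trait: {name}")
    return name


def bitstream_fits(bitstream_region_id: str, provider_traits: Set[str]) -> bool:
    """A bitstream fits a region RP only if the RP has the same region type trait."""
    return region_trait(bitstream_region_id) in provider_traits


region_id = "0b6bb3c2-5e3a-4f8e-9d2a-1c7e4b2f6a10"  # made-up region type UUID
rp_traits = {"CUSTOM_FPGA_INTEL", region_trait(region_id)}

print(region_trait(region_id))
# CUSTOM_FPGA_INTEL_REGION_0B6BB3C2_5E3A_4F8E_9D2A_1C7E4B2F6A10
print(bitstream_fits(region_id, rp_traits))  # True
print(bitstream_fits("11111111-2222-3333-4444-555555555555", rp_traits))  # False
```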

    So, the proposed set of resource classes (RCs) and traits is as
    follows. As we learn more about how operators actually use them, we
    may need to evolve this set.

      * There is one resource class per device category, e.g.
        CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
      * The resource provider that represents a device has the
        following traits:
          o Vendor/Category trait: e.g. CUSTOM_GPU_AMD,
            CUSTOM_FPGA_XILINX.
          o Device type trait, which refines the vendor/category
            trait, e.g. CUSTOM_FPGA_XILINX_VU9P.

            NOTE: This is not a product or model, at least for FPGAs.
            Multiple products may use the same FPGA chip.
            NOTE: The reason for having both the vendor/category and
            this one is that a flavor may ask for either, depending on
            the granularity desired. IOW, if one driver can handle all
            devices from a vendor (*eye roll*), the flavor can ask for
            the vendor/category trait alone. If there are separate
            drivers for different device families from the same
            vendor, the flavor must specify the trait for the device
            family.
            NOTE: The equivalent trait for GPUs may be like
            CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that
            is a product or not.


I was about to propose the same for vGPUs in Nova, i.e. using custom traits. The only concern is that operators would need to set the traits directly using osc-placement instead of having Nova magically provide those traits. But given that operators already need to set the vGPU types they want, I think it's acceptable.


          o For FPGAs, we have additional traits:
              + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
                CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
              + Region type ID trait: e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
              + Optionally, a function ID, indicating what function is
                currently programmed in the region RP, e.g.
                CUSTOM_FPGA_INTEL_FUNCTION_<uuid>. Not all
                implementations may provide it. The function trait may
                change on reprogramming, but reprogramming is not
                expected to be frequent.
              + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
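The two levels of granularity can be made concrete with a sketch. It lists the traits an FPGA device RP might carry under this proposal and shows two ways a flavor could request one. The requests use Nova's `resources:<RC>=<N>` and `trait:<TRAIT>=required` flavor extra specs. The helpers `flavor_extra_specs` and `required_traits` are hypothetical. The final Cyborg/Nova integration in the spec may settle on a different request syntax. The local subset check only mimics the filtering that Placement performs.

```python
from typing import Dict, Iterable, Set

# Traits one FPGA device RP might carry under the proposal (illustrative).
FPGA_RP_TRAITS: Set[str] = {
    "CUSTOM_FPGA_XILINX",       # vendor/category trait
    "CUSTOM_FPGA_XILINX_VU9P",  # device type trait (refines the vendor trait)
    "CUSTOM_FPGA_COMPUTE",      # functionality trait
}


def flavor_extra_specs(
    resource_class: str, traits: Iterable[str], count: int = 1
) -> Dict[str, str]:
    """Build flavor extra specs asking for `count` units of `resource_class`
    from a provider that has every trait in `traits` (hypothetical helper)."""
    specs = {f"resources:{resource_class}": str(count)}
    for trait in traits:
        specs[f"trait:{trait}"] = "required"
    return specs


def required_traits(specs: Dict[str, str]) -> Set[str]:
    """Extract the traits marked 'required' from flavor extra specs."""
    return {
        key.split(":", 1)[1]
        for key, value in specs.items()
        if key.startswith("trait:") and value == "required"
    }


# Coarse: one guest driver handles every Xilinx FPGA.
coarse = flavor_extra_specs("CUSTOM_ACCELERATOR_FPGA", ["CUSTOM_FPGA_XILINX"])
# Fine: the guest driver supports only the VU9P device type.
fine = flavor_extra_specs(
    "CUSTOM_ACCELERATOR_FPGA", ["CUSTOM_FPGA_XILINX_VU9P", "CUSTOM_FPGA_COMPUTE"]
)

print(coarse)
# {'resources:CUSTOM_ACCELERATOR_FPGA': '1', 'trait:CUSTOM_FPGA_XILINX': 'required'}
print(required_traits(coarse) <= FPGA_RP_TRAITS)  # True
print(required_traits(fine) <= FPGA_RP_TRAITS)    # True
```

Both flavors land on the same RP. The coarse one would also accept any other Xilinx device, while the fine one would not.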

    [1] https://review.openstack.org/#/c/554717/



I'll try to review the spec as soon as I can.

-Sylvain



    Thanks.

    Regards,
    Sundar

    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
    openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
    <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>



