Hi, I'll add a few edits other than those that Borislav made. (nice review job, BP)
On 8/27/20 8:06 AM, Fenghua Yu wrote:
> From: Ashok Raj <ashok....@intel.com>
>
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
>
> Signed-off-by: Ashok Raj <ashok....@intel.com>
> Co-developed-by: Fenghua Yu <fenghua...@intel.com>
> Signed-off-by: Fenghua Yu <fenghua...@intel.com>
> Reviewed-by: Tony Luck <tony.l...@intel.com>
> ---
> v7:
> - Change the doc for updating PASID by IPI and context switch (Andy).
>
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
>
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
>
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 254 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 255 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst

> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index 000000000000..6e7ac565e127
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,254 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> +
...
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.

	20-bit

> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> +
...
> +
> +Process Address Space Tagging
> +=============================
> +
...
> +
> +PASID Management
> +================
> +
...
> +
> +Relationships
> +=============
> +
> + * Each process has many threads, but only one PASID

	(end with)				PASID.

> + * Devices have a limited number (~10's to 1000's) of hardware
> +   workqueues and each portal maps down to a single workqueue.
> +   The device driver manages allocating hardware workqueues.
> + * A single mmap() maps a single hardware workqueue as a "portal"

	(end with)				.

> + * For each device with which a process interacts, there must be
> +   one or more mmap()'d portals.
> + * Many threads within a process can share a single portal to access
> +   a single device.
> + * Multiple processes can separately mmap() the same portal, in
> +   which case they still share one device hardware workqueue.
> + * The single process-wide PASID is used by all threads to interact
> +   with all devices.  There is not, for instance, a PASID for each
> +   thread or each thread<->device pair.
> +
> +FAQ
> +===
> +
> +* What is SVA/SVM?
> +
> +Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
> +work in the same address space. In short, sharing the address space. Some
> +call it Shared Virtual Memory (SVM), but Linux community wanted to avoid

	wanted to avoid confusing

> +it with Posix Shared Memory and Secure Virtual Machines which were terms

	POSIX

> +already in circulation.
> +
> +* What is a PASID?
> +
> +A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is

ah, BP already commented about using acronyms to define acronyms. :)

> +a 20 bit number allocated and managed by the OS. PASID is included in all

	20-bit

> +transactions between the platform and the device.
> +
> +* How are shared work queues different?
> +
> +Traditionally to allow user space applications interact with hardware,
> +there is a separate instance required per process. For example, consider
> +doorbells as a mechanism of informing hardware about work to process. Each
> +doorbell is required to be spaced 4k (or page-size) apart for process
> +isolation. This requires hardware to provision that space and reserve in

	reserve it in

> +MMIO. This doesn't scale as the number of threads becomes quite large. The
> +hardware also manages the queue depth for Shared Work Queues (SWQ), and
> +consumers don't need to track queue depth. If there is no space to accept
> +a command, the device will return an error indicating retry. Also
> +submitting a command to an MMIO address that can't accept ENQCMD will
> +return retry in response. In the new DMWr PCIe terminology, devices need to

so how does a submitter know whether a return of "retry" means no_space or
invalid_for_this_device?

> +support DMWr completer capability. In addition it requires all switch ports
> +to support DMWr routing and must be enabled by the PCIe subsystem, much
> +like how PCIe Atomics() are managed for instance.
> +
> +SWQ allows hardware to provision just a single address in the device. When
> +used with ENQCMD to submit work, the device can distinguish the process
> +submitting the work since it will include the PASID assigned to that
> +process. This decreases the pressure of hardware requiring to support
> +hardware to scale to a large number of processes.
> +
> +* Is this the same as a user space device driver?
> +
> +Communicating with the device via the shared work queue is much simpler
> +than a full blown user space driver. The kernel driver does all the
> +initialization of the hardware. User space only needs to worry about
> +submitting work and processing completions.
> +
> +* Is this the same as SR-IOV?
> +
> +Single Root I/O Virtualization (SR-IOV) focuses on providing independent

In 2 other places, SR-IOV is just SRIOV. Please be consistent.

> +hardware interfaces for virtualizing hardware. Hence its required to be
> +almost fully functional interface to software supporting the traditional
> +BAR's, space for interrupts via MSI-x, its own register layout.

	BARs,			MSI-X,

> +Virtual Functions (VFs) are assisted by the Physical Function (PF)
> +driver.
> +
> +Scalable I/O Virtualization builds on the PASID concept to create device
> +instances for virtualization. SIOV requires host software to assist in
> +creating virtual devices, each virtual device is represented by a PASID

	devices; each

> +along with the BDF of the device.  This allows device hardware to optimize

what is BDF? ah, bus/device/function. still, not nice here.

> +device resource creation and can grow dynamically on demand. SR-IOV creation
> +and management is very static in nature. Consult references below for more
> +details.
> +
> +* Why not just create a virtual function for each app?
> +
> +Creating PCIe SRIOV type virtual functions (VF) are expensive. They create

	is

> +duplicated hardware for PCI config space requirements, Interrupts such as

	requirements -- interrupts

> +MSIx for instance. Resources such as interrupts have to be hard partitioned

	MSI-X

> +between VF's at creation time, and cannot scale dynamically on demand. The

	VFs

> +VF's are not completely independent from the Physical function (PF). Most

	VFs

> +VF's require some communication and assistance from the PF driver. SIOV

	VFs

> +creates a software defined device. Where all the configuration and control
> +aspects are mediated via the slow path. The work submission and completion
> +happen without any mediation.
...

-- 
~Randy