> -----Original Message-----
> From: Nikos Dragazis <ndraga...@arrikto.com>
> Sent: 21 July 2020 17:34
> To: Thanos Makatos <thanos.maka...@nutanix.com>
> Cc: qemu-devel@nongnu.org; benjamin.wal...@intel.com;
> elena.ufimts...@oracle.com; tomassetti.and...@gmail.com; John G
> Johnson <john.g.john...@oracle.com>; jag.ra...@oracle.com; Swapnil
> Ingle <swapnil.in...@nutanix.com>; james.r.har...@intel.com;
> konrad.w...@oracle.com; yuvalkash...@gmail.com; dgilb...@redhat.com;
> Raphael Norwitz <raphael.norw...@nutanix.com>; ism...@linux.com;
> alex.william...@redhat.com; kanth.ghatr...@oracle.com;
> stefa...@redhat.com; Felipe Franciosi <fel...@nutanix.com>;
> marcandre.lur...@redhat.com; tina.zh...@intel.com;
> changpeng....@intel.com
> Subject: Re: [PATCH v3] introduce VFIO-over-socket protocol specificaion
>
> Hi Thanos,
>
> I had a quick look at the spec. Leaving some comments inline.
>
> On 17/7/20 2:20 PM, Thanos Makatos wrote:
>
> > This patch introduces the VFIO-over-socket protocol specification, which
> > is designed to allow devices to be emulated outside QEMU, in a separate
> > process. VFIO-over-socket reuses the existing VFIO defines, structs and
> > concepts.
> >
> > It has been earlier discussed as an RFC in:
> > "RFC: use VFIO over a UNIX domain socket to implement device offloading"
> >
> > Signed-off-by: John G Johnson <john.g.john...@oracle.com>
> > Signed-off-by: Thanos Makatos <thanos.maka...@nutanix.com>
> >
> > ---
> >
> > Changed since v1:
> >   * fix coding style issues
> >   * update MAINTAINERS for VFIO-over-socket
> >   * add vfio-over-socket to ToC
> >
> > Changed since v2:
> >   * fix whitespace
> >
> > Regarding the build failure, I have not been able to reproduce it locally
> > using the docker image on my Debian 10.4 machine.
> > ---
> >  MAINTAINERS                     |    6 +
> >  docs/devel/index.rst            |    1 +
> >  docs/devel/vfio-over-socket.rst | 1135 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 1142 insertions(+)
> >  create mode 100644 docs/devel/vfio-over-socket.rst
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 030faf0..bb81590 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1732,6 +1732,12 @@ F: hw/vfio/ap.c
> >  F: docs/system/s390x/vfio-ap.rst
> >  L: qemu-s3...@nongnu.org
> >
> > +VFIO-over-socket
> > +M: John G Johnson <john.g.john...@oracle.com>
> > +M: Thanos Makatos <thanos.maka...@nutanix.com>
> > +S: Supported
> > +F: docs/devel/vfio-over-socket.rst
> > +
> >  vhost
> >  M: Michael S. Tsirkin <m...@redhat.com>
> >  S: Supported
> > diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> > index ae6eac7..0439460 100644
> > --- a/docs/devel/index.rst
> > +++ b/docs/devel/index.rst
> > @@ -30,3 +30,4 @@ Contents:
> >     reset
> >     s390-dasd-ipl
> >     clocks
> > +   vfio-over-socket
> > diff --git a/docs/devel/vfio-over-socket.rst b/docs/devel/vfio-over-socket.rst
> > new file mode 100644
> > index 0000000..b474f23
> > --- /dev/null
> > +++ b/docs/devel/vfio-over-socket.rst
> > @@ -0,0 +1,1135 @@
> > +***************************************
> > +VFIO-over-socket Protocol Specification
> > +***************************************
> > +
> > +Version 0.1
> > +
> > +Introduction
> > +============
> > +VFIO-over-socket, also known as vfio-user, is a protocol that allows a device
>
> I think there is no point in having two names for the same protocol,
> "vfio-over-socket" and "vfio-user".
Yes, we'll use vfio-user from now on.

>
> > +to be virtualized in a separate process outside of QEMU. VFIO-over-socket
> > +devices consist of a generic VFIO device type, living inside QEMU, which we
> > +call the client, and the core device implementation, living outside QEMU, which
> > +we call the server. VFIO-over-socket can be the main transport mechanism for
> > +multi-process QEMU, however it can be used by other applications offering
> > +device virtualization. Explaining the advantages of a
> > +disaggregated/multi-process QEMU, and device virtualization outside QEMU in
> > +general, is beyond the scope of this document.
> > +
> > +This document focuses on specifying the VFIO-over-socket protocol. VFIO has
> > +been chosen for the following reasons:
> > +
> > +1) It is a mature and stable API, backed by an extensively used framework.
> > +2) The existing VFIO client implementation (qemu/hw/vfio/) can be largely
> > +   reused.
> > +
> > +In a proof of concept implementation it has been demonstrated that using VFIO
> > +over a UNIX domain socket is a viable option. VFIO-over-socket is designed with
> > +QEMU in mind, however it could be used by other client applications. The
> > +VFIO-over-socket protocol does not require that QEMU's VFIO client
> > +implementation is used in QEMU. None of the VFIO kernel modules are required
> > +for supporting the protocol, neither in the client nor the server, only the
> > +source header files are used.
> > +
> > +The main idea is to allow a virtual device to function in a separate process in
> > +the same host over a UNIX domain socket. A UNIX domain socket (AF_UNIX) is
> > +chosen because we can trivially send file descriptors over it, which in turn
> > +allows:
> > +
> > +* Sharing of guest memory for DMA with the virtual device process.
> > +* Sharing of virtual device memory with the guest for fast MMIO.
> > +* Efficient sharing of eventfd's for triggering interrupts.
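
To make the file descriptor point concrete, here is a minimal sketch (not
part of the patch) of passing a descriptor over an AF_UNIX socket and mapping
the same memory on both ends. It assumes Python 3.9+ on a Unix host for
socket.send_fds()/socket.recv_fds(), which wrap sendmsg()/recvmsg() with
SCM_RIGHTS ancillary data. The function name, the b"fd" payload and the
temporary backing file are made up for the illustration; a real client would
pass the descriptor that backs guest RAM.

```python
import mmap
import os
import socket
import tempfile


def share_memory_over_socket(size=4096):
    """Pass one file descriptor over an AF_UNIX socket pair and map it twice.

    Returns the payload received with the descriptor and the bytes the
    sending side observes after the receiving side writes through its map.
    """
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as backing:
            backing.truncate(size)
            # Sender's view of the memory (shared mapping by default on Unix).
            client_map = mmap.mmap(backing.fileno(), size)
            # The descriptor travels as SCM_RIGHTS ancillary data.
            socket.send_fds(client, [b"fd"], [backing.fileno()])
            payload, fds, _flags, _addr = socket.recv_fds(server, 16, 1)
            # Receiver maps the same file through its own descriptor.
            server_map = mmap.mmap(fds[0], size)
            server_map[0:4] = b"DMA!"
            seen = bytes(client_map[0:4])
            server_map.close()
            client_map.close()
            os.close(fds[0])
    finally:
        client.close()
        server.close()
    return payload, seen
```

Both mappings refer to the same pages, so a write through one is visible
through the other without any further socket traffic.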
> > +
> > +However, other socket types could be used which allows the virtual device
> > +process to run in a separate guest in the same host (AF_VSOCK) or remotely
> > +(AF_INET). Theoretically the underlying transport doesn't necessarily have to
> > +be a socket, however we don't examine such alternatives. In this document we
> > +focus on using a UNIX domain socket and introduce basic support for the other
> > +two types of sockets without considering performance implications.
> > +
> > +This document does not yet describe any internal details of the server-side
> > +implementation, however QEMU's VFIO client implementation will have to be
> > +adapted according to this protocol in order to support VFIO-over-socket virtual
> > +devices.
> > +
> > +VFIO
> > +====
> > +VFIO is a framework that allows a physical device to be securely passed through
> > +to a user space process; the kernel does not drive the device at all.
>
> I would remove the last part: "the kernel does not drive the device at
> all". Isn't that quite inaccurate? The kernel does drive the device with
> the vfio driver. The user space driver needs the vfio driver in order to
> do certain things like, for example, write on a port or receive
> notifications for device interrupts.

In that sense you're right, what we meant to say is that the device-specific
driver doesn't drive the device. We'll rephrase it.

>
> > +Typically, the user space process is a VM and the device is passed through to
> > +it in order to achieve high performance. VFIO provides an API and the required
> > +functionality in the kernel. QEMU has adopted VFIO to allow a guest virtual
> > +machine to directly access physical devices, instead of emulating them in
>
> Maybe s/guest virtual machine/guest ?

OK.

>
> > +software
>
> Missing dot here.

OK.

>
> > +
> > +VFIO-over-socket reuses the core VFIO concepts defined in its API, but
> > +implements them as messages to be sent over a UNIX-domain socket.
> > It does not
>
> s/UNIX-domain/UNIX domain (just to have the same name everywhere)
>
> > +change the kernel-based VFIO in any way, in fact none of the VFIO kernel
> > +modules need to be loaded to use VFIO-over-socket. It is also possible for QEMU
> > +to concurrently use the current kernel-based VFIO for one guest device, and use
> > +VFIO-over-socket for another device in the same guest.
> > +
> > +VFIO Device Model
> > +-----------------
> > +A device under VFIO presents a standard VFIO model to the user process. Many
> > +of the VFIO operations in the existing kernel model use the ioctl() system
> > +call, and references to the existing model are called the ioctl()
> > +implementation in this document.
> > +
> > +The following sections describe the set of messages that implement the VFIO
> > +device model over a UNIX domain socket. In many cases, the messages are direct
> > +translations of data structures used in the ioctl() implementation. Messages
> > +derived from ioctl()s will have a name derived from the ioctl() command name.
> > +E.g., the VFIO_DEVICE_GET_INFO ioctl() command becomes a
> > +VFIO_USER_DEVICE_GET_INFO message.
> > +The purpose for this reuse is to share as much code as feasible with the
>
> s/for/of

OK.

>
> > +ioctl() implementation.
> > +
> > +Client and Server
> > +^^^^^^^^^^^^^^^^^
> > +The socket connects two processes together: a client process and a server
> > +process. In the context of this document, the client process is the process
> > +emulating a guest virtual machine, such as QEMU. The server process is a
> > +process that provides device emulation.
> > +
> > +Connection Initiation
> > +^^^^^^^^^^^^^^^^^^^^^
> > +After the client connects to the server, the initial server message is
> > +VFIO_USER_VERSION to propose a protocol version and set of capabilities to
> > +apply to the session.
> > The client replies with a compatible version and set of
> > +capabilities it will support, or closes the connection if it cannot support the
> > +advertised version.
> > +
> > +Guest Memory Configuration
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +The client uses VFIO_USER_DMA_MAP and VFIO_USER_DMA_UNMAP messages to inform
> > +the server of the valid guest DMA ranges that the server can access on behalf
> > +of a device. Guest memory may be accessed by the server via VFIO_USER_DMA_READ
> > +and VFIO_USER_DMA_WRITE messages over the socket.
> > +
> > +An optimization for server access to guest memory is for the client to provide
> > +file descriptors the server can mmap() to directly access guest memory. Note
> > +that mmap() privileges cannot be revoked by the client, therefore file
> > +descriptors should only be exported in environments where the client trusts the
> > +server not to corrupt guest memory.
> > +
> > +Device Information
> > +^^^^^^^^^^^^^^^^^^
> > +The client uses a VFIO_USER_DEVICE_GET_INFO message to query the server for
> > +information about the device. This information includes:
> > +
> > +* The device type and capabilities,
> > +* the number of memory regions the device presents to the guest, and
> > +* the number of interrupt types the device supports.
> > +
> > +Region Information
> > +^^^^^^^^^^^^^^^^^^
> > +The client uses VFIO_USER_DEVICE_GET_REGION_INFO messages to query the server
> > +for information about the device's memory regions. This information describes:
> > +
> > +* Read and write permissions, whether it can be memory mapped, and whether it
> > +  supports additional capabilities.
> > +* Region index, size, and offset.
> > +
> > +When a region can be mapped by the client, the server provides a file
> > +descriptor which the client can mmap(). The server is responsible for polling
> > +for client updates to memory mapped regions.
> > +
> > +Region Capabilities
> > +"""""""""""""""""""
> > +Some regions have additional capabilities that cannot be described adequately
> > +by the region info data structure. These capabilities are returned in the
> > +region info reply in a list similar to PCI capabilities in a PCI device's
> > +configuration space.
> > +
> > +Sparse Regions
> > +""""""""""""""
> > +A region can be memory-mappable in whole or in part. When only a subset of a
> > +region can be mapped by the client, a VFIO_REGION_INFO_CAP_SPARSE_MMAP
> > +capability is included in the region info reply. This capability describes
> > +which portions can be mapped by the client.
> > +
> > +For example, in a virtual NVMe controller, sparse regions can be used so that
> > +accesses to the NVMe registers (found in the beginning of BAR0) are trapped (an
> > +infrequent an event), while allowing direct access to the doorbells (an
>
> s/an event/event

OK.

>
> > +extremely frequent event as every I/O submission requires a write to BAR0),
> > +found right after the NVMe registers in BAR0.
> > +
> > +Interrupts
> > +^^^^^^^^^^
> > +The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server for
> > +the device's interrupt types. The interrupt types are specific to the bus the
> > +device is attached to, and the client is expected to know the capabilities of
> > +each interrupt type. The server can signal an interrupt either with
> > +VFIO_USER_VM_INTERRUPT messages over the socket, or can directly inject
> > +interrupts into the guest via an event file descriptor. The client configures
> > +how the server signals an interrupt with VFIO_USER_DEVICE_SET_IRQS messages.
> > +
> > +Device Read and Write
> > +^^^^^^^^^^^^^^^^^^^^^
> > +When the guest executes load or store operations to device memory, the client
> > +forwards these operations to the server with VFIO_USER_REGION_READ or
> > +VFIO_USER_REGION_WRITE messages.
> > The server will reply with data from the
> > +device on read operations or an acknowledgement on write operations.
> > +
> > +DMA
> > +^^^
> > +When a device performs DMA accesses to guest memory, the server will forward
> > +them to the client with VFIO_USER_DMA_READ and VFIO_USER_DMA_WRITE messages.
> > +These messages can only be used to access guest memory the client has
> > +configured into the server.
> > +
> > +Protocol Specification
> > +======================
> > +To distinguish from the base VFIO symbols, all VFIO-over-socket symbols are
> > +prefixed with vfio_user or VFIO_USER. In revision 0.1, all data is in the
> > +little-endian format, although this may be relaxed in a future revision in cases
> > +where the client and server are both big-endian. The messages are formatted
> > +for seamless reuse of the native VFIO structs. A server can serve:
> > +
> > +1) multiple clients, and/or
> > +2) multiple virtual devices, belonging to one or more clients.
> > +
> > +Therefore each message requires a header that uniquely identifies the virtual
> > +device. It is a server-side implementation detail whether a single server
> > +handles multiple virtual devices from the same or multiple guests.
> > +
> > +Socket
> > +------
> > +A single UNIX domain socket is assumed to be used for each device. The location
>
> Is it correct for a spec to assume things?

We actually want to say that "a single UNIX domain socket is used for each
device", we'll correct it.

>
> > +of the socket is implementation-specific. Multiplexing clients, devices, and
> > +servers over the same socket is not supported in this version of the protocol,
> > +but a device ID field exists in the message header so that future support can
> > +be added without a major version change.
> > +
> > +Authentication
> > +--------------
> > +For AF_UNIX, we rely on OS mandatory access controls on the socket files,
> > +therefore it is up to the management layer to set up the socket as required.
> > +Socket types that span guests or hosts will require a proper authentication
> > +mechanism. Defining that mechanism is deferred to a future version of the
> > +protocol.
> > +
> > +Request Concurrency
> > +-------------------
> > +There can be multiple outstanding requests per virtual device, e.g. a
> > +frame buffer where the guest does multiple stores to the virtual device. The
> > +server can execute and reorder non-conflicting requests in parallel, depending
> > +on the device semantics.
> > +
> > +Socket Disconnection Behavior
> > +-----------------------------
> > +The server and the client can disconnect from each other, either intentionally
> > +or unexpectedly. Both the client and the server need to know how to handle such
> > +events.
> > +
> > +Server Disconnection
> > +^^^^^^^^^^^^^^^^^^^^
> > +A server disconnecting from the client may indicate that:
> > +
> > +1) A virtual device has been restarted, either intentionally (e.g. because of a
> > +device update) or unintentionally (e.g. because of a crash). In any case, the
> > +virtual device will come back so the client should not do anything (e.g. simply
> > +reconnect and retry failed operations).
> > +
>
> Indentation issue ^^ (also remove the space, there are no spaces between the
> elements in the other numbered lists).

OK.

>
> > +2) A virtual device has been shut down with no intention to be restarted.
> > +
> > +It is impossible for the client to know whether or not a failure is
> > +intermittent or innocuous and should be retried, therefore the client should
> > +attempt to reconnect to the socket. Since an intentional server restart (e.g.
> > +due to an upgrade) might take some time, a reasonable timeout should be used.
> > +In cases where the disconnection is expected (e.g. the guest shutting down), no
> > +new requests will be sent anyway so this situation doesn't pose a problem. The
> > +control stack will clean up accordingly.
> > +
> > +Parametrizing this behaviour by having the virtual device advertise a
> > +reasonable reconnect is deferred to a future version of the protocol.
> > +
> > +Client Disconnection
> > +^^^^^^^^^^^^^^^^^^^^
> > +The client disconnecting from the server primarily means that the QEMU process
> > +has exited. Currently this means that the guest is shut down so the device is
> > +no longer needed therefore the server can automatically exit. However, there
> > +can be cases where a client disconnect should not result in a server exit:
>
> s/disconnect/disconnection

They're synonyms.

>
> > +
> > +1) A single server serving multiple clients.
> > +2) A multi-process QEMU upgrading itself step by step, which isn't yet
> > +   implemented.
> > +
> > +Therefore in order for the protocol to be forward compatible the server should
> > +take no action when the client disconnects. If anything happens to the client
> > +process the control stack will know about it and can clean up resources
> > +accordingly.
> > +
> > +Request Retry and Response Timeout
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +QEMU's VFIO retries certain operations if they fail. While this makes sense for
> > +real HW, we don't know for sure whether it makes sense for virtual devices. A
> > +failed request is a request that has been successfully sent and has been
> > +responded to with an error code. Failure to send the request in the first place
> > +(e.g. because the socket is disconnected) is a different type of error examined
> > +earlier in the disconnect section.
> > +
> > +Defining a retry and timeout scheme if deferred to a future version of the
>
> s/if/is

OK.

>
> > +protocol.
> > +
> > +Commands
> > +--------
> > +The following table lists the VFIO message command IDs, and whether the
> > +message request is sent from the client or the server.
> > +
> > ++----------------------------------+---------+-------------------+
> > +| Name                             | Command | Request Direction |
> > ++==================================+=========+===================+
> > +| VFIO_USER_VERSION                | 1       | server → client   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DMA_MAP                | 2       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DMA_UNMAP              | 3       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DEVICE_GET_INFO        | 4       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DEVICE_GET_REGION_INFO | 5       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DEVICE_GET_IRQ_INFO    | 6       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DEVICE_SET_IRQS        | 7       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_REGION_READ            | 8       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_REGION_WRITE           | 9       | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DMA_READ               | 10      | server → client   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DMA_READ               | 11      | server → client   |
>
> Isn't that VFIO_USER_DMA_WRITE?

Correct.
>
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_VM_INTERRUPT           | 12      | server → client   |
> > ++----------------------------------+---------+-------------------+
> > +| VFIO_USER_DEVICE_RESET           | 13      | client → server   |
> > ++----------------------------------+---------+-------------------+
> > +
> > +Header
> > +------
> > +All messages are preceded by a 16 byte header that contains basic information
> > +about the message. The header is followed by message-specific data described
> > +in the sections below.
> > +
> > ++----------------+--------+-------------+
> > +| Name           | Offset | Size        |
> > ++================+========+=============+
> > +| Device ID      | 0      | 2           |
> > ++----------------+--------+-------------+
> > +| Message ID     | 2      | 2           |
> > ++----------------+--------+-------------+
> > +| Command        | 4      | 4           |
> > ++----------------+--------+-------------+
> > +| Message size   | 8      | 4           |
> > ++----------------+--------+-------------+
> > +| Flags          | 12     | 4           |
> > ++----------------+--------+-------------+
> > +|                | +-----+------------+ |
> > +|                | | Bit | Definition | |
> > +|                | +=====+============+ |
> > +|                | | 0   | Reply      | |
> > +|                | +-----+------------+ |
> > +|                | | 1   | No_reply   | |
> > +|                | +-----+------------+ |
> > ++----------------+--------+-------------+
> > +| <message data> | 16     | variable    |
> > ++----------------+--------+-------------+
> > +
> > +* Device ID identifies the destination device of the message. This field is
> > +  reserved when the server only supports one device per socket.
> > +* Message ID identifies the message, and is used in the message acknowledgement.
> > +* Command specifies the command to be executed, listed in the Command Table.
> > +* Message size contains the size of the entire message, including the header.
> > +* Flags contains attributes of the message:
> > +
> > +  * The reply bit differentiates request messages from reply messages.
> > A reply
> > +    message acknowledges a previous request with the same message ID.
> > +  * No_reply indicates that no reply is needed for this request. This is
> > +    commonly used when multiple requests are sent, and only the last needs
> > +    acknowledgement.
> > +
> > +VFIO_USER_VERSION
> > +-----------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | 0                      |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 1                      |
> > ++--------------+------------------------+
> > +| Message size | 16 + version length    |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| Version      | JSON byte array        |
> > ++--------------+------------------------+
> > +
> > +This is the initial message sent by the server after the socket connection is
> > +established. The version is in JSON format, and the following objects must be
> > +included:
> > +
> > ++--------------+--------+---------------------------------------------------+
> > +| Name         | Type   | Description                                       |
> > ++==============+========+===================================================+
> > +| version      | object | {“major”: <number>, “minor”: <number>}            |
> > +|              |        | Version supported by the sender, e.g. “0.1”.      |
> > ++--------------+--------+---------------------------------------------------+
> > +| type         | string | Fixed to “vfio-user”.                             |
> > ++--------------+--------+---------------------------------------------------+
> > +| capabilities | array  | Reserved. Can be omitted for v0.1, otherwise must |
> > +|              |        | be empty.                                         |
> > ++--------------+--------+---------------------------------------------------+
> > +
> > +Versioning and Feature Support
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +Upon accepting a connection, the server must send a VFIO_USER_VERSION message
> > +proposing a protocol version and a set of capabilities. The client compares
> > +these with the versions and capabilities it supports and sends a
> > +VFIO_USER_VERSION reply according to the following rules.
> > +
> > +* The major version in the reply must be the same as proposed. If the client
> > +  does not support the proposed major, it closes the connection.
> > +* The minor version in the reply must be equal to or less than the minor
> > +  version proposed.
> > +* The capability list must be a subset of those proposed. If the client
> > +  requires a capability the server did not include, it closes the connection.
> > +* If type is not “vfio-user”, the client closes the connection.
> > +
> > +The protocol major version will only change when incompatible protocol changes
> > +are made, such as changing the message format. The minor version may change
> > +when compatible changes are made, such as adding new messages or capabilities.
> > +Both the client and server must support all minor versions less than the
> > +maximum minor version they support. E.g., an implementation that supports
> > +version 1.3 must also support 1.0 through 1.2.
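
As a reading aid for the rules above, a minimal sketch of the client side of
the handshake (not part of the patch). The function name and its parameters
are hypothetical, plain ASCII quotes are assumed in the JSON, and only the
JSON payload is handled here; the 16-byte message header is left out.

```python
import json


def negotiate_version(proposal_json, supported_major, supported_minor,
                      supported_caps=(), required_caps=()):
    """Build the client's VFIO_USER_VERSION reply payload.

    Returns the reply as JSON bytes, or None when the rules say the
    client should close the connection instead of replying.
    """
    proposal = json.loads(proposal_json)
    if proposal.get("type") != "vfio-user":
        return None                      # wrong type: close
    version = proposal["version"]
    if version["major"] != supported_major:
        return None                      # unsupported major: close
    offered = set(proposal.get("capabilities", []))
    if not set(required_caps) <= offered:
        return None                      # a required capability is missing
    reply = {
        # Same major as proposed, minor no greater than the proposed one.
        "version": {"major": version["major"],
                    "minor": min(version["minor"], supported_minor)},
        "type": "vfio-user",
        # Reply capabilities are a subset of the proposed ones.
        "capabilities": sorted(offered & set(supported_caps)),
    }
    return json.dumps(reply).encode()
```

For example, a client that supports up to 0.1 and receives a 0.3 proposal
answers with 0.1, relying on the server also supporting every minor version
below its maximum.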
> > +
> > +VFIO_USER_DMA_MAP
> > +-----------------
> > +
> > +VFIO_USER_DMA_UNMAP
> > +-------------------
> > +
> > +Message Format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | 0                      |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | MAP=2, UNMAP=3         |
> > ++--------------+------------------------+
> > +| Message size | 16 + table size        |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| Table        | array of table entries |
> > ++--------------+------------------------+
> > +
> > +This message is sent by the client to the server to inform it of the guest
> > +memory regions the device can access. It must be sent before the device can
> > +perform any DMA to the guest. It is normally sent directly after the version
> > +handshake is completed, but may also occur when memory is added or removed
> > +in the guest.
> > +
> > +The table is an array of the following structure. This structure is 32 bytes
> > +in size, so the message size will be 16 + (# of table entries * 32). If a
> > +region being added can be directly mapped by the server, an array of file
> > +descriptors will be sent as part of the message meta-data. Each region entry
> > +will have a corresponding file descriptor. On AF_UNIX sockets, the file
> > +descriptors will be passed as SCM_RIGHTS type ancillary data.
> > +
> > +Table entry format
> > +^^^^^^^^^^^^^^^^^^
> > +
> > ++-------------+--------+-------------+
> > +| Name        | Offset | Size        |
> > ++=============+========+=============+
> > +| Address     | 0      | 8           |
> > ++-------------+--------+-------------+
> > +| Size        | 8      | 8           |
> > ++-------------+--------+-------------+
> > +| Offset      | 16     | 8           |
> > ++-------------+--------+-------------+
> > +| Protections | 24     | 4           |
> > ++-------------+--------+-------------+
> > +| Flags       | 28     | 4           |
> > ++-------------+--------+-------------+
> > +|             | +-----+------------+ |
> > +|             | | Bit | Definition | |
> > +|             | +=====+============+ |
> > +|             | | 0   | Mappable   | |
> > +|             | +-----+------------+ |
> > ++-------------+--------+-------------+
> > +
> > +* Address is the base DMA address of the region.
> > +* Size is the size of the region.
> > +* Offset is the file offset of the region with respect to the associated file
> > +  descriptor.
> > +* Protections are the region's protection attributes as encoded in
> > +  ``<sys/mman.h>``.
> > +* Flags contain the following region attributes:
> > +
> > +  * Mappable indicates the region can be mapped via the mmap() system call using
> > +    the file descriptor provided in the message meta-data.
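
To check the size arithmetic, here is a small sketch (not part of the patch)
that packs the 16-byte header and a table of 32-byte entries as little-endian
data with Python's struct module. The names HEADER, DMA_ENTRY, MAPPABLE and
build_dma_map are illustrative only, and the protection bits are taken from
Python's mmap module on the assumption that they match ``<sys/mman.h>`` on
the host. Sending the file descriptors themselves is not shown.

```python
import mmap
import struct

# Little-endian, no padding: device ID, message ID, command, size, flags.
HEADER = struct.Struct("<HHIII")
# Address, size, offset, protections, flags.
DMA_ENTRY = struct.Struct("<QQQII")

VFIO_USER_DMA_MAP = 2
MAPPABLE = 1 << 0  # flags bit 0 of a table entry


def build_dma_map(message_id, regions, device_id=0):
    """Serialize a VFIO_USER_DMA_MAP request.

    `regions` is an iterable of (address, size, offset, protections, flags).
    """
    table = b"".join(DMA_ENTRY.pack(*region) for region in regions)
    # Message size covers the whole message, header included.
    header = HEADER.pack(device_id, message_id, VFIO_USER_DMA_MAP,
                         HEADER.size + len(table), 0)
    return header + table


rw = mmap.PROT_READ | mmap.PROT_WRITE
request = build_dma_map(7, [
    (0x00000000, 0x40000000, 0x00000000, rw, MAPPABLE),
    (0x100000000, 0x40000000, 0x40000000, rw, MAPPABLE),
])
```

With two entries the request is 16 + 2 * 32 = 80 bytes, which is also the
value stored in the message size field.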
> > +
> > +VFIO_USER_DEVICE_GET_INFO
> > +-------------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+----------------------------+
> > +| Name         | Value                      |
> > ++==============+============================+
> > +| Device ID    | <ID>                       |
> > ++--------------+----------------------------+
> > +| Message ID   | <ID>                       |
> > ++--------------+----------------------------+
> > +| Command      | 4                          |
> > ++--------------+----------------------------+
> > +| Message size | 16 in request, 32 in reply |
> > ++--------------+----------------------------+
> > +| Flags        | Reply bit set in reply     |
> > ++--------------+----------------------------+
> > +| Device info  | VFIO device info           |
> > ++--------------+----------------------------+
> > +
> > +This message is sent by the client to the server to query for basic information
> > +about the device. Only the message header is needed in the request message.
> > +The VFIO device info structure is defined in ``<sys/vfio.h>`` (``struct
> > +vfio_device_info``).
> > +
> > +VFIO device info format
> > +^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++-------------+--------+--------------------------+
> > +| Name        | Offset | Size                     |
> > ++=============+========+==========================+
> > +| argsz       | 16     | 4                        |
> > ++-------------+--------+--------------------------+
> > +| flags       | 20     | 4                        |
> > ++-------------+--------+--------------------------+
> > +|             | +-----+-------------------------+ |
> > +|             | | Bit | Definition              | |
> > +|             | +=====+=========================+ |
> > +|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
> > +|             | +-----+-------------------------+ |
> > +|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
> > +|             | +-----+-------------------------+ |
> > ++-------------+--------+--------------------------+
> > +| num_regions | 24     | 4                        |
> > ++-------------+--------+--------------------------+
> > +| num_irqs    | 28     | 4                        |
> > ++-------------+--------+--------------------------+
> > +
> > +* argsz is reserved in vfio-user, it is only used in the ioctl() VFIO
> > +  implementation.
> > +* flags contains the following device attributes.
> > +
> > +  * VFIO_DEVICE_FLAGS_RESET indicates the device supports the
> > +    VFIO_USER_DEVICE_RESET message.
> > +  * VFIO_DEVICE_FLAGS_PCI indicates the device is a PCI device.
> > +
> > +* num_regions is the number of memory regions the device exposes.
> > +* num_irqs is the number of distinct interrupt types the device supports.
> > +
> > +This version of the protocol only supports PCI devices. Additional devices may
> > +be supported in future versions.
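
A sketch of how a client might decode the 32-byte reply described above (not
part of the patch). The offsets are the ones from the table, i.e. relative to
the start of the message; the function name and the returned dictionary are
made up for the example.

```python
import struct

HEADER = struct.Struct("<HHIII")       # device ID, message ID, command, size, flags
DEVICE_INFO = struct.Struct("<IIII")   # argsz, flags, num_regions, num_irqs

VFIO_USER_DEVICE_GET_INFO = 4
REPLY = 1 << 0                         # header flags bit 0
VFIO_DEVICE_FLAGS_RESET = 1 << 0
VFIO_DEVICE_FLAGS_PCI = 1 << 1


def parse_device_info_reply(message):
    """Decode a VFIO_USER_DEVICE_GET_INFO reply into a plain dict."""
    _dev, message_id, command, size, flags = HEADER.unpack_from(message, 0)
    if command != VFIO_USER_DEVICE_GET_INFO or not flags & REPLY:
        raise ValueError("not a VFIO_USER_DEVICE_GET_INFO reply")
    if size != 32 or len(message) < size:
        raise ValueError("unexpected message size")
    # The device info structure starts right after the 16-byte header.
    _argsz, dev_flags, num_regions, num_irqs = DEVICE_INFO.unpack_from(message, 16)
    return {
        "message_id": message_id,
        "supports_reset": bool(dev_flags & VFIO_DEVICE_FLAGS_RESET),
        "is_pci": bool(dev_flags & VFIO_DEVICE_FLAGS_PCI),
        "num_regions": num_regions,
        "num_irqs": num_irqs,
    }
```

The 9 regions and 5 interrupt types used when trying this out are arbitrary
sample values, not something the protocol mandates.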
> > +
> > +VFIO_USER_DEVICE_GET_REGION_INFO
> > +--------------------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 5                      |
> > ++--------------+------------------------+
> > +| Message size | 48 + any caps          |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| Region info  | VFIO region info       |
> > ++--------------+------------------------+
> > +
> > +This message is sent by the client to the server to query for information about
> > +device memory regions. The VFIO region info structure is defined in
> > +``<sys/vfio.h>`` (``struct vfio_region_info``).
> > +
> > +VFIO region info format
> > +^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++------------+--------+------------------------------+
> > +| Name       | Offset | Size                         |
> > ++============+========+==============================+
> > +| argsz      | 16     | 4                            |
> > ++------------+--------+------------------------------+
> > +| flags      | 20     | 4                            |
> > ++------------+--------+------------------------------+
> > +|            | +-----+-----------------------------+ |
> > +|            | | Bit | Definition                  | |
> > +|            | +=====+=============================+ |
> > +|            | | 0   | VFIO_REGION_INFO_FLAG_READ  | |
> > +|            | +-----+-----------------------------+ |
> > +|            | | 1   | VFIO_REGION_INFO_FLAG_WRITE | |
> > +|            | +-----+-----------------------------+ |
> > +|            | | 2   | VFIO_REGION_INFO_FLAG_MMAP  | |
> > +|            | +-----+-----------------------------+ |
> > +|            | | 3   | VFIO_REGION_INFO_FLAG_CAPS  | |
> > +|            | +-----+-----------------------------+ |
> > ++------------+--------+------------------------------+
> > +| index      | 24     | 4                            |
> > ++------------+--------+------------------------------+
> > +| cap_offset | 28     | 4                            |
> > ++------------+--------+------------------------------+
> > +| size       | 32     | 8                            |
> > ++------------+--------+------------------------------+
> > +| offset     | 40     | 8                            |
> > ++------------+--------+------------------------------+
> > +
> > +* argsz is reserved in vfio-user; it is only used in the ioctl() VFIO
> > +  implementation.
> > +* flags are attributes of the region:
> > +
> > +  * VFIO_REGION_INFO_FLAG_READ allows client read access to the region.
> > +  * VFIO_REGION_INFO_FLAG_WRITE allows client write access region.
>
> s/region/to the region

OK.

>
> > +  * VFIO_REGION_INFO_FLAG_MMAP specifies the client can mmap() the region. When
> > +    this flag is set, the reply will include a file descriptor in its meta-data.
> > +    On AF_UNIX sockets, the file descriptors will be passed as SCM_RIGHTS type
> > +    ancillary data.
> > +  * VFIO_REGION_INFO_FLAG_CAPS indicates additional capabilities found in the
> > +    reply.
> > +
> > +* index is the index of memory region being queried, it is the only field that
> > +  is required to be set in the request message.
> > +* cap_offset describes where additional region capabilities can be found.
> > +  cap_offset is relative to the beginning of the VFIO region info structure.
> > +  The data structure it points to is a VFIO cap header defined in ``<sys/vfio.h>``.
> > +* size is the size of the region.
> > +* offset is the offset given to the mmap() system call for regions with the
> > +  MMAP attribute. It is also used as the base offset when mapping a VFIO
> > +  sparse mmap area, described below.
> > +
> > +VFIO Region capabilities
> > +^^^^^^^^^^^^^^^^^^^^^^^^
> > +The VFIO region information can also include a capabilities list. This list is
> > +similar to a PCI capability list - each entry has a common header that
> > +identifies a capability and where the next capability in the list can be found.
> > +The VFIO capability header format is defined in ``<sys/vfio.h>`` (``struct
> > +vfio_info_cap_header``).
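Going back to the fixed part of the region info (before any capability list), a rough Python sketch of building the request and decoding the reply might look like the following. Byte order (little-endian) and the helper names are assumptions of mine; the field order and sizes are taken from the table, with the payload starting after the 16-byte message header.

```python
import struct

# Flag bits from the region info table.
VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2
VFIO_REGION_INFO_FLAG_CAPS = 1 << 3

# argsz, flags, index, cap_offset (32-bit each), size, offset (64-bit each).
# 32 bytes total, which with the 16-byte header gives the 48 in "48 + any caps".
REGION_INFO = struct.Struct("<IIIIQQ")


def build_region_info_request(index):
    """Only index needs to be set in the request; everything else is zero."""
    return REGION_INFO.pack(0, 0, index, 0, 0, 0)


def parse_region_info_reply(payload):
    """Decode the fixed-size part of a region info reply."""
    _argsz, flags, index, cap_offset, size, offset = REGION_INFO.unpack_from(payload, 0)
    has_caps = bool(flags & VFIO_REGION_INFO_FLAG_CAPS)
    return {
        "index": index,
        "readable": bool(flags & VFIO_REGION_INFO_FLAG_READ),
        "writable": bool(flags & VFIO_REGION_INFO_FLAG_WRITE),
        "mmapable": bool(flags & VFIO_REGION_INFO_FLAG_MMAP),
        # cap_offset is only meaningful when the CAPS flag is set; it is
        # relative to the start of the region info structure.
        "cap_offset": cap_offset if has_caps else None,
        "size": size,
        "mmap_offset": offset,
    }
```

Receiving the file descriptor that accompanies an MMAP-capable region (SCM_RIGHTS) is deliberately left out of this sketch.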
> > +
> > +VFIO cap header format
> > +^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++---------+--------+------+
> > +| Name    | Offset | Size |
> > ++=========+========+======+
> > +| id      | 0      | 2    |
> > ++---------+--------+------+
> > +| version | 2      | 2    |
> > ++---------+--------+------+
> > +| next    | 4      | 4    |
> > ++---------+--------+------+
> > +
> > +* id is the capability identity.
> > +* version is a capability-specific version number.
> > +* next specifies the offset of the next capability in the capability list. It
> > +  is relative to the beginning of the VFIO region info structure.
> > +
> > +VFIO sparse mmap
> > +^^^^^^^^^^^^^^^^
> > +
> > ++------------------+----------------------------------+
> > +| Name             | Value                            |
> > ++==================+==================================+
> > +| id               | VFIO_REGION_INFO_CAP_SPARSE_MMAP |
> > ++------------------+----------------------------------+
> > +| version          | 0x1                              |
> > ++------------------+----------------------------------+
> > +| next             | <next>                           |
> > ++------------------+----------------------------------+
> > +| sparse mmap info | VFIO region info sparse mmap     |
> > ++------------------+----------------------------------+
> > +
> > +The only capability supported in this version of the protocol is for sparse
> > +mmap. This capability is defined when only a subrange of the region supports
> > +direct access by the client via mmap(). The VFIO sparse mmap area is defined in
> > +``<sys/vfio.h>`` (``struct vfio_region_sparse_mmap_area``).
> > +
> > +VFIO region info cap sparse mmap
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > ++----------+--------+------+
> > +| Name     | Offset | Size |
> > ++==========+========+======+
> > +| nr_areas | 0      | 4    |
> > ++----------+--------+------+
> > +| reserved | 4      | 4    |
> > ++----------+--------+------+
> > +| offset   | 8      | 8    |
> > ++----------+--------+------+
> > +| size     | 16     | 8    |
> > ++----------+--------+------+
> > +| ...      |        |      |
> > ++----------+--------+------+
> > +
> > +* nr_areas is the number of sparse mmap areas in the region.
> > +* offset and size describe a single area that can be mapped by the client.
> > +  There will be nr_areas pairs of offset and size. The offset will be added to
> > +  the base offset given in the VFIO_USER_DEVICE_GET_REGION_INFO reply to form
> > +  the offset argument of the subsequent mmap() call.
> > +
> > +The VFIO region info cap sparse mmap structure is defined in ``<sys/vfio.h>``
> > +(``struct vfio_region_info_cap_sparse_mmap``).
> > +
> > +VFIO_USER_DEVICE_GET_IRQ_INFO
> > +-----------------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 6                      |
> > ++--------------+------------------------+
> > +| Message size | 32                     |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| IRQ info     | VFIO IRQ info          |
> > ++--------------+------------------------+
> > +
> > +This message is sent by the client to the server to query for information about
> > +device interrupt types. The VFIO IRQ info structure is defined in
> > +``<sys/vfio.h>`` (``struct vfio_irq_info``).
> > +
> > +VFIO IRQ info format
> > +^^^^^^^^^^^^^^^^^^^^
> > +
> > ++-------+--------+---------------------------+
> > +| Name  | Offset | Size                      |
> > ++=======+========+===========================+
> > +| argsz | 16     | 4                         |
> > ++-------+--------+---------------------------+
> > +| flags | 20     | 4                         |
> > ++-------+--------+---------------------------+
> > +|       | +-----+--------------------------+ |
> > +|       | | Bit | Definition               | |
> > +|       | +=====+==========================+ |
> > +|       | | 0   | VFIO_IRQ_INFO_EVENTFD    | |
> > +|       | +-----+--------------------------+ |
> > +|       | | 1   | VFIO_IRQ_INFO_MASKABLE   | |
> > +|       | +-----+--------------------------+ |
> > +|       | | 2   | VFIO_IRQ_INFO_AUTOMASKED | |
> > +|       | +-----+--------------------------+ |
> > +|       | | 3   | VFIO_IRQ_INFO_NORESIZE   | |
> > +|       | +-----+--------------------------+ |
> > ++-------+--------+---------------------------+
> > +| index | 24     | 4                         |
> > ++-------+--------+---------------------------+
> > +| count | 28     | 4                         |
> > ++-------+--------+---------------------------+
> > +
> > +* argsz is reserved in vfio-user; it is only used in the ioctl() VFIO
> > +  implementation.
> > +* flags defines IRQ attributes:
> > +
> > +  * VFIO_IRQ_INFO_EVENTFD indicates the IRQ type can support server eventfd
> > +    signalling.
> > +  * VFIO_IRQ_INFO_MASKABLE indicates that the IRQ type supports the MASK and
> > +    UNMASK actions in a VFIO_USER_DEVICE_SET_IRQS message.
> > +  * VFIO_IRQ_INFO_AUTOMASKED indicates the IRQ type masks itself after being
> > +    triggered, and the client must send an UNMASK action to receive new
> > +    interrupts.
> > +  * VFIO_IRQ_INFO_NORESIZE indicates VFIO_USER_DEVICE_SET_IRQS operations set up
> > +    interrupts as a set, and new subindexes cannot be enabled without disabling
> > +    the entire type.
> > +
> > +* index is the index of IRQ type being queried, it is the only field that is
> > +  required to be set in the request message.
> > +* count describes the number of interrupts of the queried type.
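For the IRQ info exchange, a small Python sketch of how a client could build the request and interpret the reply flags is below. As before, little-endian encoding and the function names are assumptions on my side, and the payload excludes the 16-byte message header.

```python
import struct

# Flag bits from the IRQ info table.
VFIO_IRQ_INFO_EVENTFD = 1 << 0
VFIO_IRQ_INFO_MASKABLE = 1 << 1
VFIO_IRQ_INFO_AUTOMASKED = 1 << 2
VFIO_IRQ_INFO_NORESIZE = 1 << 3

# argsz, flags, index, count: four 32-bit fields; 16 + 16 = 32 byte message.
IRQ_INFO = struct.Struct("<IIII")


def build_irq_info_request(index):
    """Only index is required in the request."""
    return IRQ_INFO.pack(0, 0, index, 0)


def parse_irq_info_reply(payload):
    """Decode an IRQ info reply into the attributes a client cares about."""
    _argsz, flags, index, count = IRQ_INFO.unpack_from(payload, 0)
    return {
        "index": index,
        "count": count,
        "eventfd": bool(flags & VFIO_IRQ_INFO_EVENTFD),
        "maskable": bool(flags & VFIO_IRQ_INFO_MASKABLE),
        # If set, the client has to send an UNMASK action after each interrupt.
        "needs_unmask_after_trigger": bool(flags & VFIO_IRQ_INFO_AUTOMASKED),
        "noresize": bool(flags & VFIO_IRQ_INFO_NORESIZE),
    }
```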
> > +
> > +VFIO_USER_DEVICE_SET_IRQS
> > +-------------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 7                      |
> > ++--------------+------------------------+
> > +| Message size | 36 + any data          |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| IRQ set      | VFIO IRQ set           |
> > ++--------------+------------------------+
> > +
> > +This message is sent by the client to the server to set actions for device
> > +interrupt types. The VFIO IRQ set structure is defined in ``<sys/vfio.h>``
> > +(``struct vfio_irq_set``).
> > +
> > +VFIO IRQ set format
> > +^^^^^^^^^^^^^^^^^^^
> > +
> > ++-------+--------+------------------------------+
> > +| Name  | Offset | Size                         |
> > ++=======+========+==============================+
> > +| argsz | 16     | 4                            |
> > ++-------+--------+------------------------------+
> > +| flags | 20     | 4                            |
> > ++-------+--------+------------------------------+
> > +|       | +-----+-----------------------------+ |
> > +|       | | Bit | Definition                  | |
> > +|       | +=====+=============================+ |
> > +|       | | 0   | VFIO_IRQ_SET_DATA_NONE      | |
> > +|       | +-----+-----------------------------+ |
> > +|       | | 1   | VFIO_IRQ_SET_DATA_BOOL      | |
> > +|       | +-----+-----------------------------+ |
> > +|       | | 2   | VFIO_IRQ_SET_DATA_EVENTFD   | |
> > +|       | +-----+-----------------------------+ |
> > +|       | | 3   | VFIO_IRQ_SET_ACTION_MASK    | |
> > +|       | +-----+-----------------------------+ |
> > +|       | | 4   | VFIO_IRQ_SET_ACTION_UNMASK  | |
> > +|       | +-----+-----------------------------+ |
> > +|       | | 5   | VFIO_IRQ_SET_ACTION_TRIGGER | |
> > +|       | +-----+-----------------------------+ |
> > ++-------+--------+------------------------------+
> > +| index | 24     | 4                            |
> > ++-------+--------+------------------------------+
> > +| start | 28     | 4                            |
> > ++-------+--------+------------------------------+
> > +| count | 32     | 4                            |
> > ++-------+--------+------------------------------+
> > +| data  | 36     | variable                     |
> > ++-------+--------+------------------------------+
> > +
> > +* argsz is reserved in vfio-user; it is only used in the ioctl() VFIO
> > +  implementation.
> > +* flags defines the action performed on the interrupt range. The DATA flags
> > +  describe the data field sent in the message; the ACTION flags describe the
> > +  action to be performed. Within each set, the flags are mutually exclusive.
> > +
> > +  * VFIO_IRQ_SET_DATA_NONE indicates there is no data field in the request. The
> > +    action is performed unconditionally.
> > +  * VFIO_IRQ_SET_DATA_BOOL indicates the data field is an array of boolean
> > +    bytes. The action is performed if the corresponding boolean is true.
> > +  * VFIO_IRQ_SET_DATA_EVENTFD indicates an array of event file descriptors was
> > +    sent in the message meta-data. These descriptors will be signalled when the
> > +    action defined by the action flags occurs. In AF_UNIX sockets, the
> > +    descriptors are sent as SCM_RIGHTS type ancillary data.
> > +  * VFIO_IRQ_SET_ACTION_MASK indicates a masking event. It can be used with
> > +    VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE to mask an interrupt, or
> > +    with VFIO_IRQ_SET_DATA_EVENTFD to generate an event when the guest masks
> > +    the interrupt.
> > +  * VFIO_IRQ_SET_ACTION_UNMASK indicates an unmasking event. It can be used
> > +    with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE to unmask an
> > +    interrupt, or with VFIO_IRQ_SET_DATA_EVENTFD to generate an event when the
> > +    guest unmasks the interrupt.
> > +  * VFIO_IRQ_SET_ACTION_TRIGGER indicates a triggering event.
> > +    It can be used with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE to
> > +    trigger an interrupt, or with VFIO_IRQ_SET_DATA_EVENTFD to generate an event
> > +    when the guest triggers the interrupt.
> > +
> > +* index is the index of IRQ type being set up.
> > +* start is the start of the subindex being set.
> > +* count describes the number of sub-indexes being set. As a special case, a
> > +  count of 0 with data flags of VFIO_IRQ_SET_DATA_NONE disables all interrupts
> > +  of the index.
> > +* data is an optional field included when the
> > +  VFIO_IRQ_SET_DATA_BOOL flag is present. It contains an array of booleans
> > +  that specify whether the action is to be performed on the corresponding
> > +  index. It's used when the action is only performed on a subset of the range
> > +  specified.
> > +
> > +Not all interrupt types support every combination of data and action flags.
> > +The client must know the capabilities of the device and IRQ index before it
> > +sends a VFIO_USER_DEVICE_SET_IRQS message.
> > +
> > +Read and Write Operations
> > +-------------------------
> > +
> > +Not all I/O operations between the client and server can be done via direct
> > +access of memory mapped with an mmap() call. In these cases, the client and
> > +server use messages sent over the socket. It is expected that these operations
> > +will have lower performance than direct access.
> > +
> > +The client can access device memory with VFIO_USER_REGION_READ and
> > +VFIO_USER_REGION_WRITE requests. These share a common data structure that
> > +appears after the 16 byte message header.
> > +
> > +REGION Read/Write Data
> > +^^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++--------+--------+----------+
> > +| Name   | Offset | Size     |
> > ++========+========+==========+
> > +| Offset | 16     | 8        |
> > ++--------+--------+----------+
> > +| Region | 24     | 4        |
> > ++--------+--------+----------+
> > +| Count  | 28     | 4        |
> > ++--------+--------+----------+
> > +| Data   | 32     | variable |
> > ++--------+--------+----------+
> > +
> > +* Offset is the offset into the region being accessed.
> > +* Region is the index of the region being accessed.
> > +* Count is the size of the data to be transferred.
> > +* Data is the data to be read or written.
> > +
> > +The server can access guest memory with VFIO_USER_DMA_READ and
> > +VFIO_USER_DMA_WRITE messages. These also share a common data structure that
> > +appears after the 16 byte message header.
> > +
> > +DMA Read/Write Data
> > +^^^^^^^^^^^^^^^^^^^
> > +
> > ++---------+--------+----------+
> > +| Name    | Offset | Size     |
> > ++=========+========+==========+
> > +| Address | 16     | 8        |
> > ++---------+--------+----------+
> > +| Count   | 24     | 4        |
> > ++---------+--------+----------+
> > +| Data    | 28     | variable |
> > ++---------+--------+----------+
> > +
> > +* Address is the area of guest memory being accessed. This address must have
> > +  been exported to the server with a VFIO_USER_DMA_MAP message.
> > +* Count is the size of the data to be transferred.
> > +* Data is the data to be read or written.
> > +
> > +Address and count can also be accessed as ``struct iovec`` from ``<sys/uio.h>``.
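To make the REGION read/write data layout concrete, here is a hedged Python sketch that packs and unpacks the payload that follows the 16-byte header. Little-endian packing, the `HEADER_SIZE` constant name and the helper names are assumptions for illustration; the field order (offset, region, count, data) follows the table.

```python
import struct

HEADER_SIZE = 16  # message header size stated in the text

# Offset (64-bit), Region (32-bit), Count (32-bit), followed by the data.
REGION_RW = struct.Struct("<QII")


def pack_region_read(region, offset, count):
    """Read request: no data, count is the amount to read."""
    return REGION_RW.pack(offset, region, count)


def pack_region_write(region, offset, data):
    """Write request: count is the length of the data that follows."""
    return REGION_RW.pack(offset, region, len(data)) + bytes(data)


def unpack_region_rw(payload):
    """Split a REGION read/write payload into its fields and any data."""
    offset, region, count = REGION_RW.unpack_from(payload, 0)
    data = bytes(payload[REGION_RW.size:REGION_RW.size + count])
    return offset, region, count, data


def message_size(payload):
    """Value for the header's message size field: header plus payload."""
    return HEADER_SIZE + len(payload)
```

With these sizes, a read request comes out at 32 bytes and a 4-byte write at 36, which matches the "32 + data size" entries in the message tables.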
> > +
> > +VFIO_USER_REGION_READ
> > +---------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 8                      |
> > ++--------------+------------------------+
> > +| Message size | 32 + data size         |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| Read info    | REGION read/write data |
> > ++--------------+------------------------+
> > +
> > +This request is sent from the client to the server to read from device memory.
> > +In the request message, there will be no data, and the count field will be the
> > +amount of data to be read. The reply will include the data read, and its count
> > +field will be the amount of data read.
> > +
> > +VFIO_USER_REGION_WRITE
> > +----------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 9                      |
> > ++--------------+------------------------+
> > +| Message size | 32 + data size         |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| Write info   | REGION read/write data |
> > ++--------------+------------------------+
> > +
> > +This request is sent from the client to the server to write to device memory.
> > +The request message will contain the data to be written, and its count field
> > +will contain the amount of write data. The count field in the reply will be
> > +zero.
> > +
> > +VFIO_USER_DMA_READ
> > +------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 10                     |
> > ++--------------+------------------------+
> > +| Message size | 28 + data size         |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| DMA info     | DMA read/write data    |
> > ++--------------+------------------------+
> > +
> > +This request is sent from the server to the client to read from guest memory.
> > +In the request message, there will be no data, and the count field will be the
> > +amount of data to be read. The reply will include the data read, and its count
> > +field will be the amount of data read.
> > +
> > +VFIO_USER_DMA_WRITE
> > +-------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 11                     |
> > ++--------------+------------------------+
> > +| Message size | 28 + data size         |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +| DMA info     | DMA read/write data    |
> > ++--------------+------------------------+
> > +
> > +This request is sent from the server to the client to write to guest memory.
> > +The request message will contain the data to be written, and its count field
> > +will contain the amount of write data. The count field in the reply will be
> > +zero.
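On the client side, the two DMA requests above could be serviced roughly as in the Python sketch below. It makes two simplifying assumptions that are mine, not the spec's: guest memory is modelled as one flat `bytearray` indexed directly by Address (a real client would translate through the ranges it exported with VFIO_USER_DMA_MAP), and fields are little-endian. The handler names are hypothetical.

```python
import struct

# Address (64-bit) and Count (32-bit), followed by the data: 12 bytes of
# fixed fields, which with the 16-byte header gives "28 + data size".
DMA_RW = struct.Struct("<QI")


def handle_dma_read(guest_memory, request_payload):
    """Build the reply payload for a VFIO_USER_DMA_READ request."""
    address, count = DMA_RW.unpack_from(request_payload, 0)
    if address + count > len(guest_memory):
        raise ValueError("DMA read outside the modelled guest memory")
    data = bytes(guest_memory[address:address + count])
    # The reply carries the data read; count is the amount actually read.
    return DMA_RW.pack(address, len(data)) + data


def handle_dma_write(guest_memory, request_payload):
    """Apply a VFIO_USER_DMA_WRITE request and build the reply payload."""
    address, count = DMA_RW.unpack_from(request_payload, 0)
    data = request_payload[DMA_RW.size:DMA_RW.size + count]
    if len(data) != count:
        raise ValueError("request is shorter than its count field claims")
    if address + count > len(guest_memory):
        raise ValueError("DMA write outside the modelled guest memory")
    guest_memory[address:address + count] = data
    # The count field in the write reply is zero.
    return DMA_RW.pack(address, 0)
```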
> > +
> > +VFIO_USER_VM_INTERRUPT
> > +----------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++----------------+------------------------+
> > +| Name           | Value                  |
> > ++================+========================+
> > +| Device ID      | <ID>                   |
> > ++----------------+------------------------+
> > +| Message ID     | <ID>                   |
> > ++----------------+------------------------+
> > +| Command        | 12                     |
> > ++----------------+------------------------+
> > +| Message size   | 24                     |
> > ++----------------+------------------------+
> > +| Flags          | Reply bit set in reply |
> > ++----------------+------------------------+
> > +| Interrupt info | <interrupt>            |
> > ++----------------+------------------------+
> > +
> > +This request is sent from the server to the client to signal the device has
> > +raised an interrupt.
> > +
> > +Interrupt info format
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > ++----------+--------+------+
> > +| Name     | Offset | Size |
> > ++==========+========+======+
> > +| Index    | 16     | 4    |
> > ++----------+--------+------+
> > +| Subindex | 20     | 4    |
> > ++----------+--------+------+
> > +
> > +* Index is the interrupt index; it is the same value used in
> > +  VFIO_USER_DEVICE_SET_IRQS.
> > +* Subindex is relative to the index, e.g., the vector number used in PCI
> > +  MSI/MSI-X type interrupts.
> > +
> > +VFIO_USER_DEVICE_RESET
> > +----------------------
> > +
> > +Message format
> > +^^^^^^^^^^^^^^
> > +
> > ++--------------+------------------------+
> > +| Name         | Value                  |
> > ++==============+========================+
> > +| Device ID    | <ID>                   |
> > ++--------------+------------------------+
> > +| Message ID   | <ID>                   |
> > ++--------------+------------------------+
> > +| Command      | 13                     |
> > ++--------------+------------------------+
> > +| Message size | 16                     |
> > ++--------------+------------------------+
> > +| Flags        | Reply bit set in reply |
> > ++--------------+------------------------+
> > +
> > +This request is sent from the client to the server to reset the device.
> > +
> > +Appendices
> > +==========
> > +
> > +Unused VFIO ioctl() commands
> > +----------------------------
> > +
> > +The following commands must be handled by the client and not sent to the server:
> > +
> > +* VFIO_GET_API_VERSION
> > +* VFIO_CHECK_EXTENSION
> > +* VFIO_SET_IOMMU
> > +* VFIO_GROUP_GET_STATUS
> > +* VFIO_GROUP_SET_CONTAINER
> > +* VFIO_GROUP_UNSET_CONTAINER
> > +* VFIO_GROUP_GET_DEVICE_FD
> > +* VFIO_IOMMU_GET_INFO
> > +
> > +However, once support for live migration for VFIO devices is finalized some
> > +of the above commands might have to be handled by the client. This will be
> > +addressed in a future protocol version.
> > +
> > +Live Migration
> > +--------------
> > +Currently live migration is not supported for devices passed through via VFIO,
> > +therefore it is not supported for VFIO-over-socket, either. This is being
> > +actively worked on in the "Add migration support for VFIO devices" (v25) patch
> > +series.
> > +
> > +VFIO groups and containers
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +The current VFIO implementation includes group and container idioms that
> > +describe how a device relates to the host IOMMU. In the VFIO over socket
> > +implementation, the IOMMU is implemented in SW by the client, and isn't visible
> > +to the server. The simplest idea is for the client is to put each device into
> >
> s/is for/for

OK.

>
> > +its own group and container.

John and Thanos