On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:

> On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user.
> >
> > This request allows backends to dynamically map
> > fds into a shared memory region identified by
>
> Please call this "VIRTIO Shared Memory Region" everywhere (code,
> vhost-user spec, commit description, etc) so it's clear that this is not
> about vhost-user shared memory tables/regions.
>
> > its `shmid`. Then, the fd memory is advertised
> > to the frontend through a BAR+offset, so it can
> > be read by the driver while it's valid.
>
> Why is a PCI BAR mentioned here? vhost-user does not know about the
> VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> report VIRTIO Shared Memory Regions to the driver.
>
>
I will remove the PCI BAR mention, as it is true that it depends on the
transport. I was trying to explain that the driver will use
shm_base + shm_offset to access the mapped memory.
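
In other words, something like this tiny driver-side sketch (hypothetical
helper, not from the patch), where shm_base stands for whatever base address
the transport reports for the VIRTIO Shared Memory Region:

    #include <stdint.h>

    /*
     * Hypothetical driver-side helper: the transport (e.g. a PCI shared
     * memory capability) reports the region's base address, the device
     * communicates shm_offset, and the driver just adds the two.
     */
    static inline void *shm_resolve(void *shm_base, uint64_t shm_offset)
    {
        return (uint8_t *)shm_base + shm_offset;
    }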


> >
> > Then, the backend can munmap the memory range
> > in a given shared memory region (again, identified
> > by its `shmid`), to free it. After this, the
> > region becomes private and shall not be accessed
> > by the frontend anymore.
>
> What does "private" mean?
>
> The frontend must mmap PROT_NONE to reserve the virtual memory space
> when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> unrelated mmap(NULL, ...) might use that address range and the guest
> would have access to the host memory! This is a security issue and needs
> to be mentioned explicitly in the spec.
>

I mentioned private because it changes the mapping from MAP_SHARED
to MAP_PRIVATE. I will highlight PROT_NONE instead.
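
To make sure I read the suggestion correctly, here is a minimal sketch of the
reservation pattern (my understanding of it, not the patch's code), assuming
the frontend reserves the whole VIRTIO Shared Memory Region up front and
replaces sub-ranges in place on map/unmap:

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Reserve address space only; nothing is accessible until a map request
     * replaces part of the reservation. */
    static void *shmem_reserve(size_t region_size)
    {
        return mmap(NULL, region_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }

    /* On SHMEM_UNMAP, go back to an inaccessible PROT_NONE reservation with
     * MAP_FIXED instead of munmap(), so an unrelated mmap(NULL, ...) can
     * never land inside the region and become guest-visible. */
    static int shmem_replace_unmapped(void *region_base, uint64_t offset,
                                      uint64_t len)
    {
        void *p = mmap((uint8_t *)region_base + offset, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        return p == MAP_FAILED ? -1 : 0;
    }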


>
> >
> > Initializing the memory region is the responsibility
> > of the PCI device that will be using it.
>
> What does this mean?
>

The MemoryRegion is declared in `struct VirtIODevice`,
but it is uninitialized in this commit. So I was trying to say
that the initialization will happen in, e.g., vhost-user-gpu-pci.c
with something like `memory_region_init`, and later `pci_register_bar`.

I am testing that part still.
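
Roughly, I am thinking of something along these lines (untested sketch;
VUGPU_SHMEM_BAR and shmem_size are placeholders, and it assumes shmem_list
has already been allocated and n_shmem_regions set):

    static void vugpu_pci_init_shmem(PCIDevice *pci_dev, VirtIODevice *vdev,
                                     uint64_t shmem_size)
    {
        /* Container region backing the VIRTIO Shared Memory Region, shmid 0. */
        MemoryRegion *mr = &vdev->shmem_list[0];

        memory_region_init(mr, OBJECT(pci_dev), "vugpu-shmem", shmem_size);

        /* Expose it to the guest; the transport-specific part stays in the
         * device frontend, not in vhost-user.c. */
        pci_register_bar(pci_dev, VUGPU_SHMEM_BAR,
                         PCI_BASE_ADDRESS_SPACE_MEMORY |
                         PCI_BASE_ADDRESS_MEM_TYPE_64 |
                         PCI_BASE_ADDRESS_MEM_PREFETCH,
                         mr);
    }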


>
> >
> > Signed-off-by: Albert Esteve <aest...@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst |  23 ++++++++
> >  hw/virtio/vhost-user.c      | 106 ++++++++++++++++++++++++++++++++++++
> >  hw/virtio/virtio.c          |   2 +
> >  include/hw/virtio/virtio.h  |   3 +
> >  4 files changed, 134 insertions(+)
>
> Two missing pieces:
>
> 1. QEMU's --device vhost-user-device needs a way to enumerate VIRTIO
> Shared Memory Regions from the vhost-user backend. vhost-user-device is
> a generic vhost-user frontend without knowledge of the device type, so
> it doesn't know what the valid shmids are and what size the regions
> have.
>

Ok. I was assuming that if a device (backend) makes a request with an invalid
shmid, or without enough room in the region to perform the mmap, the request
would simply fail in the VMM performing the actual mmap operation. So the
frontend would not necessarily need to know about valid shmids or region sizes.


>
> 2. Other backends don't see these mappings. If the guest submits a vring
> descriptor referencing a mapping to another backend, then that backend
> won't be able to access this memory. David Gilbert hit this problem when
> working on the virtiofs DAX Window. Either the frontend needs to forward
> all SHMAP_MAP/UNMAP messages to the other backends (inefficient and
> maybe racy!) or a new "memcpy" message is needed as a fallback for when
> vhost-user memory region translation fails.
>

Ok. In which scenario would another device want to access the same mapping?
If it is for a shared VIRTIO object, the device that receives the dmabuf fd
could just create a new mapping in its own VIRTIO Shared Memory Region.


>
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..3caf2a290c 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,29 @@ is sent by the front-end.
> >    when the operation is successful, or non-zero otherwise. Note that if the
> >    operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :request payload: fd and ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends to advertise a new mapping
> > +  to be made in a given shared memory region. Upon receiving the message,
> > +  QEMU will mmap the given fd into the shared memory region with the
>
> s/QEMU/the frontend/
>
> > +  requested ``shmid``. A reply is generated indicating whether mapping
> > +  succeeded.
>
> Please document whether mapping over an existing mapping is allowed. I
> think it should be allowed because it might be useful to atomically
> update a mapping without a race where the driver sees unmapped memory.
>
>
So in my understanding, the frontend (driver) in the guest would initiate the
mmap/munmap by sending a request to the backend (vhost-user device). Then the
vhost-user device sends a request to the VMM to perform the mapping. We could
enforce an ACK to ensure that the mmap operation has finished before the
vhost-user device responds to the driver, and thus avoid races. This way, we
support only the simple use case of not allowing mmaps over an already-mapped
region.
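
As a rough illustration of the ordering I have in mind (all names here are
placeholders, not an existing API):

    #include <stdint.h>

    /* Placeholder types/helpers, only to show the ordering. */
    struct map_req { int fd; uint64_t fd_offset, shm_offset, len, flags; };
    struct backend_dev;

    /* Sends VHOST_USER_BACKEND_SHMEM_MAP and blocks until the frontend's
     * reply arrives (REPLY_ACK semantics); returns the frontend's result. */
    int vu_send_shmem_map_and_wait(struct backend_dev *dev,
                                   const struct map_req *req);
    /* Completes the guest's original request on the virtqueue. */
    int complete_guest_request(struct backend_dev *dev,
                               const struct map_req *req);

    static int handle_guest_map(struct backend_dev *dev, struct map_req *req)
    {
        /* Wait until the frontend has actually performed (or failed) the
         * mmap... */
        int ret = vu_send_shmem_map_and_wait(dev, req);
        if (ret < 0) {
            return ret;            /* surface the failure to the driver */
        }
        /* ...and only then answer the driver, so it never observes a
         * half-applied mapping. */
        return complete_guest_request(dev, req);
    }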


> If mapping over an existing mapping is allowed, does the new mapping
> need to cover the old mapping exactly, or can it span multiple previous
> mappings or a subset of an existing mapping?
>
> From a security point of view we need to be careful here. A potentially
> untrusted backend process now has the ability to mmap an arbitrary file
> descriptor into the frontend process. The backend can cause
> denial of service by creating many small mappings to exhaust the OS
> limits on virtual memory areas. The backend can map memory to use as
> part of a security compromise, so we need to be sure the virtual memory
> addresses are not leaked to the backend and only read/write page
> permissions are available.
>

Right, security from untrusted backends is usually the hardest part for me.
But vhost-user devices only see the shm_offset, so this should be safe, no?


>
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > +  :id: 10
> > +  :equivalent ioctl: N/A
> > +  :request payload: ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends so that QEMU un-mmaps
>
> s/QEMU/the frontend/
>

This is probably my bad, but I really thought the frontend was the driver,
i.e., frontend/backend as alternative terms for the VIRTIO driver/device.
In that case, would we keep QEMU here, or use VMM instead?


>
> > +  a given range (``offset``, ``len``) in the shared memory region with the
> > +  requested ``shmid``.
>
> Does the range need to correspond to a previously-mapped VhostUserMMap
> or can it cross multiple VhostUserMMaps, be a subset of a VhostUserMMap,
> etc?
>

I would prefer to keep it simple and disallow mapping over a previously-mapped
range. The range needs to correspond to a valid (within size bounds), free part
of the memory region, or else the request will fail. I will add that to the text.

Nonetheless, I am open to discussing other options.


>
> > +  A reply is generated indicating whether unmapping succeeded.
> > +
> >  .. _reply_ack:
> >
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..9526b9d07f 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -192,6 +194,23 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[16];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* Shared memory region ID */
> > +    uint8_t shmid;
>
> There is a hole (padding) in the struct since the following fields are
> naturally aligned to 8 bytes. I suggest moving shmid to the end to
> reduce the chance of information leaks if padding is not zeroed.
>

I see. I can move it to the end of the struct or add a padding field in
between. I'll do what you suggested, as it sounds like the simplest solution.
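
Concretely, something like this (just a sketch of the reordering, with the
tail padding made explicit for illustration):

    #include <stdint.h>

    typedef struct {
        /* File offset */
        uint64_t fd_offset;
        /* Offset within the shared memory region */
        uint64_t shm_offset;
        /* Size of the mapping */
        uint64_t len;
        /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
        uint64_t flags;
        /* Shared memory region ID, moved after the naturally aligned
         * 64-bit fields; explicit padding keeps the size unambiguous and
         * easy to zero. */
        uint8_t shmid;
        uint8_t padding[7];
    } VhostUserMMap;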


>
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the shared memory region */
> > +    uint64_t shm_offset;
> > +    /* Size of region to map */
>
> To avoid giving "region" additional meanings:
>
> s/Size of the region to map/Size of the mapping/
>

Ok, I will change it in the next drop. I will probably keep the RFC tag for
at least one more version, seeing that there are a few things I wasn't
considering correctly.

Thanks for all the feedback!


>
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >
> > @@ -224,6 +243,7 @@ typedef union {
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> >          VhostUserTransferDeviceState transfer_state;
> > +        VhostUserMMap mmap;
> >  } VhostUserPayload;
> >
> >  typedef struct VhostUserMsg {
> > @@ -1748,6 +1768,85 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >      return 0;
> >  }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > +                                  VhostUserMMap *vu_mmap,
> > +                                  int fd)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return -EBADF;
> > +    }
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Shared memory region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to mmap mem fd");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > +                                    VhostUserMMap *vu_mmap)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Shared memory region at "
> > +                     "ID %d uninitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to unmap memory");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static void close_backend_channel(struct vhost_user *u)
> >  {
> >      g_source_destroy(u->backend_src);
> > @@ -1816,6 +1915,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
> >          ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >                                                               &hdr, &payload);
> >          break;
> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > +                                                fd ? fd[0] : -1);
> > +        break;
> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >          ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..59596370ec 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -3264,6 +3264,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
> >              virtio_vmstate_change, vdev);
> >      vdev->device_endian = virtio_default_endian();
> >      vdev->use_guest_notifier_mask = true;
> > +    vdev->shmem_list = NULL;
> > +    vdev->n_shmem_regions = 0;
> >  }
> >
> >  /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..34bec26593 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >       */
> >      EventNotifier config_notifier;
> >      bool device_iotlb_enabled;
> > +    /* Shared memory region for vhost-user mappings. */
> > +    MemoryRegion *shmem_list;
> > +    int n_shmem_regions;
> >  };
> >
> >  struct VirtioDeviceClass {
> > --
> > 2.44.0
> >
>
