Hi Akihiko,
> Subject: Re: [PATCH v4 8/8] virtio-gpu-dmabuf: Create dmabuf for blobs
> associated with VFIO devices
> >>> In addition to memfd, a blob resource can also have its backing
> >>> storage in a VFIO device region. Since, there is no effective way
> >>> to determine where the backing storage is located, we first try to
> >>> create a dmabuf assuming it is in memfd. If that fails, we try to
> >>> create a dmabuf assuming it is in VFIO device region.
> >>>
> >>> So, we first call virtio_gpu_create_udmabuf() to check if the blob's
> >>> backing storage is located in a memfd or not. If it is not, we call
> >>> vfio_create_dmabuf() which identifies the right VFIO device and
> >>> eventually invokes the vfio_device_create_dmabuf_fd() API to have
> >>> a dmabuf fd created.
> >>>
> >>> Note that in virtio_gpu_remap_dmabuf(), we first try to test if
> >>> the VFIO dmabuf exporter supports mmap or not. If it doesn't, we
> >>> use the VFIO device fd directly to create the CPU mapping.
> >>>
> >>> While at it, remove the unnecessary rcu_read_lock/rcu_read_unlock
> >>> from virtio_gpu_create_udmabuf() and also replace warn_report()
> >>> with qemu_log_mask().
> >>>
> >>> Cc: Marc-André Lureau <[email protected]>
> >>> Cc: Alex Bennée <[email protected]>
> >>> Cc: Akihiko Odaki <[email protected]>
> >>> Cc: Dmitry Osipenko <[email protected]>
> >>> Cc: Alex Williamson <[email protected]>
> >>> Cc: Cédric Le Goater <[email protected]>
> >>> Signed-off-by: Vivek Kasireddy <[email protected]>
> >>> ---
> >>> hw/display/Kconfig | 5 ++
> >>> hw/display/virtio-gpu-dmabuf.c | 117
> >> ++++++++++++++++++++++++++++++---
> >>> 2 files changed, 113 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/hw/display/Kconfig b/hw/display/Kconfig
> >>> index 1e95ab28ef..0d090f25f5 100644
> >>> --- a/hw/display/Kconfig
> >>> +++ b/hw/display/Kconfig
> >>> @@ -106,6 +106,11 @@ config VIRTIO_VGA
> >>> depends on VIRTIO_PCI
> >>> select VGA
> >>>
> >>> +config VIRTIO_GPU_VFIO_BLOB
> >>> + bool
> >>> + default y
> >>> + depends on VFIO
> >>> +
> >>> config VHOST_USER_GPU
> >>> bool
> >>> default y
> >>> diff --git a/hw/display/virtio-gpu-dmabuf.c b/hw/display/virtio-gpu-
> >> dmabuf.c
> >>> index c34d4c85bc..2dfe70d7eb 100644
> >>> --- a/hw/display/virtio-gpu-dmabuf.c
> >>> +++ b/hw/display/virtio-gpu-dmabuf.c
> >>> @@ -18,6 +18,7 @@
> >>> #include "ui/console.h"
> >>> #include "hw/virtio/virtio-gpu.h"
> >>> #include "hw/virtio/virtio-gpu-pixman.h"
> >>> +#include "hw/vfio/vfio-device.h"
> >>> #include "trace.h"
> >>> #include "system/ramblock.h"
> >>> #include "system/hostmem.h"
> >>> @@ -27,6 +28,38 @@
> >>> #include "standard-headers/linux/udmabuf.h"
> >>> #include "standard-headers/drm/drm_fourcc.h"
> >>>
> >>> +static void vfio_create_dmabuf(struct virtio_gpu_simple_resource
> >> *res)
> >>> +{
> >>> +#if defined(VIRTIO_GPU_VFIO_BLOB)
> >>> + VFIODevice *vbasedev;
> >>> + RAMBlock *first_rb;
> >>> + ram_addr_t offset;
> >>> +
> >>> + first_rb = qemu_ram_block_from_host(res->iov[0].iov_base, false,
> >> &offset);
> >>> + if (!first_rb) {
> >>> + qemu_log_mask(LOG_GUEST_ERROR,
> >>> + "%s: Could not find ramblock\n", __func__);
> >>> + return;
> >>> + }
> >>> +
> >>> + vbasedev = vfio_device_lookup(first_rb->mr);
> >>> + if (!vbasedev) {
> >>> + qemu_log_mask(LOG_GUEST_ERROR,
> >>> + "%s: No VFIO device found to create dmabuf from\n",
> >>> + __func__);
> >>> + return;
> >>> + }
> >>> +
> >>> + res->dmabuf_fd = vfio_device_create_dmabuf_fd(vbasedev,
> >>> + res->iov,
> >>> res->iov_cnt);
> >>> + if (res->dmabuf_fd < 0) {
> >>> + qemu_log_mask(LOG_GUEST_ERROR,
> >>> + "%s: VFIO_DEVICE_FEATURE_DMA_BUF: %s\n",
> >>> + __func__, strerror(errno));
> >>> + }
> >>
> >> This API design of vfio_device_create_dmabuf_fd() has some room of
> >> improvements.
> >>
> >> First, this function requires to pass VFIODevice * which is derived from
> >> res->iov[0]. Since it is getting res->iov[0] anyway, such a parameter is
> >> redundant. vfio_device_create_dmabuf_fd() can call
> >> vfio_device_lookup()
> >> itself instead.
> > Yeah, I agree. I'll move the call to lookup into
> vfio_device_create_dmabuf_fd().
> >
> >>
> >> The second problem is the error handling. The error may not be caused
> by
> >> VFIO_DEVICE_FEATURE_DMA_BUF but may be caused by the guest
> >> passing
> >> regions that do not belong to one VFIO device. The two cases should
> be
> >> distinguished for a correct message.
> > Ok, I'll add separate error messages.
> >
> >>
> >>> +#endif
> >>> +}
> >>> +
> >>> static void virtio_gpu_create_udmabuf(struct
> >> virtio_gpu_simple_resource *res)
> >>> {
> >>> struct udmabuf_create_list *list;
> >>> @@ -43,10 +76,7 @@ static void virtio_gpu_create_udmabuf(struct
> >> virtio_gpu_simple_resource *res)
> >>> sizeof(struct udmabuf_create_item) *
> >>> res->iov_cnt);
> >>>
> >>> for (i = 0; i < res->iov_cnt; i++) {
> >>> - rcu_read_lock();
> >>> rb = qemu_ram_block_from_host(res->iov[i].iov_base, false,
> >> &offset);
> >>> - rcu_read_unlock();
> >>> -
> >>> if (!rb || rb->fd < 0) {
> >>> g_free(list);
> >>> return;
> >>> @@ -62,17 +92,84 @@ static void virtio_gpu_create_udmabuf(struct
> >> virtio_gpu_simple_resource *res)
> >>>
> >>> res->dmabuf_fd = ioctl(udmabuf, UDMABUF_CREATE_LIST, list);
> >>> if (res->dmabuf_fd < 0) {
> >>> - warn_report("%s: UDMABUF_CREATE_LIST: %s", __func__,
> >>> - strerror(errno));
> >>> + qemu_log_mask(LOG_GUEST_ERROR,
> >>> + "%s: UDMABUF_CREATE_LIST: %s\n",
> >>> + __func__, strerror(errno));
> >>> }
> >>> g_free(list);
> >>> }
> >>
> >> These changes of virtio_gpu_create_udmabuf() are independent so
> >> should
> >> go into separate patches with explanation.
> > I called these out in the commit message and included them as they
> were
> > trivial but I guess I could move them into separate patches.
>
> As docs/devel/submitting-a-patch.rst says,
> > A commit message that mentions "Also, ..." is often a good candidate
> > for splitting into multiple patches.
>
> I think there will be another round of review so I suggest taking the
> chance to do so.
Ok, I'll split them into separate patches.
>
> >
> >>
> >>>
> >>> +static void *vfio_dmabuf_mmap(struct virtio_gpu_simple_resource
> >> *res)
> >>> +{
> >>> + struct vfio_region_info *info = NULL;
> >>> + VFIODevice *vbasedev = NULL;
> >>> + ram_addr_t offset, len = 0;
> >>> + RAMBlock *first_rb, *rb;
> >>> + void *map, *submap;
> >>> + int i, ret = -1;
> >>> +
> >>> + first_rb = qemu_ram_block_from_host(res->iov[0].iov_base, false,
> >> &offset);
> >>> + if (!first_rb) {
> >>> + return MAP_FAILED;
> >>> + }
> >>> +#if defined(VIRTIO_GPU_VFIO_BLOB)
> >>> + vbasedev = vfio_device_lookup(first_rb->mr);
> >>> +#endif
> >>> + if (!vbasedev) {
> >>> + return MAP_FAILED;
> >>> + }
> >>> +
> >>> + /*
> >>> + * We first reserve a contiguous chunk of address space for the
> entire
> >>> + * dmabuf, then replace it with smaller mappings that correspond
> to
> >> the
> >>> + * individual segments of the dmabuf.
> >>> + */
> >>> + map = mmap(NULL, res->blob_size, PROT_READ, MAP_SHARED,
> >> vbasedev->fd, 0);
> >>
> >> It should have PROT_NONE instead of PROT_READ to prevent
> accidental
> >> accesses to the placeholder mapping.
> >> Likewise, -1 should be passed instead of vbasedev->fd so that the
> >> placeholder mapping won't result in a mapping to real device.
> > I am OK with both suggestions. However, I don't see any problem if the
> > placeholder mapping is associated with the real device. In other words,
> > what specific problem do you see if we keep vbasedev->fd instead of -1?
>
> These two suggestions are to avoid accidents. If the code is right,
> there will be no accesses to the head of the device memory. But if there
> is something wrong with it, these two changes will be safe guards. In
> other words, with these two changes, you will not have such an
> accidental access unless all of the following three happen:
> 1) Accidentally accessing the mapped page.
> 2) The protection is set something else PROT_NONE.
> 3) The mapping points to the device.
>
> Each of these conditions make accidents rarer to occur.
Sounds good. I'll add these two suggestions.
>
> >
> >>
> >>> + if (map == MAP_FAILED) {
> >>> + return map;
> >>> + }
> >>> +
> >>> + for (i = 0; i < res->iov_cnt; i++) {
> >>> + rb = qemu_ram_block_from_host(res->iov[i].iov_base, false,
> >> &offset);
> >>> + if (rb != first_rb) {
> >>> + goto err;
> >>> + }
> >>> +#if defined(VIRTIO_GPU_VFIO_BLOB)
> >>> + ret = vfio_get_region_index_from_mr(rb->mr);
> >>> + if (ret < 0) {
> >>> + goto err;
> >>> + }
> >>> +
> >>> + ret = vfio_device_get_region_info(vbasedev, ret, &info);
> >>
> >> This function can take rb->mr and call vfio_device_lookup() and
> >> vfio_get_region_index_from_mr() internally to simplify the API as
> >> similar can be done for vfio_device_create_dmabuf_fd().
> > Right, but vfio_device_get_region_info() is an already existing API
> > that is called from various places that I am trying to leverage. I could
> > create another API to wrap the calls to
> vfio_get_region_index_from_mr()
> > and vfio_device_get_region_info() but not sure if that makes sense
> given
> > that it would only have one caller.
>
> vfio_device_get_region_info() should be regarded as an internal function
> of VFIO.
>
> Thinking of that, the current API design that relies on
> vfio_device_lookup() to expose VFIODevice * is not optimal. We never
> want virtio-gpu or anyone else VFIO to mess up with it using functions
> declared in include/hw/vfio/vfio-device.h. Likewise, exposing
> vbasedev->fd introduces bunch of hazards.
>
> I think this entire function is better moved to VFIO for better
> encapsulation. I also think it should be placed close to
> vfio_device_create_dmabuf_fd(). These functions are quite similar so are
> better reviewed togather. Any future event to change either of the
> function is also likely to require a change to the other, so placing
> them nearby will be convenient.
Ok, I'll try to rename and move this function to VFIO.
Thanks,
Vivek
>
> Regards,
> Akihiko Odaki