On Thu, Mar 28, 2024 at 2:01 AM Vivek Kasireddy <vivek.kasire...@intel.com> wrote:
> Having virtio-gpu import scanout buffers (via prime) from other
> devices means that we'd be adding a head to headless GPUs assigned
> to a Guest VM or additional heads to regular GPU devices that are
> passthrough'd to the Guest. In these cases, the Guest compositor
> can render into the scanout buffer using a primary GPU and has the
> secondary GPU (virtio-gpu) import it for display purposes.
>
> The main advantage with this is that the imported scanout buffer can
> either be displayed locally on the Host (e.g., using Qemu + GTK UI)
> or encoded and streamed to a remote client (e.g., Qemu + Spice UI).
> Note that since Qemu uses udmabuf driver, there would be no copies
> made of the scanout buffer as it is displayed. This should be
> possible even when it might reside in device memory such as VRAM.
>
> The specific use-case that can be supported with this series is when
> running Weston or other guest compositors with "additional-devices"
> feature (./weston --drm-device=card1 --additional-devices=card0).
> More info about this feature can be found at:
> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736
>
> In the above scenario, card1 could be a dGPU or an iGPU and card0
> would be virtio-gpu in KMS only mode. However, the case where this
> patch series could be particularly useful is when card1 is a GPU VF
> that needs to share its scanout buffer (in a zero-copy way) with the
> GPU PF on the Host. Or, it can also be useful when the scanout buffer
> needs to be shared between any two GPU devices (assuming one of them
> is assigned to a Guest VM) as long as they are P2P DMA compatible.
>

Is passthrough iGPU-only or passthrough dGPU-only something you intend
to use?

If it's a dGPU + iGPU setup, then the way other people seem to do it
is a "virtualized" iGPU (via virgl/gfxstream/take your pick) plus a
passed-through dGPU.

For example, AMD seems to use virgl to allocate and import into the dGPU.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896
https://lore.kernel.org/all/20231221100016.4022353-1-julia.zh...@amd.com/

ChromeOS also uses that method (see crrev.com/c/3764931)
[cc: dGPU architect +Dominik Behr <db...@google.com>]

So if iGPU + dGPU is the primary use case, you should be able to use
these methods as well. The model would be "virtualized iGPU" +
passthrough dGPU, not split SoCs.

> As part of the import, the virtio-gpu driver shares the dma
> addresses and lengths with Qemu which then determines whether the
> memory region they belong to is owned by a PCI device or whether it
> is part of the Guest's system ram. If it is the former, it identifies
> the devid (or bdf) and bar and provides this info (along with offsets
> and sizes) to the udmabuf driver. In the latter case, instead of the
> devid and bar it provides the memfd. The udmabuf driver then
> creates a dmabuf using this info that Qemu shares with Spice for
> encode via Gstreamer.
>
> Note that the virtio-gpu driver registers a move_notify() callback
> to track location changes associated with the scanout buffer and
> sends attach/detach backing cmds to Qemu when appropriate. And,
> synchronization (that is, ensuring that Guest and Host are not
> using the scanout buffer at the same time) is ensured by pinning/
> unpinning the dmabuf as part of plane update and using a fence
> in resource_flush cmd.

I'm not sure how QEMU's display paths work, but with crosvm, if you
share the guest-created dmabuf with the display and the guest moves
the backing pages, the only recourse is to destroy the surface and
show a black screen to the user: not the best experience.

Only amdgpu calls dma_buf_move_notify(..), and you're probably testing
on Intel only, so you may not be hitting that code path anyways. I
forgot the exact reason, but apparently udmabuf may not work with
amdgpu displays, and it seems the virtualized iGPU + dGPU is the way
to go for amdgpu anyways.
So I recommend just pinning the buffer for the lifetime of the
import, for simplicity and correctness.

> This series is available at:
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc
>
> along with additional patches for Qemu and Spice here:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev
> https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4
>
> Patchset overview:
>
> Patch 1:   Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
> Patch 2-3: Helpers to initialize, import, free imported object
> Patch 4-5: Import and use buffers from other devices for scanout
> Patch 6-7: Have udmabuf driver create dmabuf from PCI bars for P2P DMA
>
> This series is tested using the following method:
> - Run Qemu with the following relevant options:
>   qemu-system-x86_64 -m 4096m ....
>   -device vfio-pci,host=0000:03:00.0
>   -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
>   -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264
>   -object memory-backend-memfd,id=mem1,size=4096M
>   -machine memory-backend=mem1 ...
> - Run upstream Weston with the following options in the Guest VM:
>   ./weston --drm-device=card1 --additional-devices=card0
>
> where card1 is a DG2 dGPU (passthrough'd and using xe driver in Guest VM),
> card0 is virtio-gpu and the Host is using a RPL iGPU.
>
> Cc: Gerd Hoffmann <kra...@redhat.com>
> Cc: Dongwon Kim <dongwon....@intel.com>
> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> Cc: Christian Koenig <christian.koe...@amd.com>
> Cc: Dmitry Osipenko <dmitry.osipe...@collabora.com>
> Cc: Rob Clark <robdcl...@chromium.org>
> Cc: Thomas Hellström <thomas.hellst...@linux.intel.com>
> Cc: Oded Gabbay <ogab...@kernel.org>
> Cc: Michal Wajdeczko <michal.wajdec...@intel.com>
> Cc: Michael Tretter <m.tret...@pengutronix.de>
>
> Vivek Kasireddy (7):
>   drm/virtio: Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
>   drm/virtio: Add a helper to map and note the dma addrs and lengths
>   drm/virtio: Add helpers to initialize and free the imported object
>   drm/virtio: Import prime buffers from other devices as guest blobs
>   drm/virtio: Ensure that bo's backing store is valid while updating
>     plane
>   udmabuf/uapi: Add new ioctl to create a dmabuf from PCI bar regions
>   udmabuf: Implement UDMABUF_CREATE_LIST_FOR_PCIDEV ioctl
>
>  drivers/dma-buf/udmabuf.c              | 122 ++++++++++++++++--
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |   8 ++
>  drivers/gpu/drm/virtio/virtgpu_plane.c |  56 ++++++++-
>  drivers/gpu/drm/virtio/virtgpu_prime.c | 167 ++++++++++++++++++++++++-
>  drivers/gpu/drm/virtio/virtgpu_vq.c    |  15 +++
>  include/uapi/linux/udmabuf.h           |  11 +-
>  6 files changed, 368 insertions(+), 11 deletions(-)
>
> --
> 2.43.0
>
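
For readers following along, the host-side classification described in
the cover letter (is a guest-provided address range inside a PCI BAR,
or inside guest system RAM?) can be modeled roughly as below. This is
only an illustrative Python sketch, not QEMU or udmabuf code: the
PciBar, RamBlock, BarEntry, MemfdEntry and classify names are all
hypothetical, and it assumes the addresses are guest-physical (no
vIOMMU) and that each RAM block starts at offset 0 of its memfd.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class PciBar:
    """A guest-visible BAR of a passed-through device (hypothetical model)."""
    bdf: str    # e.g. "0000:03:00.0"
    bar: int    # BAR index
    base: int   # guest-physical base address of the BAR
    size: int


@dataclass(frozen=True)
class RamBlock:
    """A guest RAM block backed by a memfd (hypothetical model)."""
    memfd: int  # fd of the memory-backend-memfd object
    base: int   # guest-physical base address of the block
    size: int


@dataclass(frozen=True)
class BarEntry:
    """What would be handed to udmabuf for device memory."""
    bdf: str
    bar: int
    offset: int
    size: int


@dataclass(frozen=True)
class MemfdEntry:
    """What would be handed to udmabuf for system RAM."""
    memfd: int
    offset: int
    size: int


def _contains(base: int, size: int, addr: int, length: int) -> bool:
    # True only if [addr, addr + length) lies fully inside the region.
    return base <= addr and addr + length <= base + size


def classify(addr: int, length: int,
             bars: Sequence[PciBar],
             ram: Sequence[RamBlock]) -> Union[BarEntry, MemfdEntry]:
    """Map one (addr, length) pair to a BAR-relative or memfd-relative entry."""
    for b in bars:
        if _contains(b.base, b.size, addr, length):
            return BarEntry(b.bdf, b.bar, addr - b.base, length)
    for r in ram:
        if _contains(r.base, r.size, addr, length):
            # Assumption: the block maps the memfd starting at offset 0.
            return MemfdEntry(r.memfd, addr - r.base, length)
    raise ValueError(f"range {addr:#x}+{length:#x} is not in a known region")
```

A range that straddles two regions is rejected here on purpose; whether
real code should split such a range or fail the import is a design
choice this sketch does not try to settle.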