"Michael S. Tsirkin" <m...@redhat.com> writes: > On Sun, Jun 23, 2024 at 06:23:30PM +0300, Dmitry Osipenko wrote: >> Hello, >> >> This series enables Vulkan Venus context support on virtio-gpu. >> >> All virglrender and almost all Linux kernel prerequisite changes >> needed by Venus are already in upstream. For kernel there is a pending >> KVM patchset that fixes mapping of compound pages needed for DRM drivers >> using TTM [1], othewrwise hostmem blob mapping will fail with a KVM error >> from Qemu. > > > I'd like to see a Tested-by from Alex Bennée for this - > he's been testing v14 last as far as I could see.
For the record I have been testing v16 but so far I'm seeing the same problems. Broadly: Guest: Debian Trixie (aarch64 and x86_64) + custom build mesa Host: Debian Bookworm (x86 with Intel graphics) or Trixie (aarch64 with AMD graphics) with a small patch to use a vendored virglrenderer: Message-Id: <20240605133527.529950-1-alex.ben...@linaro.org> Date: Wed, 5 Jun 2024 14:35:27 +0100 Subject: [RFC PATCH] subprojects: add a wrapper for libvirglrenderer From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex.ben...@linaro.org> which is currently full of debug fprintf's as I've tried to isolate stuff. Running vkcube will draw a frame (aarch64 guest) or a black square (x86_64 guest) and then hang, reporting "stuck in fence wait with iter at.." in the terminal once vkcube is killed. The testing on my Aarch64 guests has been run with a manually booted weston from the framebuffer console. The x86 case was run inside a gnome session (which was using system mesa) as I was unable to get the framebuffer console working. My x86 host is running 6.9.x but currently doesn't have the TTM patch (as I didn't think it was needed for Intel) however I will apply it and test next week when I'm back next to the physical machine. I've got a fairly scratch branch based on v16 with a bunch of additional debug printfs which showed some funkiness with the mappings: virgl_gbm_gpu_import_required: a -> import required virgl_gbm_gpu_import_required: a -> import required virgl_gbm_gpu_import_required: a -> import required virgl_gbm_gpu_import_required: 1 -> import required vrend_renderer_resource_unmap: invalid bits 0x4b virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x56265f58d0e0 fd_type:0 virtio_gpu_virgl_unmap_resource_blob: failed to unmap virgl resource: Invalid argument virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200 [ 159.298765] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x209) vrend_renderer_resource_unmap: invalid bits 0x4b virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x56265cfce470 fd_type:0 virtio_gpu_virgl_unmap_resource_blob: failed to unmap virgl resource: Invalid argument virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200 [ 159.400565] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x209) vrend_renderer_resource_unmap: invalid bits 0x4b virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x56265dc596b0 fd_type:0 virtio_gpu_virgl_unmap_resource_blob: failed to unmap virgl resource: Invalid argument virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200 [ 159.502414] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x209) vrend_renderer_resource_unmap: invalid bits 0x4b virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x56265f2256c0 fd_type:0 virtio_gpu_virgl_unmap_resource_blob: failed to unmap virgl resource: Invalid argument virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200 [ 159.603818] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x209) vrend_renderer_resource_unmap: invalid bits 0x4b virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x56265dc594d0 fd_type:0 virtio_gpu_virgl_unmap_resource_blob: failed to unmap virgl resource: Invalid argument virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200 virgl_gbm_gpu_import_required: a -> import required virgl_gbm_gpu_import_required: 2 -> import required [ 159.705386] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x209) but so far I wasn't able to track where the resources were originally mapped. I did a mini-debug session with Dimity last week but we weren't able to get anything useful. I won't be able to get back to testing this until next week. I also intend to test with vhost-user-gpu and the currently brewing vhost-device branch: https://github.com/rust-vmm/vhost-device/pull/668 to see if I can isolate these as virglrender issues or QEMU memory mapping issues. > > >> [1] https://lore.kernel.org/kvm/20240229025759.1187910-1-steve...@google.com/ >> >> You'll need to use recent Mesa version containing patch that removes >> dependency on cross-device feature from Venus that isn't supported by >> Qemu [2]. >> >> [2] >> https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b >> >> Example Qemu cmdline that enables Venus: >> >> qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \ >> -machine q35,accel=kvm,memory-backend=mem1 \ >> -object memory-backend-memfd,id=mem1,size=8G -m 8G >> >> >> Changes from V15 to V16 >> >> - Fixed resource_get_info() change made in v15 that was squshed into a >> wrong patch. Squashed set_scanout_blob patch into the blob-commands patch, >> otherwise we'll need to revert back ordering of blob patches to older v7 >> variant. >> >> - Changed hostmem mapping state to boolean, suggested by Akihiko Odaki. >> >> - Documented Venus in virtio-gpu doc. Amended "Support Venus context" patch >> with the doc addition. Suggested by Akihiko Odaki. >> >> Changes from V14 to V15 >> >> - Dropped hostmem mapping state that got unused in v14, suggested by >> Akihiko Odaki. >> >> - Moved resource_get_info() from set_scanout_blob() to create_blob(), >> suggested by Akihiko Odaki. >> >> - Fixed unitilized variable in create_blob(), spotted by Alex Bennée. >> >> Changes from V13 to V14 >> >> - Fixed erronous fall-through in renderer_state's switch-case that was >> spotted by Marc-André Lureau. >> >> - Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by >> Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED. >> >> - Made use of g_autofree in virgl_cmd_resource_create_blob() as was >> suggested by Akihiko Odaki. >> >> - Removed virtio_gpu_virgl_deinit() and moved all deinit code to >> virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau. >> >> - Replaced HAVE_FEATURE in mseon.build with virglrenderer's VERSION_MAJOR >> check as was suggested by Marc-André Lureau. >> >> - Added trace event for cmd-suspension as was suggested by Marc-André Lureau. >> >> - Added patch to replace in-flight printf's with trace events as was >> suggested by Marc-André Lureau >> >> Changes from V12 to V13 >> >> - Replaced `res->async_unmap_in_progress` flag with a mapping state, >> moved it to the virtio_gpu_virgl_hostmem_region like was suggested >> by Akihiko Odaki. >> >> - Renamed blob_unmap function and added back cmd_suspended argument >> to it. Suggested by Akihiko Odaki. >> >> - Reordered VirtIOGPUGL refactoring patches to minimize code changes >> like was suggested by Akihiko Odaki. >> >> - Replaced gl->renderer_inited with gl->renderer_state, like was suggested >> by Alex Bennée. >> >> - Added gl->renderer state resetting to gl_device_unrealize(), for >> consistency. Suggested by Alex Bennée. >> >> - Added rb's from Alex and Manos. >> >> - Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB. >> >> Changes from V11 to V12 >> >> - Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't >> corrupt resource list and releases resource properly on error. Thanks >> to Akihiko Odaki for spotting the bug. >> >> - Added new patch that handles virtio_gpu_virgl_init() failure gracefully, >> fixing Qemu crash. Besides fixing the crash, it allows to implement >> a cleaner virtio_gpu_virgl_deinit(). >> >> - virtio_gpu_virgl_deinit() now assumes that previously virgl was >> initialized successfully when it was inited at all. Suggested by >> Akihiko Odaki. >> >> - Fixed missed freeing of print_stats timer in virtio_gpu_virgl_deinit() >> >> - Added back blob unmapping or RESOURCE_UNREF that was requested >> by Akihiko Odaki. Added comment to the code explaining how >> async unmapping works. Added back `res->async_unmap_in_progress` >> flag and added comment telling why it's needed. >> >> - Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes >> suggested by Akihiko Odaki. >> >> - Added patches that move fence_poll and print_stats timers to VirtIOGPUGL >> for consistency with cmdq_resume_bh. >> >> Changes from V10 to V11 >> >> - Replaced cmd_resume bool in struct ctrl_command with >> "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested >> by Akihiko Odaki. >> >> - Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding >> the 'async_unmap_in_progress' flag that was dropped in v9: >> >> 1. virgl_cmd_resource_[un]map_blob() now doesn't check itself whether >> resource was previously mapped and lets virglrenderer to do the >> checking. >> >> 2. error returned by virgl_renderer_resource_unmap() is now handled >> and reported properly, previously the error wasn't checked. The >> virgl_renderer_resource_unmap() fails if resource wasn't mapped. >> >> 3. virgl_cmd_resource_unref_blob() now doesn't allow to unref resource >> that is mapped, it's a error condition if guest didn't unmap resource >> before doing the unref. Previously unref was implicitly unmapping >> resource. >> >> Changes from V9 to V10 >> >> - Dropped 'async_unmap_in_progress' variable and switched to use >> aio_bh_new() isntead of oneshot variant in the "blob commands" patch. >> >> - Further improved error messages by printing error code when actual error >> occurrs and using ERR_UNSPEC instead of ERR_ENOMEM when we don't really >> know if it was ENOMEM for sure. >> >> - Added vdc->unrealize for the virtio GL device and freed virgl data. >> >> - Dropped UUID and doc/migration patches. UUID feature isn't needed >> anymore, instead we changed Mesa Venus driver to not require UUID. >> >> - Renamed virtio-gpu-gl "vulkan" property name back to "venus". >> >> Changes from V8 to V9 >> >> - Added resuming of cmdq processing when hostmem MR is freed, >> as was suggested by Akihiko Odaki. >> >> - Added more error messages, suggested by Akihiko Odaki >> >> - Dropped superfluous 'res->async_unmap_completed', suggested >> by Akihiko Odaki. >> >> - Kept using cmd->suspended flag. Akihiko Odaki suggested to make >> virtio_gpu_virgl_process_cmd() return false if cmd processing is >> suspended, but it's not easy to implement due to ubiquitous >> VIRTIO_GPU_FILL_CMD() macros that returns void, requiring to change >> all the virtio-gpu processing code. >> >> - Added back virtio_gpu_virgl_resource as was requested by Akihiko Odaki, >> though I'm not convinced it's really needed. >> >> - Switched to use GArray, renamed capset2_max_ver/size vars and moved >> "vulkan" property definition to the virtio-gpu-gl device in the Venus >> patch, like was suggested by Akihiko Odaki. >> >> - Moved UUID to virtio_gpu_virgl_resource and dropped UUID save/restore >> since it will require bumping VM version and virgl device isn't miratable >> anyways. >> >> - Fixed exposing UUID feature with Rutabaga >> >> - Dropped linux-headers update patch because headers were already updated >> in Qemu/staging. >> >> - Added patch that updates virtio migration doc with a note about virtio-gpu >> migration specifics, suggested by Akihiko Odaki. >> >> - Addressed coding style issue noticed by Akihiko Odaki >> >> Changes from V7 to V8 >> >> - Supported suspension of virtio-gpu commands processing and made >> unmapping of hostmem region asynchronous by blocking/suspending >> cmd processing until region is unmapped. Suggested by Akihiko Odaki. >> >> - Fixed arm64 building of x86 targets using updated linux-headers. >> Corrected the update script. Thanks to Rob Clark for reporting >> the issue. >> >> - Added new patch that makes registration of virgl capsets dynamic. >> Requested by Antonio Caggiano and Pierre-Eric Pelloux-Prayer. >> >> - Venus capset now isn't advertised if Vulkan is disabled with vulkan=false >> >> Changes from V6 to V7 >> >> - Used scripts/update-linux-headers.sh to update Qemu headers based >> on Linux v6.8-rc3 that adds Venus capset definition to virtio-gpu >> protocol, was requested by Peter Maydel >> >> - Added r-bs that were given to v6 patches. Corrected missing s-o-bs >> >> - Dropped context_init Qemu's virtio-gpu device configuration flag, >> was suggested by Marc-André Lureau >> >> - Added missing error condition checks spotted by Marc-André Lureau >> and Akihiko Odaki, and few more >> >> - Returned back res->mr referencing to memory_region_init_ram_ptr() like >> was suggested by Akihiko Odaki. Incorporated fix suggested by Pierre-Eric >> to specify the MR name >> >> - Dropped the virgl_gpu_resource wrapper, cleaned up and simplified >> patch that adds blob-cmd support >> >> - Fixed improper blob resource removal from resource list on resource_unref >> that was spotted by Akihiko Odaki >> >> - Change order of the blob patches, was suggested by Akihiko Odaki. >> The cmd_set_scanout_blob support is enabled first >> >> - Factored out patch that adds resource management support to virtio-gpu-gl, >> was requested by Marc-André Lureau >> >> - Simplified and improved the UUID support patch, dropped the hash table >> as we don't need it for now. Moved QemuUUID to virtio_gpu_simple_resource. >> This all was suggested by Akihiko Odaki and Marc-André Lureau >> >> - Dropped console_has_gl() check, suggested by Akihiko Odaki >> >> - Reworked Meson cheking of libvirglrender features, made new features >> available based on virglrender pkgconfig version instead of checking >> symbols in header. This should fix build error using older virglrender >> version, reported by Alex Bennée >> >> - Made enabling of Venus context configrable via new virtio-gpu device >> "vulkan=true" flag, suggested by Marc-André Lureau. The flag is disabled >> by default because it requires blob and hostmem options to be enabled >> and configured >> >> Changes from V5 to V6 >> >> - Move macros configurations under virgl.found() and rename >> HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS. >> >> - Handle the case while context_init is disabled. >> >> - Enable context_init by default. >> >> - Move virtio_gpu_virgl_resource_unmap() into >> virgl_cmd_resource_unmap_blob(). >> >> - Introduce new struct virgl_gpu_resource to store virgl specific members. >> >> - Remove erro handling of g_new0, because glib will abort() on OOM. >> >> - Set resource uuid as option. >> >> - Implement optional subsection of vmstate_virtio_gpu_resource_uuid_state >> for virtio live migration. >> >> - Use g_int_hash/g_int_equal instead of the default >> >> - Add scanout_blob function for virtio-gpu-virgl >> >> - Resolve the memory leak on virtio-gpu-virgl >> >> - Remove the unstable API flags check because virglrenderer is already 1.0 >> >> - Squash the render server flag support into "Initialize Venus" >> >> Changes from V4 (virtio gpu V4) to V5 >> >> - Inverted patch 5 and 6 because we should configure >> HAVE_VIRGL_CONTEXT_INIT firstly. >> >> - Validate owner of memory region to avoid slowing down DMA. >> >> - Use memory_region_init_ram_ptr() instead of >> memory_region_init_ram_device_ptr(). >> >> - Adjust sequence to allocate gpu resource before virglrender resource >> creation >> >> - Add virtio migration handling for uuid. >> >> - Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS. >> https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.hu...@amd.com/ >> >> - Add meson check to make sure unstable APIs defined from 0.9.0. >> >> Changes from V1 to V2 (virtio gpu V4) >> >> - Remove unused #include "hw/virtio/virtio-iommu.h" >> >> - Add a local function, called virgl_resource_destroy(), that is used >> to release a vgpu resource on error paths and in resource_unref. >> >> - Remove virtio_gpu_virgl_resource_unmap from >> virtio_gpu_cleanup_mapping(), >> since this function won't be called on blob resources and also because >> blob resources are unmapped via virgl_cmd_resource_unmap_blob(). >> >> - In virgl_cmd_resource_create_blob(), do proper cleanup in error paths >> and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the resource >> has been fully initialized. >> >> - Memory region has a different life-cycle from virtio gpu resources >> i.e. cannot be released synchronously along with the vgpu resource. >> So, here the field "region" was changed to a pointer and is allocated >> dynamically when the blob is mapped. >> Also, since the pointer can be used to indicate whether the blob >> is mapped, the explicite field "mapped" was removed. >> >> - In virgl_cmd_resource_map_blob(), add check on the value of >> res->region, to prevent beeing called twice on the same resource. >> >> - Add a patch to enable automatic deallocation of memory regions to resolve >> use-after-free memory corruption with a reference. >> >> >> Antonio Caggiano (1): >> virtio-gpu: Support Venus context >> >> Dmitry Osipenko (8): >> virtio-gpu: Use trace events for tracking number of in-flight fences >> virtio-gpu: Move fence_poll timer to VirtIOGPUGL >> virtio-gpu: Move print_stats timer to VirtIOGPUGL >> virtio-gpu: Handle virtio_gpu_virgl_init() failure >> virtio-gpu: Unrealize GL device >> virtio-gpu: Use pkgconfig version to decide which virgl features are >> available >> virtio-gpu: Don't require udmabuf when blobs and virgl are enabled >> virtio-gpu: Support suspension of commands processing >> >> Huang Rui (2): >> virtio-gpu: Support context-init feature with virglrenderer >> virtio-gpu: Add virgl resource management >> >> Pierre-Eric Pelloux-Prayer (1): >> virtio-gpu: Register capsets dynamically >> >> Robert Beckett (1): >> virtio-gpu: Handle resource blob commands >> >> docs/system/devices/virtio-gpu.rst | 11 + >> hw/display/trace-events | 3 + >> hw/display/virtio-gpu-gl.c | 62 ++- >> hw/display/virtio-gpu-virgl.c | 585 +++++++++++++++++++++++++++-- >> hw/display/virtio-gpu.c | 44 ++- >> include/hw/virtio/virtio-gpu.h | 32 +- >> meson.build | 5 +- >> 7 files changed, 685 insertions(+), 57 deletions(-) >> >> -- >> 2.45.2 -- Alex Bennée Virtualisation Tech Lead @ Linaro