Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:

> On 5/22/24 12:00, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>
>>> On 5/21/24 17:57, Alex Bennée wrote:
>>>> Alex Bennée <alex.ben...@linaro.org> writes:
>>>>
>>>>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This series enables Vulkan Venus context support on virtio-gpu.
>>>>>>
>>>>>> All virglrenderer and almost all Linux kernel prerequisite changes
>>>>>> needed by Venus are already in upstream. For kernel there is a pending
>>>>>> KVM patchset that fixes mapping of compound pages needed for DRM drivers
>>>>>> using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
>>>>>> from Qemu.
>>>>>>
>>>>>> [1] https://lore.kernel.org/kvm/20240229025759.1187910-1-steve...@google.com/
>>>>>>
>>>>>> You'll need to use recent Mesa version containing patch that removes
>>>>>> dependency on cross-device feature from Venus that isn't supported by
>>>>>> Qemu [2].
>>>>>>
>>>>>> [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b
>>>>>>
>>>>>> Example Qemu cmdline that enables Venus:
>>>>>>
>>>>>>   qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
>>>>>>       -machine q35,accel=kvm,memory-backend=mem1 \
>>>>>>       -object memory-backend-memfd,id=mem1,size=8G -m 8G
>>>>>
>>>>> What is the correct device for non-x86 guests? We have virtio-gpu-gl-pci
>>>>> but when doing that I get:
>>>>>
>>>>>   -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true
>>>>>   qemu-system-aarch64: -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true: opengl is not available
>>>>>
>>>>> According to 37f86af087 (virtio-gpu: move virgl realize + properties):
>>>>>
>>>>>   Drop the virgl property, the virtio-gpu-gl-device has virgl enabled no
>>>>>   matter what. Just use virtio-gpu-device instead if you don't want
>>>>>   enable virgl and opengl. This simplifies the logic and reduces the test
>>>>>   matrix.
>>>>>
>>>>> but that's not a good solution because that needs virtio-mmio and there
>>>>> are reasons to have a PCI device (for one thing no ambiguity about
>>>>> discovery).
>>>>
>>>> Oops my mistake forgetting:
>>>>
>>>>   --display gtk,gl=on
>>>>
>>>> Although I do see a lot of eglMakeContext failures.
>>>
>>> Please post the full Qemu cmdline you're using
>>
>> With:
>>
>>   ./qemu-system-aarch64 \
>>     -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars \
>>     -cpu neoverse-n1 \
>>     -smp 4 \
>>     -accel tcg \
>>     -device virtio-net-pci,netdev=unet \
>>     -device virtio-scsi-pci \
>>     -device scsi-hd,drive=hd \
>>     -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>     -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \
>>     -serial mon:stdio \
>>     -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \
>>     -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \
>>     -m 8192 \
>>     -object memory-backend-memfd,id=mem,size=8G,share=on \
>>     -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true \
>>     -display gtk,gl=on,show-cursor=on -vga none \
>>     -device qemu-xhci -device usb-kbd -device usb-tablet
>>
>> I get a boot up with a lot of:
>>
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>
>> In the guest I run:
>>
>>   meson devenv -C /root/lsrc/graphics/mesa.git/build fish
>>
>> to bring in the latest Mesa (with virtio enabled). Running vulkaninfo
>> reports two cards:
>>
>>   ==========
>>   VULKANINFO
>>   ==========
>>
>>   Vulkan Instance Version: 1.3.280
>>
>>   Instance Extensions: count = 14
>>   -------------------------------
>>   VK_EXT_debug_report                    : extension revision 10
>>   VK_EXT_debug_utils                     : extension revision 2
>>   VK_EXT_headless_surface                : extension revision 1
>>   VK_KHR_device_group_creation           : extension revision 1
>>   VK_KHR_external_fence_capabilities     : extension revision 1
>>   VK_KHR_external_memory_capabilities    : extension revision 1
>>   VK_KHR_external_semaphore_capabilities : extension revision 1
>>   VK_KHR_get_physical_device_properties2 : extension revision 2
>>   VK_KHR_get_surface_capabilities2       : extension revision 1
>>   VK_KHR_portability_enumeration         : extension revision 1
>>   VK_KHR_surface                         : extension revision 25
>>   VK_KHR_surface_protected_capabilities  : extension revision 1
>>   VK_KHR_wayland_surface                 : extension revision 6
>>   VK_LUNARG_direct_driver_loading        : extension revision 1
>>
>>   Instance Layers: count = 2
>>   --------------------------
>>   VK_LAYER_MESA_device_select Linux device selection layer 1.3.211  version 1
>>   VK_LAYER_MESA_overlay       Mesa Overlay layer           1.3.211  version 1
>>
>>   Devices:
>>   ========
>>   GPU0:
>>           apiVersion         = 1.3.230
>>           driverVersion      = 24.1.99
>>           vendorID           = 0x8086
>>           deviceID           = 0xa780
>>           deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
>>           deviceName         = Virtio-GPU Venus (Intel(R) Graphics (RPL-S))
>>           driverID           = DRIVER_ID_MESA_VENUS
>>           driverName         = venus
>>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>>           conformanceVersion = 1.3.0.0
>>           deviceUUID         = 29d2e940-a1a0-3054-0f9a-9f7dec52a084
>>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
>>   GPU1:
>>           apiVersion         = 1.2.0
>>           driverVersion      = 24.1.99
>>           vendorID           = 0x10005
>>           deviceID           = 0x0000
>>           deviceType         = PHYSICAL_DEVICE_TYPE_CPU
>>           deviceName         = Virtio-GPU Venus (llvmpipe (LLVM 15.0.6, 256 bits))
>>           driverID           = DRIVER_ID_MESA_VENUS
>>           driverName         = venus
>>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>>           conformanceVersion = 1.3.0.0
>>           deviceUUID         = 5fb5c03f-c537-f0fe-a7e6-9cd5866acb8d
>>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
>>
>> Running weston and then vkcube-wayland reports it's selecting "GPU 0:
>> Virtio-GPU Venus (Intel(R) Graphics (RPL-S))" but otherwise produces no
>> output.
>>
>> If I run with "-display sdl,gl=on,show-cursor=on" and the same other
>> command line options the results for vulkaninfo are the same. However
>> vkcube-wayland gets a little further and draws the initial cube on the
>> screen before locking up with:
>>
>>   MESA-VIRTIO: debug: stuck in fence wait with iter at xxxx
>>
>> where xxxx grows each time it prints. On shutting down I see some virgl
>> errors interspersed with the systemd logs:
>>
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [ OK ] Stopped systemd-logind.service - User Login Management.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [ 475.257111] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [ OK ] Stopped target network-online.target - Network is Online.
>>   [ OK ] Stopped target remote-fs.target - Remote File Systems.
>>   [ OK ] Stopped NetworkManager-wait-online…vice - Network Manager Wait Online.
>>   Stopping avahi-daemon.service - Avahi mDNS/DNS-SD Stack...
>>   Stopping cups.service - CUPS Scheduler...
>>   Stopping user-runtime-dir@0.servic…er Runtime Directory /run/user/0...
>>   [ OK ] Stopped avahi-daemon.service - Avahi mDNS/DNS-SD Stack.
>>   [ OK ] Stopped cups.service - CUPS Scheduler.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [ 475.357543] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [ OK ] Stopped target network.target - Network.
>>   [ OK ] Stopped target nss-user-lookup.target - User and Group Name Lookups.
>>   Stopping NetworkManager.service - Network Manager...
>>   Stopping networking.service - Raise network interfaces...
>>   Stopping wpa_supplicant.service - WPA supplicant...
>>   [ OK ] Stopped wpa_supplicant.service - WPA supplicant.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [ 493.585261] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>
>
> I've reproduced this with qemu-system-aarch64. Vkcube works for a second
> and then stops, Qemu completely gets frozen after closing and re-running
> vkcube. Doesn't feel like this is a problem with venus, but with arm64.
> For now don't know where is the bug, will take a closer look.

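For reading the "ctrl 0x209, error 0x1200" pairs in the shutdown log, and the "type 0x207" / "type 0x204" values in the fence traces, a small lookup table helps. The following is a minimal sketch: the numeric values are given as I recall them from enum virtio_gpu_ctrl_type in the Linux UAPI header virtio_gpu.h and should be checked against that header; the table is deliberately partial and the decode() helper is hypothetical, not part of QEMU or the kernel.

```python
# Partial copy of enum virtio_gpu_ctrl_type. ASSUMPTION: values are
# recalled from the Linux UAPI header virtio_gpu.h; verify before relying
# on them.
VIRTIO_GPU_CTRL_TYPES = {
    0x0204: "VIRTIO_GPU_CMD_RESOURCE_CREATE_3D",
    0x0207: "VIRTIO_GPU_CMD_SUBMIT_3D",
    0x0208: "VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB",
    0x0209: "VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB",
    0x1100: "VIRTIO_GPU_RESP_OK_NODATA",
    0x1200: "VIRTIO_GPU_RESP_ERR_UNSPEC",
}


def decode(value):
    """Return the symbolic name for a control type, or a marked unknown."""
    return VIRTIO_GPU_CTRL_TYPES.get(value, f"unknown (0x{value:04x})")


if __name__ == "__main__":
    # The command/response pair seen in the shutdown log, then the two
    # command types seen in the fence traces.
    for value in (0x209, 0x1200, 0x207, 0x204):
        print(f"0x{value:04x} -> {decode(value)}")
```

Read this way, the repeated shutdown error is an unmap-blob command failing with an unspecified error, which lines up with the virgl_cmd_resource_unmap_blob frame in the backtrace.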
I'm guessing some sort of resource leak. If I run vkcube-wayland in the
guest it complains about being stuck on a fence with the iterator going
up. However on the host I see:

  virtio_gpu_fence_ctrl fence 0x13f1, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f2, type 0x207
  virtio_gpu_fence_resp fence 0x13f1
  virtio_gpu_fence_resp fence 0x13f2
  virtio_gpu_fence_ctrl fence 0x13f3, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f4, type 0x207
  virtio_gpu_fence_resp fence 0x13f3
  virtio_gpu_fence_resp fence 0x13f4
  virtio_gpu_fence_ctrl fence 0x13f5, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f6, type 0x207
  virtio_gpu_fence_resp fence 0x13f5
  virtio_gpu_fence_resp fence 0x13f6
  virtio_gpu_fence_ctrl fence 0x13f7, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f8, type 0x207
  virtio_gpu_fence_resp fence 0x13f7
  virtio_gpu_fence_resp fence 0x13f8
  virtio_gpu_fence_ctrl fence 0x13f9, type 0x204
  virtio_gpu_fence_resp fence 0x13f9

which looks like it's going OK. However when I hit Ctrl-C in the guest it
kills QEMU:

  virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
  virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
  virtio_gpu_fence_ctrl fence 0x1400, type 0x207
  virtio_gpu_fence_resp fence 0x13fc
  virtio_gpu_fence_resp fence 0x13fd
  virtio_gpu_fence_resp fence 0x13fe
  virtio_gpu_fence_resp fence 0x13ff
  virtio_gpu_fence_resp fence 0x1400
  qemu-system-aarch64: ../../subprojects/virglrenderer/src/virglrenderer.c:1282: virgl_renderer_resource_unmap: Assertion `!ret' failed.
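To check mechanically that every fence in the host-side trace above gets a response, rather than eyeballing it, something like the following could be run over the captured trace output. It is a minimal sketch that assumes the two line formats shown above (virtio_gpu_fence_ctrl and virtio_gpu_fence_resp); the outstanding_fences() helper is a hypothetical name, not a QEMU tool.

```python
import re

# Matches both trace line shapes seen above; the ", type 0x..." part is
# only present on virtio_gpu_fence_ctrl lines.
FENCE_RE = re.compile(
    r"virtio_gpu_fence_(ctrl|resp) fence (0x[0-9a-fA-F]+)"
    r"(?:, type (0x[0-9a-fA-F]+))?"
)


def outstanding_fences(log_text):
    """Return {fence_id: ctrl_type} for fences submitted but never answered."""
    pending = {}
    for kind, fence, ctype in FENCE_RE.findall(log_text):
        fence_id = int(fence, 16)
        if kind == "ctrl":
            pending[fence_id] = int(ctype, 16) if ctype else None
        else:
            # A response retires the fence; ignore responses we never saw
            # a matching ctrl for (e.g. the trace started mid-stream).
            pending.pop(fence_id, None)
    return pending


if __name__ == "__main__":
    # The last five fences from the trace before the assertion fired.
    sample = """
    virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
    virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
    virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
    virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
    virtio_gpu_fence_ctrl fence 0x1400, type 0x207
    virtio_gpu_fence_resp fence 0x13fc
    virtio_gpu_fence_resp fence 0x13fd
    virtio_gpu_fence_resp fence 0x13fe
    virtio_gpu_fence_resp fence 0x13ff
    virtio_gpu_fence_resp fence 0x1400
    """
    for fence_id, ctype in sorted(outstanding_fences(sample).items()):
        print(f"fence 0x{fence_id:x} (type {ctype}) has no response")
```

On the excerpts quoted here every fence is retired, so a stuck guest fence would have to come from somewhere other than a missing host response in this part of the trace.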
  fish: Job 1, './qemu-system-aarch64 \' terminated by signal -machine type=virt,virtuali… ( -cpu neoverse-n1 \)
  fish: Job -smp 4 \, ' -accel tcg \' terminated by signal -device virtio-net-pci,netd… ( -device virtio-scsi-pci \)
  fish: Job -device scsi-hd,drive=hd \, ' -netdev user,id=unet,hostfw…' terminated by signal -blockdev driver=raw,node-n… ( -serial mon:stdio \)
  fish: Job -blockdev node-name=rom,dri…, ' -blockdev node-name=efivars…' terminated by signal -m 8192 \ ( -object memory-backend-memf…)
  fish: Job -device virtio-gpu-gl-pci,h…, ' -display sdl,gl=on,show-cur…' terminated by signal -device qemu-xhci -device u… ( -kernel /home/alex/lsrc/lin…)
  fish: Job -d guest_errors,unimp,trace…, 'SIGABRT' terminated by signal Abort ()

The backtrace (and the 18G size of the core file!) indicates a leak:

  (gdb) bt
  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
  #1  0x00007f0fa68a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
  #2  0x00007f0fa685afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007f0fa6845472 in __GI_abort () at ./stdlib/abort.c:79
  #4  0x00007f0fa6845395 in __assert_fail_base (fmt=0x7f0fa69b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:92
  #5  0x00007f0fa6853eb2 in __GI___assert_fail (assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:101
  #6  0x000055c3e1958b50 in virgl_renderer_resource_unmap (res_handle=<optimized out>) at ../../subprojects/virglrenderer/src/virglrenderer.c:1282
  #7  0x000055c3e13d8507 in virtio_gpu_virgl_unmap_resource_blob (g=g@entry=0x55c3e5fed600, res=0x55c3e6e67b60, cmd_suspended=cmd_suspended@entry=0x7ffd5d720aaf) at ../../hw/display/virtio-gpu-virgl.c:188
  #8  0x000055c3e13d9af4 in virgl_cmd_resource_unmap_blob (cmd_suspended=0x7ffd5d720aaf, cmd=0x55c3e5bd9710, g=0x55c3e5fed600) at ../../hw/display/virtio-gpu-virgl.c:797
  #9  virtio_gpu_virgl_process_cmd (g=0x55c3e5fed600, cmd=0x55c3e5bd9710) at ../../hw/display/virtio-gpu-virgl.c:979
  #10 0x000055c3e13d6019 in virtio_gpu_process_cmdq (g=0x55c3e5fed600) at ../../hw/display/virtio-gpu.c:1055
  #11 0x000055c3e190c646 in aio_bh_poll (ctx=ctx@entry=0x55c3e4c03710) at ../../util/async.c:218
  #12 0x000055c3e18f562e in aio_dispatch (ctx=0x55c3e4c03710) at ../../util/aio-posix.c:423
  #13 0x000055c3e190c2ce in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../../util/async.c:360
  #14 0x00007f0fa8b047a9 in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x000055c3e190db78 in glib_pollfds_poll () at ../../util/main-loop.c:287
  #16 os_host_main_loop_wait (timeout=1882878) at ../../util/main-loop.c:310
  #17 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:589
  #18 0x000055c3e1348ac9 in qemu_main_loop () at ../../system/runstate.c:796
  #19 0x000055c3e174f786 in qemu_default_main () at ../../system/main.c:37
  #20 0x00007f0fa684624a in __libc_start_call_main (main=main@entry=0x55c3e10286e0 <main>, argc=argc@entry=47, argv=argv@entry=0x7ffd5d720f18) at ../sysdeps/nptl/libc_start_call_main.h:58
  #21 0x00007f0fa6846305 in __libc_start_main_impl (main=0x55c3e10286e0 <main>, argc=47, argv=0x7ffd5d720f18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd5d720f08) at ../csu/libc-start.c:360
  #22 0x000055c3e102a3f1 in _start ()

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro