Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:

> On 5/22/24 12:00, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>> 
>>> On 5/21/24 17:57, Alex Bennée wrote:
>>>> Alex Bennée <alex.ben...@linaro.org> writes:
>>>>
>>>>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This series enables Vulkan Venus context support on virtio-gpu.
>>>>>>
>>>>>> All virglrender and almost all Linux kernel prerequisite changes
>>>>>> needed by Venus are already in upstream. For kernel there is a pending
>>>>>> KVM patchset that fixes mapping of compound pages needed for DRM drivers
>>>>>> using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
>>>>>> from Qemu.
>>>>>>
>>>>>> [1] https://lore.kernel.org/kvm/20240229025759.1187910-1-steve...@google.com/
>>>>>>
>>>>>> You'll need to use a recent Mesa version containing the patch that
>>>>>> removes Venus's dependency on the cross-device feature, which isn't
>>>>>> supported by Qemu [2].
>>>>>>
>>>>>> [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b
>>>>>>
>>>>>> Example Qemu cmdline that enables Venus:
>>>>>>
>>>>>>   qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
>>>>>>       -machine q35,accel=kvm,memory-backend=mem1 \
>>>>>>       -object memory-backend-memfd,id=mem1,size=8G -m 8G
>>>>>
>>>>> What is the correct device for non-x86 guests? We have virtio-gpu-gl-pci
>>>>> but when doing that I get:
>>>>>
>>>>>   -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true
>>>>>   qemu-system-aarch64: -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true: opengl is not available
>>>>>
>>>>> According to 37f86af087 (virtio-gpu: move virgl realize + properties):
>>>>>
>>>>>   Drop the virgl property, the virtio-gpu-gl-device has virgl enabled no
>>>>>   matter what.  Just use virtio-gpu-device instead if you don't want
>>>>>   enable virgl and opengl.  This simplifies the logic and reduces the test
>>>>>   matrix.
>>>>>
>>>>> but that's not a good solution because that needs virtio-mmio and there
>>>>> are reasons to have a PCI device (for one thing no ambiguity about
>>>>> discovery).
>>>>
>>>> Oops my mistake forgetting:
>>>>
>>>>   --display gtk,gl=on
>>>>
>>>> Although I do see a lot of eglMakeContext failures.
>>>
>>> Please post the full Qemu cmdline you're using
>> 
>> With:
>> 
>>   ./qemu-system-aarch64 \
>>            -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars \
>>            -cpu neoverse-n1 \
>>            -smp 4 \
>>            -accel tcg \
>>            -device virtio-net-pci,netdev=unet \
>>            -device virtio-scsi-pci \
>>            -device scsi-hd,drive=hd \
>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>            -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \
>>            -serial mon:stdio \
>>            -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \
>>            -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \
>>            -m 8192 \
>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>            -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true \
>>            -display gtk,gl=on,show-cursor=on -vga none \
>>            -device qemu-xhci -device usb-kbd -device usb-tablet
>> 
>> I get a boot up with a lot of:
>> 
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>> 
>> In the guest I run:
>> 
>>   meson devenv -C /root/lsrc/graphics/mesa.git/build fish
>> 
>> to bring in the latest Mesa (with virtio enabled). Running vulkaninfo
>> reports two cards:
>> 
>>   ==========
>>   VULKANINFO
>>   ==========
>> 
>>   Vulkan Instance Version: 1.3.280
>> 
>>   Instance Extensions: count = 14
>>   -------------------------------
>>   VK_EXT_debug_report                    : extension revision 10
>>   VK_EXT_debug_utils                     : extension revision 2
>>   VK_EXT_headless_surface                : extension revision 1
>>   VK_KHR_device_group_creation           : extension revision 1
>>   VK_KHR_external_fence_capabilities     : extension revision 1
>>   VK_KHR_external_memory_capabilities    : extension revision 1
>>   VK_KHR_external_semaphore_capabilities : extension revision 1
>>   VK_KHR_get_physical_device_properties2 : extension revision 2
>>   VK_KHR_get_surface_capabilities2       : extension revision 1
>>   VK_KHR_portability_enumeration         : extension revision 1
>>   VK_KHR_surface                         : extension revision 25
>>   VK_KHR_surface_protected_capabilities  : extension revision 1
>>   VK_KHR_wayland_surface                 : extension revision 6
>>   VK_LUNARG_direct_driver_loading        : extension revision 1
>> 
>>   Instance Layers: count = 2
>>   --------------------------
>>   VK_LAYER_MESA_device_select Linux device selection layer 1.3.211  version 1
>>   VK_LAYER_MESA_overlay       Mesa Overlay layer           1.3.211  version 1
>> 
>>   Devices:
>>   ========
>>   GPU0:
>>           apiVersion         = 1.3.230
>>           driverVersion      = 24.1.99
>>           vendorID           = 0x8086
>>           deviceID           = 0xa780
>>           deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
>>           deviceName         = Virtio-GPU Venus (Intel(R) Graphics (RPL-S))
>>           driverID           = DRIVER_ID_MESA_VENUS
>>           driverName         = venus
>>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>>           conformanceVersion = 1.3.0.0
>>           deviceUUID         = 29d2e940-a1a0-3054-0f9a-9f7dec52a084
>>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
>>   GPU1:
>>           apiVersion         = 1.2.0
>>           driverVersion      = 24.1.99
>>           vendorID           = 0x10005
>>           deviceID           = 0x0000
>>           deviceType         = PHYSICAL_DEVICE_TYPE_CPU
>>           deviceName         = Virtio-GPU Venus (llvmpipe (LLVM 15.0.6, 256 bits))
>>           driverID           = DRIVER_ID_MESA_VENUS
>>           driverName         = venus
>>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>>           conformanceVersion = 1.3.0.0
>>           deviceUUID         = 5fb5c03f-c537-f0fe-a7e6-9cd5866acb8d
>>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
>> 
>> Running weston and then vkcube-wayland reports it's selecting "GPU 0:
>> Virtio-GPU Venus (Intel(R) Graphics (RPL-S))" but otherwise produces no
>> output.
>> 
>> If I run with "-display sdl,gl=on,show-cursor=on" and the same other
>> command line options the results for vulkaninfo are the same. However
>> vkcube-wayland gets a little further and draws the initial cube on the
>> screen before locking up with:
>> 
>>  MESA-VIRTIO: debug: stuck in fence wait with iter at xxxx
>> 
>> where xxxx grows each time it prints. On shutting down I see some virgl
>> errors interspersed with the systemd logs:
>> 
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [  OK  ] Stopped systemd-logind.service - User Login Management.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [  475.257111] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [  OK  ] Stopped target network-online.target - Network is Online.
>>   [  OK  ] Stopped target remote-fs.target - Remote File Systems.
>>   [  OK  ] Stopped NetworkManager-wait-online…vice - Network Manager Wait Online.
>>            Stopping avahi-daemon.service - Avahi mDNS/DNS-SD Stack...
>>            Stopping cups.service - CUPS Scheduler...
>>            Stopping user-runtime-dir@0.servic…er Runtime Directory /run/user/0...
>>   [  OK  ] Stopped avahi-daemon.service - Avahi mDNS/DNS-SD Stack.
>>   [  OK  ] Stopped cups.service - CUPS Scheduler.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [  475.357543] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [  OK  ] Stopped target network.target - Network.
>>   [  OK  ] Stopped target nss-user-lookup.target - User and Group Name Lookups.
>>            Stopping NetworkManager.service - Network Manager...
>>            Stopping networking.service - Raise network interfaces...
>>            Stopping wpa_supplicant.service - WPA supplicant...
>>   [  OK  ] Stopped wpa_supplicant.service - WPA supplicant.
>>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>>   [  493.585261] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>> 
>
> I've reproduced this with qemu-system-aarch64. Vkcube works for a second
> and then stops, and Qemu freezes completely after closing and re-running
> vkcube. This doesn't feel like a problem with venus, but with arm64.
> I don't yet know where the bug is; I'll take a closer look.

I'm guessing some sort of resource leak. If I run vkcube-wayland in the
guest it complains about being stuck on a fence with the iterator going
up. However on the host I see:

  virtio_gpu_fence_ctrl fence 0x13f1, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f2, type 0x207
  virtio_gpu_fence_resp fence 0x13f1
  virtio_gpu_fence_resp fence 0x13f2
  virtio_gpu_fence_ctrl fence 0x13f3, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f4, type 0x207
  virtio_gpu_fence_resp fence 0x13f3
  virtio_gpu_fence_resp fence 0x13f4
  virtio_gpu_fence_ctrl fence 0x13f5, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f6, type 0x207
  virtio_gpu_fence_resp fence 0x13f5
  virtio_gpu_fence_resp fence 0x13f6
  virtio_gpu_fence_ctrl fence 0x13f7, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f8, type 0x207
  virtio_gpu_fence_resp fence 0x13f7
  virtio_gpu_fence_resp fence 0x13f8
  virtio_gpu_fence_ctrl fence 0x13f9, type 0x204
  virtio_gpu_fence_resp fence 0x13f9

which looks like it's going ok. However when I hit Ctrl-C in the guest it
kills QEMU:

  virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
  virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
  virtio_gpu_fence_ctrl fence 0x1400, type 0x207
  virtio_gpu_fence_resp fence 0x13fc
  virtio_gpu_fence_resp fence 0x13fd
  virtio_gpu_fence_resp fence 0x13fe
  virtio_gpu_fence_resp fence 0x13ff
  virtio_gpu_fence_resp fence 0x1400
  qemu-system-aarch64: ../../subprojects/virglrenderer/src/virglrenderer.c:1282: virgl_renderer_resource_unmap: Assertion `!ret' failed.
  fish: Job 1, './qemu-system-aarch64 \' terminated by signal     -machine type=virt,virtuali… (    -cpu neoverse-n1 \)
  fish: Job     -smp 4 \, '    -accel tcg \' terminated by signal     -device virtio-net-pci,netd… (    -device virtio-scsi-pci \)
  fish: Job     -device scsi-hd,drive=hd \, '    -netdev user,id=unet,hostfw…' terminated by signal     -blockdev driver=raw,node-n… (    -serial mon:stdio \)
  fish: Job     -blockdev node-name=rom,dri…, '    -blockdev node-name=efivars…' terminated by signal     -m 8192 \ (    -object memory-backend-memf…)
  fish: Job     -device virtio-gpu-gl-pci,h…, '    -display sdl,gl=on,show-cur…' terminated by signal     -device qemu-xhci -device u… (    -kernel /home/alex/lsrc/lin…)
  fish: Job     -d guest_errors,unimp,trace…, 'SIGABRT' terminated by signal Abort ()
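
To check mechanically that every fence in traces like the ones above gets
a response, the ctrl and resp lines can be paired up. This is only a
minimal sketch, assuming the trace line format shown above. The function
name outstanding_fences and the CTRL_NAMES table are illustrative and not
part of QEMU. The command names are believed to match the virtio-gpu
control-type enum but should be checked against virtio_gpu.h:

```python
import re

# Assumed names for the command types seen in this thread; verify
# against the virtio-gpu header before relying on them.
CTRL_NAMES = {
    0x204: "RESOURCE_CREATE_3D",
    0x207: "SUBMIT_3D",
    0x209: "RESOURCE_UNMAP_BLOB",
}

CTRL_RE = re.compile(
    r"virtio_gpu_fence_ctrl fence (0x[0-9a-f]+), type (0x[0-9a-f]+)")
RESP_RE = re.compile(r"virtio_gpu_fence_resp fence (0x[0-9a-f]+)")


def outstanding_fences(lines):
    """Return {fence_id: ctrl_type} for fences submitted but never answered."""
    pending = {}
    for line in lines:
        m = CTRL_RE.search(line)
        if m:
            # A fenced command was queued; remember its type until answered.
            pending[int(m.group(1), 16)] = int(m.group(2), 16)
            continue
        m = RESP_RE.search(line)
        if m:
            # The host signalled this fence; it is no longer outstanding.
            pending.pop(int(m.group(1), 16), None)
    return pending


def describe(pending):
    """Format the outstanding fences as human-readable lines."""
    return [
        f"fence {fence:#x} ({CTRL_NAMES.get(ctype, hex(ctype))}) has no response"
        for fence, ctype in sorted(pending.items())
    ]


# Hypothetical sample in the same format as the host trace.
sample = [
    "virtio_gpu_fence_ctrl fence 0x13fc, type 0x207",
    "virtio_gpu_fence_ctrl fence 0x13fd, type 0x207",
    "virtio_gpu_fence_resp fence 0x13fc",
]
for msg in describe(outstanding_fences(sample)):
    print(msg)
```

Feeding it the full trace output would list any fence the host never
signalled; an empty result suggests the stall lies elsewhere.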

The backtrace (and the 18G size of the core file!) indicates a leak:

  (gdb) bt
  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
  #1  0x00007f0fa68a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
  #2  0x00007f0fa685afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007f0fa6845472 in __GI_abort () at ./stdlib/abort.c:79
  #4  0x00007f0fa6845395 in __assert_fail_base
      (fmt=0x7f0fa69b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:92
  #5  0x00007f0fa6853eb2 in __GI___assert_fail
      (assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:101
  #6  0x000055c3e1958b50 in virgl_renderer_resource_unmap (res_handle=<optimized out>) at ../../subprojects/virglrenderer/src/virglrenderer.c:1282
  #7  0x000055c3e13d8507 in virtio_gpu_virgl_unmap_resource_blob (g=g@entry=0x55c3e5fed600, res=0x55c3e6e67b60, cmd_suspended=cmd_suspended@entry=0x7ffd5d720aaf)
      at ../../hw/display/virtio-gpu-virgl.c:188
  #8  0x000055c3e13d9af4 in virgl_cmd_resource_unmap_blob (cmd_suspended=0x7ffd5d720aaf, cmd=0x55c3e5bd9710, g=0x55c3e5fed600) at ../../hw/display/virtio-gpu-virgl.c:797
  #9  virtio_gpu_virgl_process_cmd (g=0x55c3e5fed600, cmd=0x55c3e5bd9710) at ../../hw/display/virtio-gpu-virgl.c:979
  #10 0x000055c3e13d6019 in virtio_gpu_process_cmdq (g=0x55c3e5fed600) at ../../hw/display/virtio-gpu.c:1055
  #11 0x000055c3e190c646 in aio_bh_poll (ctx=ctx@entry=0x55c3e4c03710) at ../../util/async.c:218
  #12 0x000055c3e18f562e in aio_dispatch (ctx=0x55c3e4c03710) at ../../util/aio-posix.c:423
  #13 0x000055c3e190c2ce in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../../util/async.c:360
  #14 0x00007f0fa8b047a9 in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x000055c3e190db78 in glib_pollfds_poll () at ../../util/main-loop.c:287
  #16 os_host_main_loop_wait (timeout=1882878) at ../../util/main-loop.c:310
  #17 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:589
  #18 0x000055c3e1348ac9 in qemu_main_loop () at ../../system/runstate.c:796
  #19 0x000055c3e174f786 in qemu_default_main () at ../../system/main.c:37
  #20 0x00007f0fa684624a in __libc_start_call_main (main=main@entry=0x55c3e10286e0 <main>, argc=argc@entry=47, argv=argv@entry=0x7ffd5d720f18)
      at ../sysdeps/nptl/libc_start_call_main.h:58
  #21 0x00007f0fa6846305 in __libc_start_main_impl
      (main=0x55c3e10286e0 <main>, argc=47, argv=0x7ffd5d720f18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd5d720f08)
      at ../csu/libc-start.c:360
  #22 0x000055c3e102a3f1 in _start ()

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
