On 6/21/24 11:59, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>
>> On 6/19/24 20:37, Alex Bennée wrote:
>>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>>> this:
>>>
>>>   ./qemu-system-aarch64 \
>>>     -M virt -cpu cortex-a76 \
>>>     -device virtio-net-pci,netdev=unet \
>>>     -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>>     -m 8192 \
>>>     -object memory-backend-memfd,id=mem,size=8G,share=on \
>>>     -serial mon:stdio \
>>>     -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>>     -append "console=ttyAMA0" \
>>>     -device qemu-xhci -device usb-kbd -device usb-tablet \
>>>     -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>>     -display sdl,gl=on \
>>>     -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map
>>>
>>> And I've noticed a couple of things. First trying to launch vkmark to
>>> run a KMS mode test fails with:
>>>
>> ...
>>> virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>> virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error
>>> virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>
>>> More interestingly when shutting stuff down we see weirdness like:
>>>
>>> address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>>> virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
>>> virgl_render_server[1875931]: vkr: destroying device with valid objects
>>> vkr_context_remove_object: -7438602987017907480
>>> vkr_context_remove_object: 7
>>> vkr_context_remove_object: 5
>>>
>>> which indicates something has gone very wrong. I'm not super familiar
>>> with the memory allocation patterns, but should stuff that is done as
>>> virtio_gpu_cmd_res_back_attach() be findable in the list of resources?
>>
>> This is expected to fail. Vkmark creates a shmem virgl GBM FB BO on the
>> guest that isn't exportable on the host. AFAICT, more code changes are
>> needed to support this case.
>
> There are a lot of acronyms there. If this is pure guest memory why
> isn't it exportable to the host? Or should the underlying mesa library
> be making sure the allocation happens from the shared region?
>
> Is vkmark particularly special here?
Actually, you could get it to work to some degree if you compile
virglrenderer with -Dminigbm_allocation=true (a rough build sketch is at
the end of this mail). On the host, use a GTK/Wayland display.

Vkmark isn't special. It's virglrenderer that has room for improvement.
ChromeOS doesn't use KMS in VMs, so proper KMS support was never a
priority for Venus.

>> Note that the "destroying device with valid objects" msg is fine, won't
>> hurt to silence it in Venus to avoid confusion. It will happen every
>> time a guest application is closed without explicitly releasing every
>> VK object.
>
> I was more concerned with:
>
>>> vkr_context_remove_object: -7438602987017907480
>
> which looks like a corruption of the object ids (or maybe an off-by-one)

At first glance this appears to be a valid value, otherwise Venus
should've crashed QEMU with a debug assert for an invalid ID. But I
never see such odd IDs in my testing.

>>> I tried running under RR to further debug but weirdly I can't get
>>> working graphics with that. I did try running under threadsan which
>>> complained about a potential data race:
>>>
>>> vkr_context_add_object: 1 -> 0x7b2c00000288
>>> vkr_context_add_object: 2 -> 0x7b2c00000270
>>> vkr_context_add_object: 3 -> 0x7b3800007f28
>>> vkr_context_add_object: 4 -> 0x7b3800007fa0
>>> vkr_context_add_object: 5 -> 0x7b48000103f8
>>> vkr_context_add_object: 6 -> 0x7b48000104a0
>>> vkr_context_add_object: 7 -> 0x7b4800010440
>>> virtio_gpu_cmd_res_back_attach res 0x5
>>> virtio_gpu_cmd_res_back_attach res 0x6
>>> vkr_context_add_object: 8 -> 0x7b48000103e0
>>> virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>> virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>> virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>> ==================
>>> WARNING: ThreadSanitizer: data race (pid=1751256)
>>>   Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>>     #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
>> ..
>>> ==================
>>> SUMMARY: ThreadSanitizer: data race
>>> (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede)
>>> (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>>>
>>> This could be a false positive or it could be a race between the guest
>>> kernel clearing memory while we are still doing
>>> virtio_gpu_ctrl_response.
>>>
>>> What do you think?
>>
>> The memcpy warning looks a bit suspicious, but is likely harmless. I
>> don't see such a warning with TSAN and an x86 VM.
>
> TSAN can only pick up these interactions with TCG guests because it can
> track guest memory accesses. With a KVM guest we have no visibility of
> the guest accesses.

I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
checked both KVM and TCG; TSAN only warns about virtio-net memcpys for
me.
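For reference, a rough, untested sketch of the virglrenderer build I
mean is below. -Dminigbm_allocation is the option named above; the
repository URL and the -Dvenus option name are from memory, so
double-check them against meson_options.txt:

  # configure virglrenderer with Venus and minigbm-based allocation;
  # -Dvenus and the repo URL are assumptions, -Dminigbm_allocation is as above
  git clone https://gitlab.freedesktop.org/virgl/virglrenderer.git
  cd virglrenderer
  meson setup build -Dvenus=true -Dminigbm_allocation=true
  # build and install the library and the render server
  ninja -C build && sudo ninja -C build install

--
Best regards,
Dmitry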