On 6/21/24 11:59, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
> 
>> On 6/19/24 20:37, Alex Bennée wrote:
>>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>>> this:
>>>
>>> ./qemu-system-aarch64 \
>>>            -M virt -cpu cortex-a76 \
>>>            -device virtio-net-pci,netdev=unet \
>>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>>            -m 8192 \
>>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>>            -serial mon:stdio \
>>>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>>            -append "console=ttyAMA0" \
>>>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>>>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>>            -display sdl,gl=on \
>>>            -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map
>>>
>>> And I've noticed a couple of things. First trying to launch vkmark to
>>> run a KMS mode test fails with:
>>>
>> ...
>>>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error
>>>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>
>>> More interestingly when shutting stuff down we see weirdness like:
>>>
>>>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>>>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
>>>   virgl_render_server[1875931]: vkr: destroying device with valid objects
>>>   vkr_context_remove_object: -7438602987017907480
>>>   vkr_context_remove_object: 7
>>>   vkr_context_remove_object: 5
>>>
>>> which indicates something has gone very wrong. I'm not super familiar
>>> with the memory allocation patterns, but should resources attached via
>>> virtio_gpu_cmd_res_back_attach() be findable in the list of resources?
>>
>> This is expected to fail. Vkmark creates a shmem virgl GBM FB BO on the
>> guest that isn't exportable on the host. AFAICT, more code changes would
>> be needed to support this case.
> 
> There are a lot of acronyms there. If this is pure guest memory why
> isn't it exportable to the host? Or should the underlying mesa library
> be making sure the allocation happens from the shared region?
> 
> Is vkmark particularly special here?

Actually, you could get it to work to some degree if you compile
virglrenderer with -Dminigbm_allocation=true. On the host, use a
GTK/Wayland display.
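
Something along these lines should do it (a sketch from memory; the
-Dvenus option name may vary between virglrenderer versions, and only
-Dminigbm_allocation=true is the important part):

  # build virglrenderer with the minigbm allocator enabled
  meson setup build -Dvenus=true -Dminigbm_allocation=true
  ninja -C build install

  # then run QEMU with a GTK display instead of SDL
  ./qemu-system-aarch64 ... -display gtk,gl=on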

Vkmark isn't special. It's virglrenderer that has room for
improvement. ChromeOS doesn't use KMS in VMs, so proper KMS support was
never a priority for Venus.

>> Note that the "destroying device with valid objects" msg is fine; it
>> won't hurt to silence it in Venus to avoid confusion. It will happen
>> every time a guest application is closed without explicitly releasing
>> every VK object.
> 
> I was more concerned with:
> 
>>>   vkr_context_remove_object: -7438602987017907480
> 
> which looks like a corruption of the object IDs (or maybe an off-by-one)

At first glance this appears to be a valid value; otherwise Venus should
have crashed QEMU with a debug assert on an invalid ID. But I've never
seen such odd IDs in my testing.

>>> I tried running under RR to further debug but weirdly I can't get
>>> working graphics with that. I did try running under threadsan which
>>> complained about a potential data race:
>>>
>>>   vkr_context_add_object: 1 -> 0x7b2c00000288
>>>   vkr_context_add_object: 2 -> 0x7b2c00000270
>>>   vkr_context_add_object: 3 -> 0x7b3800007f28
>>>   vkr_context_add_object: 4 -> 0x7b3800007fa0
>>>   vkr_context_add_object: 5 -> 0x7b48000103f8
>>>   vkr_context_add_object: 6 -> 0x7b48000104a0
>>>   vkr_context_add_object: 7 -> 0x7b4800010440
>>>   virtio_gpu_cmd_res_back_attach res 0x5
>>>   virtio_gpu_cmd_res_back_attach res 0x6
>>>   vkr_context_add_object: 8 -> 0x7b48000103e0
>>>   virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>>   virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>>   virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>   ==================
>>>   WARNING: ThreadSanitizer: data race (pid=1751256)
>>>     Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>>       #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
>> ..
>>>   ==================
>>>   SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>>>
>>> This could be a false positive or it could be a race between the guest
>>> kernel clearing memory while we are still doing
>>> virtio_gpu_ctrl_response.
>>>
>>> What do you think?
>>
>> The memcpy warning looks a bit suspicious, but is likely harmless. I
>> don't see such a warning with TSAN and an x86 VM.
> 
> TSAN can only pick up these interactions with TCG guests because it can
> track guest memory accesses. With a KVM guest we have no visibility of
> the guest accesses. 

I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
checked both KVM and TCG; TSAN only warns about virtio-net memcpys for me.
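
If the virtio-net report gets in the way while hunting for the Venus one,
TSAN suppressions can silence it. A minimal sketch (the frame name to
match is an assumption, take it from the actual TSAN report):

  # tsan.supp - hide reports whose stack mentions virtio-net RX
  # (virtio_net_receive is assumed; use the frame from your report)
  race:virtio_net_receive

  TSAN_OPTIONS="suppressions=tsan.supp" ./qemu-system-aarch64 ...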

-- 
Best regards,
Dmitry

