Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:

> On 6/19/24 20:37, Alex Bennée wrote:
>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>> this:
>> 
>> ./qemu-system-aarch64 \
>>            -M virt -cpu cortex-a76 \
>>            -device virtio-net-pci,netdev=unet \
>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>            -m 8192 \
>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>            -serial mon:stdio \
>>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>            -append "console=ttyAMA0" \
>>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>            -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map
>> 
>> And I've noticed a couple of things. First trying to launch vkmark to
>> run a KMS mode test fails with:
>> 
> ...
>>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error
>>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>> 
>> More interestingly when shutting stuff down we see weirdness like:
>> 
>>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
>>   virgl_render_server[1875931]: vkr: destroying device with valid objects
>>   vkr_context_remove_object: -7438602987017907480
>>   vkr_context_remove_object: 7
>>   vkr_context_remove_object: 5
>> 
>> which indicates something has gone very wrong. I'm not super familiar
>> with the memory allocation patterns, but shouldn't anything attached via
>> virtio_gpu_cmd_res_back_attach() be findable in the list of resources?
>
> This is expected to fail. Vkmark creates a shmem virgl GBM FB BO on the
> guest that isn't exportable on the host. AFAICT more code changes are
> needed to support this case.

There are a lot of acronyms there. If this is pure guest memory, why
isn't it exportable to the host? Or should the underlying Mesa library
be making sure the allocation happens from the shared region?

Is vkmark particularly special here?
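
To unpack the acronyms, at least as I understand it: the guest asks for
blob resources through the drm/virtgpu_drm.h uapi, and the blob_mem field
is what decides whether the host side has anything it can import. A rough
sketch of the two paths (my own illustration, not code from either
project, just to check my understanding):

  /*
   * Hypothetical sketch against the kernel's drm/virtgpu_drm.h uapi. A
   * virgl GBM framebuffer BO ends up as VIRTGPU_BLOB_MEM_GUEST, i.e.
   * plain guest shmem pages, whereas memory the host Vulkan driver can
   * import for Venus is VIRTGPU_BLOB_MEM_HOST3D, backed by a host-side
   * allocation named by blob_id.
   */
  #include <drm/virtgpu_drm.h>
  #include <string.h>
  #include <sys/ioctl.h>

  /* what (I assume) the GBM/virgl winsys buffer gets */
  static int create_guest_shmem_bo(int drm_fd, __u64 size, __u32 *bo)
  {
      struct drm_virtgpu_resource_create_blob blob;

      memset(&blob, 0, sizeof(blob));
      blob.blob_mem   = VIRTGPU_BLOB_MEM_GUEST;      /* guest pages only */
      blob.blob_flags = VIRTGPU_BLOB_FLAG_USE_MAPPABLE;
      blob.size       = size;

      if (ioctl(drm_fd, DRM_IOCTL_VIRTGPU_RESOURCE_CREATE_BLOB, &blob))
          return -1;
      *bo = blob.bo_handle;
      return 0;
  }

  /* what a Venus vkAllocateMemory-backed resource would look like */
  static int create_host3d_bo(int drm_fd, __u64 size, __u64 blob_id, __u32 *bo)
  {
      struct drm_virtgpu_resource_create_blob blob;

      memset(&blob, 0, sizeof(blob));
      blob.blob_mem   = VIRTGPU_BLOB_MEM_HOST3D;     /* host allocation */
      blob.blob_flags = VIRTGPU_BLOB_FLAG_USE_MAPPABLE |
                        VIRTGPU_BLOB_FLAG_USE_SHAREABLE;
      blob.size       = size;
      blob.blob_id    = blob_id;      /* names the host-side allocation */

      if (ioctl(drm_fd, DRM_IOCTL_VIRTGPU_RESOURCE_CREATE_BLOB, &blob))
          return -1;
      *bo = blob.bo_handle;
      return 0;
  }

If that picture is right, the winsys buffer vkmark hands over comes from
the first path while the import in vkAllocateMemory needs something from
the second, which would explain the invalid res_id.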


> Note that "destroying device with valid objects" msg is fine, won't hurt
> to silence it in Venus to avoid confusion. It will happen every time
> guest application is closed without explicitly releasing every VK
> object.

I was more concerned with:

>>   vkr_context_remove_object: -7438602987017907480

which looks like a corruption of the object ids (or maybe an off-by-one).
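
It may of course just be a garbage 64-bit value being printed as signed;
dumping the same bit pattern as unsigned and hex would show whether it
looks more like a stale pointer than an id, e.g. something as dumb as:

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
      /* the id printed by vkr_context_remove_object above */
      int64_t id = -7438602987017907480LL;

      /* same bits as signed, unsigned and hex */
      printf("%" PRId64 " %" PRIu64 " 0x%" PRIx64 "\n",
             id, (uint64_t)id, (uint64_t)id);
      return 0;
  }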

>
>> I tried running under RR to further debug but weirdly I can't get
>> working graphics with that. I did try running under threadsan which
>> complained about a potential data race:
>> 
>>   vkr_context_add_object: 1 -> 0x7b2c00000288
>>   vkr_context_add_object: 2 -> 0x7b2c00000270
>>   vkr_context_add_object: 3 -> 0x7b3800007f28
>>   vkr_context_add_object: 4 -> 0x7b3800007fa0
>>   vkr_context_add_object: 5 -> 0x7b48000103f8
>>   vkr_context_add_object: 6 -> 0x7b48000104a0
>>   vkr_context_add_object: 7 -> 0x7b4800010440
>>   virtio_gpu_cmd_res_back_attach res 0x5
>>   virtio_gpu_cmd_res_back_attach res 0x6
>>   vkr_context_add_object: 8 -> 0x7b48000103e0
>>   virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>   virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>   virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>   ==================
>>   WARNING: ThreadSanitizer: data race (pid=1751256)
>>     Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>       #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
> ..
>>   ==================
>>   SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>> 
>> This could be a false positive or it could be a race between the guest
>> kernel clearing memory while we are still doing
>> virtio_gpu_ctrl_response.
>> 
>> What do you think?
>
> The memcpy warning looks a bit suspicious, but is likely harmless. I
> don't see such a warning with TSAN and an x86 VM.

TSAN can only pick up these interactions with TCG guests because the
guest's memory accesses are ordinary host accesses it can instrument.
With a KVM guest we have no visibility of the guest side at all.
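
For what it's worth the report has the shape of two plain stores from
different threads of the same process touching the same guest pages. A
minimal stand-in (nothing to do with the real device code, just modelling
the suspected interleaving) gets flagged the same way:

  /* build with: cc -fsanitize=thread -g race.c -lpthread */
  #include <pthread.h>
  #include <string.h>

  static char guest_page[4096];

  static void *vcpu_thread(void *arg)
  {
      (void)arg;
      /* the "guest" clearing its memory, an ordinary host store under TCG */
      memset(guest_page, 0, sizeof(guest_page));
      return NULL;
  }

  int main(void)
  {
      pthread_t vcpu;
      char resp[24] = { 0 };

      pthread_create(&vcpu, NULL, vcpu_thread, NULL);
      /* the "device" copying a response into the same guest memory */
      memcpy(guest_page, resp, sizeof(resp));
      pthread_join(vcpu, NULL);
      return 0;
  }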

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
