Hi,

I have a small update.
On Monday, September 16, 2024 10:04:28 AM GMT+5:30 Sahil wrote:
> On Thursday, September 12, 2024 3:24:27 PM GMT+5:30 Eugenio Perez Martin wrote:
> [...]
> > The function that gets the features from vhost-vdpa in QEMU is
> > hw/virtio/vhost-vdpa.c:vhost_vdpa_get_features. Can you check that it
> > returns bit 34 (offset starts with 0 here)? If it returns it, can you
> > keep debugging until you see what clears it?
> >
> > If it comes clear, then we need to check the kernel.
>
> Got it. I'll start debugging from here.

I am printing the value of "*features & (1ULL << 34)" in
hw/virtio/vhost-vdpa.c:vhost_vdpa_get_features and I see it is 1. I guess
that means the vhost device has the packed feature bit turned on in L1.

I am also printing out the values of "host_features", "guest_features" and
"backend_features" set in "VirtIODevice vdev" in
hw/virtio/virtio-pci.c:virtio_pci_common_read under "case
VIRTIO_PCI_COMMON_DF". I observed the following values:

dev name: virtio-net
host features: 0x10150bfffa7
guest features: 0x0
backend features: 0x10150bfffa7

The host features and backend features match, but guest features is 0. Is
this because the value of guest features has not been set yet, or is it
because the driver hasn't selected any of the features? I am not entirely
sure, but I think it's the former, considering that the value of
/sys/devices/pci0000:00/0000:00:07.0/virtio1/features is 0x10110afffa7.
Please let me know if I am wrong.

I found a few other issues as well. When I shut down the L2 VM, I get the
following errors just after shutdown:

qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Operation not permitted (1)
qemu-system-x86_64: vhost VQ 1 ring restore failed: -1: Operation not permitted (1)
qemu-system-x86_64: vhost VQ 2 ring restore failed: -1: Operation not permitted (1)

This is printed in hw/virtio/vhost.c:vhost_virtqueue_stop. According to the
comments, this is because the connection to the backend is broken.

I booted L1 by running:

$ ./qemu/build/qemu-system-x86_64 -enable-kvm \
-drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio -net nic,model=virtio -net user,hostfwd=tcp::2222-:22 \
-device intel-iommu,snoop-control=on \
-device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,event_idx=off,packed=on,bus=pcie.0,addr=0x4 \
-netdev tap,id=net0,script=no,downscript=no \
-nographic \
-m 8G \
-smp 4 \
-M q35 \
-cpu host 2>&1 | tee vm.log

And I booted L2 by running:

# ./qemu/build/qemu-system-x86_64 \
-nographic \
-m 4G \
-enable-kvm \
-M q35 \
-drive file=//root/L2.qcow2,media=disk,if=virtio \
-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
-device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,event_idx=off,bus=pcie.0,addr=0x7 \
-smp 4 \
-cpu host \
2>&1 | tee vm.log

Am I missing something here?

When booting L2, I also noticed that the control flow does not enter the
following "if" block in hw/virtio/vhost-vdpa.c:vhost_vdpa_init:

    if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
        ...
        vhost_svq_valid_features(features, &dev->migration_blocker);
    }

So "vhost_svq_valid_features" is never called. According to the comments,
this is because the device was not started with x-svq=on. Could this be a
result of (or the reason for) the broken connection to the backend? Is there
a way to manually set this option?

Thanks,
Sahil
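
P.S. In case it helps to see what I mean by "printing the value", this is
roughly the kind of throwaway debug print I am describing for
hw/virtio/vhost-vdpa.c:vhost_vdpa_get_features (the exact wording and
placement here are only illustrative, not part of any patch):

    /* Illustrative debug print, placed just before the function returns in
     * vhost_vdpa_get_features(); bit 34 is VIRTIO_F_RING_PACKED. */
    fprintf(stderr, "vhost-vdpa get_features: packed bit = %llu\n",
            (unsigned long long)((*features >> 34) & 1));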
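
P.P.S. Regarding manually setting the option: if I am reading
net/vhost-vdpa.c correctly, "x-svq" is an option of the vhost-vdpa netdev
itself, so my guess is that L2 would have to be started with something like
the line below (this placement is only my assumption, so please correct me
if it is not the intended way to enable shadow virtqueues):

    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=on \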