Dear Stefan,

I hope you are the right person to contact about this; please point me in the right direction otherwise. I am also Cc:ing the SPDK and qemu-devel mailing lists to solicit community feedback.
As part of my internship at Arrikto, I have spent the last few months working on the SPDK vhost target application. I was motivated by the "VirtioVhostUser" feature you proposed for QEMU [https://wiki.qemu.org/Features/VirtioVhostUser] and made it my end goal to have an end-to-end system running, where a slave VM offers storage to a master VM over vhost-user, backed by a SCSI block device. My current approach is to use virtio-scsi-based storage inside the slave VM.

I see that you have managed to move the vhost-user backend inside a VM over a virtio-vhost-user transport. I have experimented with running the SPDK vhost app over vhost-user, but have run into quite a few problems with the virtio-pci driver. Apologies in advance for the rather lengthy email; I would definitely value any short-term hints you may have, as well as any longer-term feedback you may offer on my general direction.

My current state is:

I started with your DPDK code at https://github.com/stefanha/dpdk/tree/virtio-vhost-user, and read about your effort to integrate the DPDK vhost-scsi application with virtio-vhost-user here:

http://mails.dpdk.org/archives/dev/2018-January/088155.html

My initial approach was to replicate your work, but with the SPDK vhost library running over virtio-vhost-user. I have pushed all of my code to the following repository; it is still a WIP and I really need to tidy up the commits:

https://bitbucket.org/ndragazis/spdk.git

Hacks I had to do:

- I use the modified script usertools/dpdk-devbind.py found in your DPDK repository (https://github.com/stefanha/dpdk) to bind the virtio-vhost-user device to the vfio-pci kernel driver, because the SPDK setup script scripts/setup.sh does not handle unclassified devices like the virtio-vhost-user device. I plan to fix this later.
- I pass the PCI address of the virtio-vhost-user device to the vhost library by repurposing the existing -S option; it no longer refers to a UNIX socket path, as it does with the UNIX socket transport. This means the virtio-vhost-user transport is hardcoded and not configurable by the user. I plan to fix this later.

- I copied your code that implements the virtio-vhost-user transport and made the necessary changes to abstract the transport implementation. I also copied the virtio-pci code from DPDK rte_vhost into the SPDK vhost library so the virtio-vhost-user driver could use it. I saw this is what you did as a quick hack to make the DPDK vhost-scsi application handle the virtio-vhost-user device.

Having done that, I tried to demo my integration end-to-end. Everything worked fine with a Malloc block device, but things broke when I switched to a virtio-scsi block device inside the slave. My attempts to call construct_vhost_scsi_controller failed with an I/O error. Here is the log:

-- cut here --
$ export VVU_DEVICE="0000:00:06.0"
$ sudo modprobe vfio enable_unsafe_noiommu_mode=1
$ sudo modprobe vfio-pci
$ sudo ./dpdk-devbind.py -b vfio-pci $VVU_DEVICE
$ cd spdk
$ sudo scripts/setup.sh
Active mountpoints on /dev/vda, so not binding PCI dev 0000:00:04.0
0000:00:05.0 (1af4 1004): virtio-pci -> vfio-pci
$ sudo app/vhost/vhost -S "$VVU_DEVICE" -m 0x3 &
[1] 3917
$ Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: vhost -c 0x3 -m 1024 --file-prefix=spdk_pid3918 ]
EAL: Multi-process socket /var/run/.spdk_pid3918_unix
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1017 virtio_vhost_user
EAL: using IOMMU type 8 (No-IOMMU)
EAL: Ignore mapping IO port bar(0)
VIRTIO_PCI_CONFIG: found modern virtio pci device.
VIRTIO_PCI_CONFIG: modern virtio pci detected.
VHOST_CONFIG: Added virtio-vhost-user device at 0000:00:06.0
$ sudo scripts/rpc.py construct_virtio_pci_scsi_bdev 0000:00:05.0 VirtioScsi0
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1004 spdk_virtio
EAL: Ignore mapping IO port bar(0)
[
  "VirtioScsi0t0"
]
$ sudo scripts/rpc.py construct_vhost_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: BAR 2 not availabled
Got JSON-RPC error response
request:
{
  "params": {
    "cpumask": "0x1",
    "ctrlr": "vhost.0"
  },
  "jsonrpc": "2.0",
  "method": "construct_vhost_scsi_controller",
  "id": 1
}
response:
{
  "message": "Input/output error",
  "code": -32602
}
-- cut here --

This was really painful to debug. I managed to find the cause yesterday: I had bumped into this DPDK bug:

https://bugs.dpdk.org/show_bug.cgi?id=85

and I worked around it, essentially by short-circuiting the point where the DPDK runtime rescans the PCI bus and corrupts the dev->mem_resource[] field for the already-mapped-in-userspace virtio-vhost-user PCI device. I just commented out this line:

https://github.com/spdk/dpdk/blob/08332d13b3a66cb1a8c3a184def76b039052d676/drivers/bus/pci/linux/pci.c#L355

This seems to be a good enough workaround for now. I'm not sure this bug has been fixed; I will comment on the DPDK bugzilla.

But now I have really hit a roadblock.
I get a segfault. I run the exact same commands as shown above and end up with this backtrace:

-- cut here --
#0  0x000000000046ae42 in spdk_bdev_get_io (channel=0x30) at bdev.c:920
#1  0x000000000046c985 in spdk_bdev_readv_blocks (desc=0x93f8a0, ch=0x0, iov=0x7ffff2fb7c88, iovcnt=1, offset_blocks=0, num_blocks=8, cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>, cb_arg=0x7ffff2fb7bc0) at bdev.c:1696
#2  0x000000000046c911 in spdk_bdev_readv (desc=0x93f8a0, ch=0x0, iov=0x7ffff2fb7c88, iovcnt=1, offset=0, nbytes=4096, cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>, cb_arg=0x7ffff2fb7bc0) at bdev.c:1680
#3  0x0000000000453fe2 in spdk_bdev_scsi_read (bdev=0x941c80, bdev_desc=0x93f8a0, bdev_ch=0x0, task=0x7ffff2fb7bc0, lba=0, len=8) at scsi_bdev.c:1317
#4  0x000000000045462e in spdk_bdev_scsi_readwrite (task=0x7ffff2fb7bc0, lba=0, xfer_len=8, is_read=true) at scsi_bdev.c:1477
#5  0x0000000000454c95 in spdk_bdev_scsi_process_block (task=0x7ffff2fb7bc0) at scsi_bdev.c:1662
#6  0x00000000004559ce in spdk_bdev_scsi_execute (task=0x7ffff2fb7bc0) at scsi_bdev.c:2029
#7  0x00000000004512e4 in spdk_scsi_lun_execute_task (lun=0x93f830, task=0x7ffff2fb7bc0) at lun.c:162
#8  0x0000000000450a87 in spdk_scsi_dev_queue_task (dev=0x713c80 <g_devs>, task=0x7ffff2fb7bc0) at dev.c:264
#9  0x000000000045ae48 in task_submit (task=0x7ffff2fb7bc0) at vhost_scsi.c:268
#10 0x000000000045c2b8 in process_requestq (svdev=0x7ffff31d9dc0, vq=0x7ffff31d9f40) at vhost_scsi.c:649
#11 0x000000000045c4ad in vdev_worker (arg=0x7ffff31d9dc0) at vhost_scsi.c:685
#12 0x00000000004797f2 in _spdk_reactor_run (arg=0x944540) at reactor.c:471
#13 0x0000000000479dad in spdk_reactors_start () at reactor.c:633
#14 0x00000000004783b1 in spdk_app_start (opts=0x7fffffffe390, start_fn=0x404df8 <vhost_started>, arg1=0x0, arg2=0x0) at app.c:570
#15 0x0000000000404ec0 in main (argc=7, argv=0x7fffffffe4f8) at vhost.c:115
-- cut here --

I have not yet been able to debug this. It's most probably my bug, but I am wondering whether
there could be a conflict between the two distinct virtio drivers: (1) the pre-existing one in the SPDK virtio library under lib/virtio/, and (2) the one I copied into lib/vhost/rte_vhost/ as part of the vhost library.

I understand that even if I make it work for now, this cannot be a long-term solution. I would like to re-use the pre-existing virtio-pci code from the virtio library to support virtio-vhost-user. Do you see any potential problems with this? Did you change the virtio code that you placed inside rte_vhost? It seems there are subtle differences between the two codebases.

These are my short-term issues. In the longer term, I'd be happy to contribute to VirtioVhostUser development in any way I can. I have seen some TODOs in your QEMU code here:

https://github.com/stefanha/qemu/blob/virtio-vhost-user/hw/virtio/virtio-vhost-user.c

and I would like to contribute, but it's not obvious to me what progress you've made since. As an example, I'd love to explore the possibility of adding support for interrupt-driven vhost-user backends over the virtio-vhost-user transport.

To summarize:

- I will follow up on the DPDK bug (https://bugs.dpdk.org/show_bug.cgi?id=85) with a proposed fix.

- Any hints on my segfault? I will definitely continue troubleshooting.

- Once I've sorted this out, how can I start using a single copy of the virtio-pci codebase? I guess I have to make some changes to comply with the API and check the dependencies.

- My current plan for contributing an IRQ-based implementation of the virtio-vhost-user transport is to use the vhost-user kick file descriptors as triggers to inject virtual interrupts and handle them in userspace. The virtio-vhost-user device could exploit the irqfd mechanism of KVM for this purpose. I will keep you and the list posted on this; I would appreciate any early feedback you may have.

Looking forward to any comments/feedback/pointers you may have.
I am rather inexperienced with this stuff, but it's definitely exciting and I'd love to contribute more to QEMU and SPDK.

Thank you for reading this far,
Nikos

--
Nikos Dragazis
Undergraduate Student
School of Electrical and Computer Engineering
National Technical University of Athens