On Thu, Apr 6, 2023 at 4:17 PM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:
>
> Hi Yongji,
>
> On 4/6/23 05:44, Yongji Xie wrote:
> > Hi Maxime,
> >
> > On Fri, Mar 31, 2023 at 11:43 PM Maxime Coquelin
> > <maxime.coque...@redhat.com> wrote:
> >>
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA Device in Userspace; it enables
> >> implementing a Virtio device in userspace and having it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or the Virtio-user PMD.
> >>
> >> While VDUSE support is already available in the upstream
> >> Kernel, a couple of patches are required to support the
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >>
> >> Usage:
> >> ======
> >>
> >> 1. Probe the required Kernel modules
> >>    # modprobe vdpa
> >>    # modprobe vduse
> >>    # modprobe virtio-vdpa
> >>
> >> 2. Build (requires the vduse kernel headers to be available)
> >>    # meson build
> >>    # ninja -C build
> >>
> >> 3. Create a VDUSE device (vduse0) using the Vhost PMD with
> >>    testpmd (with 4 queue pairs in this example)
> >>    # ./build/app/dpdk-testpmd --no-pci \
> >>      --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 \
> >>      --log-level=*:9 -- -i --txq=4 --rxq=4
> >>
> >> 4. Attach the VDUSE device to the vDPA bus
> >>    # vdpa dev add name vduse0 mgmtdev vduse
> >>    => The virtio-net netdev shows up (eth0 here)
> >>    # ip l show eth0
> >>    21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> >>        mode DEFAULT group default qlen 1000
> >>        link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> >>
> >> 5. Start/stop traffic in testpmd
> >>    testpmd> start
> >>    testpmd> show port stats 0
> >>    ######################## NIC statistics for port 0  ########################
> >>    RX-packets: 11        RX-missed: 0          RX-bytes:  1482
> >>    RX-errors:  0
> >>    RX-nombuf:  0
> >>    TX-packets: 1         TX-errors: 0          TX-bytes:  62
> >>
> >>    Throughput (since last show)
> >>    Rx-pps:     0         Rx-bps:    0
> >>    Tx-pps:     0         Tx-bps:    0
> >>    ############################################################################
> >>    testpmd> stop
> >>
> >> 6. Detach the VDUSE device from the vDPA bus
> >>    # vdpa dev del vduse0
> >>
> >> 7. Quit testpmd
> >>    testpmd> quit
> >>
> >> Known issues & remaining work:
> >> ==============================
> >> - Fix issue in the FD manager (still polling while the FD has been removed)
> >> - Add Netlink support in the Vhost library
> >> - Support device reconnection
> >> - Support packed ring
> >> - Enable & test more Virtio features
> >> - Provide performance benchmark results
> >>
> >
> > Nice work! Thanks for bringing VDUSE to the network area. I wonder if
> > you have some plan to support userspace memory registration [1]? I
> > think this feature can benefit performance, since an extra data copy
> > could be eliminated in our case.
>
> I plan to have a closer look later, once VDUSE support has been added.
> I think it will be difficult to support it in the case of DPDK for
> networking:
>
> - For the dequeue path, it would basically mean re-introducing the
>   dequeue zero-copy support that we removed some time ago. It was a hack
>   where we replaced the regular mbuf buffer with the descriptor one,
>   increased the reference counter, and at subsequent dequeue API calls
>   checked whether the former mbuf's reference counter was back to 1, in
>   which case the mbuf was restored. The issue is that physical NIC
>   drivers usually release sent mbufs by pool, once a certain threshold
>   is met. So it can cause draining of the virtqueue, as the descriptors
>   are not written back into the used ring for quite some time, depending
>   on the NIC, the traffic, etc.
>
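(For context, here is a rough sketch of the bookkeeping behind the removed
dequeue zero-copy hack described above. The rte_mbuf refcount helpers are
real DPDK API, but the zcopy_entry structure and the update_used_ring() /
zcopy_attach() helpers are hypothetical names used only to illustrate the
idea; this is not the code that was actually removed from the Vhost
library.)

#include <stdbool.h>
#include <stdint.h>

#include <rte_mbuf.h>

/* One entry per in-flight zero-copy mbuf (hypothetical bookkeeping). */
struct zcopy_entry {
	struct rte_mbuf *mbuf;      /* mbuf handed to the application   */
	void            *orig_buf;  /* mbuf's original data buffer      */
	uint16_t         desc_idx;  /* descriptor to write back as used */
	bool             in_use;
};

/* Hypothetical helper: mark the descriptor as used in the vring. */
void update_used_ring(uint16_t desc_idx, uint32_t len);

/*
 * On dequeue: instead of copying the guest buffer into the mbuf, point
 * the mbuf at the guest buffer and bump its refcount so that the
 * descriptor is not marked used until the NIC driver releases the mbuf.
 * (The real code also had to handle the buffer IOVA for DMA.)
 */
static void
zcopy_attach(struct zcopy_entry *e, struct rte_mbuf *m,
	     void *guest_buf, uint16_t desc_idx, uint32_t len)
{
	e->mbuf = m;
	e->orig_buf = m->buf_addr;
	e->desc_idx = desc_idx;
	e->in_use = true;

	m->buf_addr = guest_buf;          /* swap in the guest buffer */
	m->data_off = 0;
	m->pkt_len = len;
	m->data_len = (uint16_t)len;

	rte_mbuf_refcnt_update(m, 1);     /* keep it alive after Tx free */
}

/*
 * At the beginning of each dequeue burst: any mbuf whose refcount
 * dropped back to 1 has been freed by the NIC driver, so the guest
 * buffer can finally be returned (written to the used ring) and the
 * mbuf restored to its original buffer.
 */
static void
reclaim_zcopy_mbufs(struct zcopy_entry *pool, uint16_t nb_entries)
{
	for (uint16_t i = 0; i < nb_entries; i++) {
		struct zcopy_entry *e = &pool[i];

		if (!e->in_use || rte_mbuf_refcnt_read(e->mbuf) > 1)
			continue;

		/* Give the descriptor back to the guest. */
		update_used_ring(e->desc_idx, e->mbuf->pkt_len);

		/* Restore the mbuf's own buffer before it is reused
		 * (the real code also restored buf_iova and offsets). */
		e->mbuf->buf_addr = e->orig_buf;
		e->in_use = false;
	}
}

The important property is that the used-ring write-back only happens once
the NIC driver drops its reference, which is exactly why drivers that free
transmitted mbufs in large batches can leave the virtqueue drained for a
long time.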
OK, I see. Could this issue be mitigated by releasing sent mbufs one by
one once they are sent out, or simply by increasing the virtqueue size?

> - For the enqueue path, I don't think this is possible with virtual
>   switches by design, as when an mbuf is received on a physical port, we
>   don't know to which Vhost/VDUSE port it will be switched. And for VM
>   to VM communication, should it use the source VM's buffer or the
>   destination VM's one?
>

Yes, I agree that it's hard to achieve that in the enqueue path.

> The only case where it could work is if you had a simple forwarder
> between a VDUSE device and a physical port. But I don't think there is
> much interest in such a use-case.
>

OK, I get it.

Thanks,
Yongji