This patchset is a PoC posted to request comments from the community.

This patchset provides a high performance networking interface (virtio)
for container-based DPDK applications. How to start DPDK applications in
containers with exclusive ownership of NIC devices is beyond its scope.
The basic idea is to present a new virtual device (named eth_cvio), which
can be discovered and initialized by container-based DPDK applications in
rte_eal_init(). To minimize changes, we reuse the existing virtio
frontend driver code (drivers/net/virtio/).

Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port r/w operations into the unix socket/cuse protocol,
and is originally provided in QEMU) is integrated into the virtio
frontend driver. In other words, this new converged driver plays the role
of both the original frontend driver and the QEMU device framework.

The biggest difference lies in how to calculate relative addresses for
the backend. The principle of virtio is that, based on one or multiple
shared memory segments, vhost maintains a reference system with the base
addresses and lengths of these segments, so that when an address comes
from the VM (usually a GPA, Guest Physical Address), vhost can translate
it into an address it recognizes (aka VVA, Vhost Virtual Address). To
decrease the overhead of address translation, we should maintain as few
segments as possible. In the context of virtual machines, GPA is always
locally contiguous, so it's a good choice. In the container's case, CVA
(Container Virtual Address) can be used. This means that:
  a. when set_base_addr, CVA is used;
  b. when preparing RX descriptors, CVA is used;
  c. when transmitting packets, CVA is filled in TX descriptors;
  d. in TX and CQ headers, CVA is used.

How to share memory? In the VM's case, qemu always shares the whole
physical layout with the backend. But it's not feasible for a container,
as a process, to share all virtual memory regions with the backend. So
only specified virtual memory regions (type is shared) are sent to the
backend. This leads to a limitation that only addresses in these areas
can be used to transmit or receive packets. For now, the shared memory is
created in /dev/shm using shm_open() in the memory initialization
process.

How to use?

a. Apply the patch of virtio for container. We need two copies of the
patched code (referred to as dpdk-app/ and dpdk-vhost/).

b. To compile container apps:
$: cd dpdk-app
$: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image using the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4", "--no-huge", "--no-pci", "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost", "--", "-p", "0x1"]
$: docker build -t dpdk-app-l2fwd .

d. To compile vhost:
$: cd dpdk-vhost
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

e. Start vhost-switch:
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024 -- -p 0x1 --stats 1

f. Start docker:
$: docker run -i -t -v <path to vhost unix socket>:/var/run/usvhost dpdk-app-l2fwd

Signed-off-by: Huawei Xie <huawei.xie at intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>

Jianfeng Tan (5):
  virtio/container: add handler for ioport rd/wr
  virtio/container: add a new virtual device named eth_cvio
  virtio/container: unify desc->addr assignment
  virtio/container: adjust memory initialization process
  vhost/container: change mode of vhost listening socket

 config/common_linuxapp                       |   5 +
 drivers/net/virtio/Makefile                  |   4 +
 drivers/net/virtio/vhost-user.c              | 433 +++++++++++++++++++++++++++
 drivers/net/virtio/vhost-user.h              | 137 +++++++++
 drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
 drivers/net/virtio/virtio_ethdev.h           |  16 +
 drivers/net/virtio/virtio_pci.h              |  32 +-
 drivers/net/virtio/virtio_rxtx.c             |   9 +-
 drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
 drivers/net/virtio/virtqueue.h               |   9 +-
 lib/librte_eal/common/include/rte_memory.h   |   5 +
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
 lib/librte_mempool/rte_mempool.c             |  16 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
 14 files changed, 967 insertions(+), 90 deletions(-)
 create mode 100644 drivers/net/virtio/vhost-user.c
 create mode 100644 drivers/net/virtio/vhost-user.h

--
2.1.4
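
For reviewers, the address translation described in the cover letter
(vhost keeping the base addresses and lengths of the shared segments and
turning a GPA or CVA into a VVA) can be sketched roughly as below. This
is only an illustration and is not code from the patchset: struct
mem_region and frontend_to_vva() are hypothetical names, and returning 0
for an address outside every shared region is an assumption made for the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one shared memory segment (not from the patchset). */
struct mem_region {
	uint64_t frontend_addr; /* segment start as seen by the frontend (GPA or CVA) */
	uint64_t size;          /* segment length in bytes */
	uint64_t backend_addr;  /* where the backend mapped the same segment (VVA) */
};

/*
 * Translate a frontend address into a backend virtual address by walking
 * the region table. Returns 0 when the address is not inside any shared
 * region (an assumption of this sketch).
 */
uint64_t
frontend_to_vva(const struct mem_region *regions, size_t nregions,
		uint64_t addr)
{
	size_t i;

	assert(regions != NULL || nregions == 0);

	for (i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		/* Compare the offset against the size so the check cannot overflow. */
		if (addr >= r->frontend_addr &&
		    addr - r->frontend_addr < r->size)
			return r->backend_addr + (addr - r->frontend_addr);
	}

	return 0;
}
```

The lookup cost grows with the number of regions, which is why the cover
letter argues for keeping as few segments as possible. The failed lookup
also shows the limitation mentioned above: a buffer that lies outside the
shared regions has no backend address at all.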