This patch adds support for a new port type to the userspace datapath
called dpdkvhostuser. It builds on the existing vhost-cuse
infrastructure, but makes vhost-user the default vhost port type in
place of vhost-cuse. vhost-cuse 'dpdkvhost' ports remain available and
can be enabled with a configure flag; the steps are described in
INSTALL.DPDK.md.

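As a rough illustration of the compile-time switch described above, the
sketch below picks the vhost port type name depending on whether
`VHOST_CUSE` is defined (the macro this patch has `--with-vhostcuse`
define). `VHOST_PORT_TYPE` and `vhost_port_type()` are hypothetical names
used only for this example; they are not part of the patch.

```c
/* VHOST_CUSE is the macro defined by --with-vhostcuse in this patch.
 * VHOST_PORT_TYPE and vhost_port_type() are illustrative names only. */
#ifdef VHOST_CUSE
#define VHOST_PORT_TYPE "dpdkvhost"        /* vhost-cuse build */
#else
#define VHOST_PORT_TYPE "dpdkvhostuser"    /* default: vhost-user build */
#endif

/* Return the name of the vhost port type selected at build time. */
const char *
vhost_port_type(void)
{
    return VHOST_PORT_TYPE;
}
```

Built without `-DVHOST_CUSE`, `vhost_port_type()` yields the vhost-user
type name; building with `-DVHOST_CUSE` switches it to the vhost-cuse one.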
A new dpdkvhostuser port will create a unix domain socket which when
provided to QEMU is used to facilitate communication between the
virtio-net device on the VM and the OVS port on the host.

Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
---
 INSTALL.DPDK.md         | 176 ++++++++++++++++++++++++++++++++++++++----------
 acinclude.m4            |  13 ++++
 configure.ac            |   1 +
 lib/netdev-dpdk.c       | 114 +++++++++++++++++++++++++------
 lib/netdev.c            |   4 ++
 vswitchd/ovs-vswitchd.c |   9 ++-
 6 files changed, 259 insertions(+), 58 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 899763f..a39645f 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -16,7 +16,9 @@ OVS needs a system with 1GB hugepages support.
 Building and Installing:
 ------------------------

-Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
+Required: DPDK 2.0
+Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
+on Debian/Ubuntu)

 1. Configure build & install DPDK:
   1. Set `$DPDK_DIR`
@@ -31,13 +33,10 @@ Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)

      `CONFIG_RTE_BUILD_COMBINE_LIBS=y`

-     Update `config/common_linuxapp` so that DPDK is built with vhost
-     libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
-     libraries should be explicitly turned off (they are enabled by default
-     in DPDK 2.0).
+     Update `config/common_linuxapp` so that DPDK is built with vhost-user
+     libraries.

      `CONFIG_RTE_LIBRTE_VHOST=y`
-     `CONFIG_RTE_LIBRTE_VHOST_USER=n`

      Then run `make install` to build and install the library.
      For default install without IVSHMEM:
@@ -311,40 +310,146 @@ the vswitchd.
 DPDK vhost:
 -----------

-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:

-Prerequisites:
-1. Insert the Cuse module:
+1. vhost-user
+2. vhost-cuse

-     `modprobe cuse`
+By default, vhost-user is enabled in DPDK and following this, the same
+applies for OVS. This document assumes the use of vhost-user, unless
+otherwise specified. At the moment, vhost-cuse support is optional in
+OVS, however it is intended to be deprecated in a future release.

-2. Build and insert the `eventfd_link` module:
+(Optional) Building with vhost-cuse ports:
+------------------------------------------

-     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
-     `make`
-     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+Should you wish to use vhost-cuse instead of vhost-user, you must first
+enable vhost-cuse in DPDK by setting the following additional flag in
+`config/common_linuxapp`:
+
+   `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+Following this, rebuild DPDK as per the instructions in the "Building and
+Installing" section.
+Secondly, you must enable vhost-cuse in OVS. This can be achieved by using
+the `--with-vhostcuse` flag during the `./configure` step like so:
+
+`./configure --with-dpdk=$DPDK_BUILD --with-vhostcuse`
+
+Finally, rebuild OVS as per step 3 in the "Building and Installing" section.
+
+DPDK vhost Prerequisites:
+-------------------------
+
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+   Installing" section.
+
+2. (Optional) If using vhost-cuse:
+
+    1. Insert the Cuse module:
+
+      `modprobe cuse`
+
+    2. Build and insert the `eventfd_link` module:
+
+      ```
+      cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
+      make
+      insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+      ```
+
+3. QEMU version v2.1.0+
+
+   Both vhost-user and vhost-cuse will work with QEMU v2.1.0 and above,
+   however it is recommended to use v2.2.0 if providing your VM with memory
+   greater than 1GB due to potential issues with memory mapping larger areas.
+   Note: For vhost-cuse, QEMU v1.6.2 will also work, with slightly different
+   command line parameters, which are specified later in this document.
+
+Adding DPDK vhost ports to the Switch:
+--------------------------------------

 Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
+arbitrary names.
+
+When adding vhost ports to the switch, take care depending on which type of
+vhost you are using.
+
+  -  For vhost-user (default), the name of the port type is `dpdkvhostuser`

-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+    ```
+    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 \
+    type=dpdkvhostuser
+    ```
+
+    This action creates a socket located at
+    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+    to your VM on the QEMU command line. More instructions on this can be
+    found in the next section "DPDK vhost-user VM configuration"
+    Note: If you wish for the vhost-user sockets to be created in a
+    directory other than `/usr/local/var/run/openvswitch`, you may specify
+    another location on the ovs-vswitchd command line like so:

-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+      `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`

-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+  -  For vhost-cuse, the name of the port type is `dpdkvhost`

-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+    ```
+    ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1 \
+    type=dpdkvhost
+    ```

+    When attaching vhost-cuse ports to QEMU, the name provided during the
+    add-port operation must match the ifname parameter on the QEMU command
+    line.
+    More instructions on this can be found in the section "DPDK
+    vhost-cuse VM configuration"

-DPDK vhost VM configuration:
-----------------------------
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.

-   vhost ports use a Linux* character device to communicate with QEMU.
+1. Configure sockets.
+   Pass the following parameters to QEMU to attach a vhost-user device:
+
+     ```
+     -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+     -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+     -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+     ```
+
+   ...where vhost-user-1 is the name of the vhost-user port added
+   to the switch.
+   Repeat the above parameters for multiple devices, changing the
+   chardev path and id as necessary. Note that a separate and different
+   chardev path needs to be specified for each vhost-user device. For
+   example, if you have a second vhost-user port named 'vhost-user-2', you
+   append your QEMU command line with an additional set of parameters:
+
+
+     ```
+     -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+     ```
+
+2. Configure huge pages.
+   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+   virtio-net device's virtual rings and packet buffers mapping the VM's
+   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
+   memory into their process address space, pass the following parameters
+   to QEMU:
+
+     ```
+     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+     share=on
+     -numa node,memdev=mem -mem-prealloc
+     ```
+
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+   vhost-cuse ports use a Linux* character device to communicate with QEMU.
    By default it is set to `/dev/vhost-net`.
    It is possible to reuse this standard device for DPDK vhost, which makes
    setup a little simpler but it is better practice to specify an
    alternative character device in order to
@@ -410,16 +515,19 @@ DPDK vhost VM configuration:
    QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
    virtio-net device's virtual rings and packet buffers mapping the VM's
    physical memory on hugetlbfs. To enable vhost-ports to map the VM's
-   memory into their process address space, pass the following paramters
+   memory into their process address space, pass the following parameters
    to QEMU:

      `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
       share=on -numa node,memdev=mem -mem-prealloc`

+   Note: For use with an earlier QEMU version such as v1.6.2, use the
+   following to configure hugepages instead:

-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+     `-mem-path /dev/hugepages -mem-prealloc`

+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------

 The QEMU wrapper script automatically detects and calls QEMU with the
 necessary parameters.
 It performs the following actions:
@@ -445,8 +553,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
   netdev=net1,mac=00:00:00:00:00:01
 ```

-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------

 If you are using libvirt, you must enable libvirt to access the character
 device by adding it to controllers cgroup for libvirtd using the following
@@ -520,7 +628,7 @@ Now you may launch your VM using virt-manager, or like so:

     `virsh create my_vhost_vm.xml`

-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
 ----------------------------------------------------------

 To use the qemu-wrapper script in conjuntion with libvirt, follow the
@@ -548,7 +656,7 @@ steps in the previous section before proceeding with the following steps:
      the correct emulator location and set any additional options. If you are using
      a alternative character device name, please set "us_vhost_path" to the
      location of that device. The script will automatically detect and insert
-     the correct "vhostfd" value in the QEMU command line arguements.
+     the correct "vhostfd" value in the QEMU command line arguments.

   5. Use virt-manager to launch the VM

diff --git a/acinclude.m4 b/acinclude.m4
index e9d0ed9..08e1402 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -225,6 +225,19 @@ AC_DEFUN([OVS_CHECK_DPDK], [
   AM_CONDITIONAL([DPDK_NETDEV], test -n "$RTE_SDK")
 ])

+dnl OVS_CHECK_VHOST_CUSE
+dnl
+dnl Enable DPDK vhost-cuse support in favour of vhost-user
+AC_DEFUN([OVS_CHECK_VHOST_CUSE], [
+  AC_ARG_WITH(vhostcuse,
+              [AC_HELP_STRING([--with-vhostcuse],
+                              [Enable DPDK vhost-cuse])])
+
+  if test X"$with_vhostcuse" != X; then
+    AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])
+  fi
+])
+
 dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
 dnl
 dnl Greps FILE for REGEX. If it matches, runs IF-MATCH, otherwise IF-NO-MATCH.
diff --git a/configure.ac b/configure.ac
index 068674e..3f635b4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -165,6 +165,7 @@ AC_ARG_VAR(KARCH, [Kernel Architecture String])
 AC_SUBST(KARCH)
 OVS_CHECK_LINUX
 OVS_CHECK_DPDK
+OVS_CHECK_VHOST_CUSE
 OVS_CHECK_PRAGMA_MESSAGE
 AC_SUBST([OVS_CFLAGS])
 AC_SUBST([OVS_LDFLAGS])
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 5af15d4..54ead15 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -28,6 +28,7 @@
 #include <unistd.h>
 #include <stdio.h>

+#include "dirs.h"
 #include "dp-packet.h"
 #include "dpif-netdev.h"
 #include "list.h"
@@ -101,8 +102,18 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))

 #define MAX_PKT_BURST 32 /* Max burst size for RX/TX */

-/* Character device cuse_dev_name. */
-char *cuse_dev_name = NULL;
+/* For vhost-user, the path where sockets will be created.
+ * For vhost-cuse, the name of the character device.
+ */
+char *vhost_dev_or_sock = NULL;
+
+#ifdef VHOST_CUSE
+char vhost_flag[] = "--cuse_dev_name";
+char vhost_flag_default_val[] = "vhost-net";
+#else
+#define VHOST_USER
+char vhost_flag[] = "--vhost_sock_dir";
+char vhost_flag_default_val[PATH_MAX]; /* Initialized at runtime via ovs_rundir */
+#endif

 static const struct rte_eth_conf port_conf = {
     .rxmode = {
@@ -230,6 +241,11 @@ struct netdev_dpdk {
     /* virtio-net structure for vhost device */
     OVSRCU_TYPE(struct virtio_net *) virtio_dev;

+#ifdef VHOST_USER
+    /* socket location for vhost-user device */
+    char socket_path[IF_NAME_SZ];
+#endif
+
     /* In dpdk_list. */
     struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
     rte_spinlock_t txq_lock;
@@ -556,6 +572,24 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     netdev_->n_txq = NR_QUEUE;
     netdev_->n_rxq = NR_QUEUE;

+    /* Take the name of the vhost-user port and append it to the location where
+     * the socket is to be created, then register the socket.
+     */
+#ifdef VHOST_USER
+    if (type == DPDK_DEV_VHOST) {
+        snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
+                 vhost_dev_or_sock, netdev_->name);
+        err = rte_vhost_driver_register(netdev->socket_path);
+        if (err != 0) {
+            VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
+                     netdev->socket_path);
+            goto unlock;
+        }
+
+        VLOG_INFO("Socket %s created for vhost-user port %s\n", netdev->socket_path, netdev_->name);
+    }
+#endif
+
     if (type == DPDK_DEV_ETH) {
         netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
         err = dpdk_eth_dev_init(netdev);
@@ -1525,6 +1559,21 @@ set_irq_status(struct virtio_net *dev)
 }

 /*
+ * Compare the name of the QEMU device with the name of the vhost port
+ */
+static int
+compare_vhost_name(char* ifname, struct netdev_dpdk *netdev) {
+#ifdef VHOST_CUSE
+    if (strncmp(ifname, netdev->up.name, IFNAMSIZ) == 0)
+        return 1;
+#else
+    if (strncmp(ifname, netdev->socket_path, IF_NAME_SZ) == 0)
+        return 1;
+#endif
+    return 0;
+}
+
+/*
  * A new virtio-net device is added to a vhost
  * port.
  */
 static int
@@ -1536,7 +1585,7 @@ new_device(struct virtio_net *dev)
     ovs_mutex_lock(&dpdk_mutex);
     /* Add device to the vhost port with the same name as that passed down. */
     LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
-        if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
+        if (compare_vhost_name(dev->ifname, netdev)) {
             ovs_mutex_lock(&netdev->mutex);
             ovsrcu_set(&netdev->virtio_dev, dev);
             ovs_mutex_unlock(&netdev->mutex);
@@ -1616,7 +1665,7 @@ const struct virtio_net_device_ops virtio_net_device_ops =
 };

 static void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
 {
      pthread_detach(pthread_self());
      /* Put the cuse thread into quiescent state. */
@@ -1628,22 +1677,23 @@ start_cuse_session_loop(void *dummy OVS_UNUSED)
 static int
 dpdk_vhost_class_init(void)
 {
-    int err = -1;
-
     rte_vhost_driver_callback_register(&virtio_net_device_ops);

+#ifdef VHOST_CUSE
+    int err = -1;
     /* Register CUSE device to handle IOCTLs.
-     * Unless otherwise specified on the vswitchd command line, cuse_dev_name
-     * is set to vhost-net.
+     * Unless otherwise specified on the vswitchd command line, vhost_dev_or_sock
+     * is set to "vhost-net".
      */
-    err = rte_vhost_driver_register(cuse_dev_name);
+    err = rte_vhost_driver_register(vhost_dev_or_sock);
     if (err != 0) {
         VLOG_ERR("CUSE device setup failure.");
         return -1;
     }
+#endif

-    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
     return 0;
 }

@@ -1861,6 +1911,15 @@ dpdk_init(int argc, char **argv)
     int result;
     int base = 0;
     char *pragram_name = argv[0];
+    char *vhost_flag_val = NULL;
+    unsigned flag_max = 0;
+
+#ifdef VHOST_USER
+    strncpy(vhost_flag_default_val, ovs_rundir(), PATH_MAX);
+    flag_max = PATH_MAX;
+#else
+    flag_max = NAME_MAX;
+#endif

     if (argc < 2 || strcmp(argv[1], "--dpdk"))
         return 0;
@@ -1869,29 +1928,36 @@ dpdk_init(int argc, char **argv)
     argc--;
     argv++;

-    /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
-     * this string if it meets the correct criteria. Otherwise, set it to the
-     * default (vhost-net).
+    /* Depending on which version of vhost is in use, process the vhost-specific
+     * flag if it is provided on the vswitchd command line, otherwise resort to
+     * a default value.
+     *
+     * For vhost-user: Process "--vhost_sock_dir" to set the custom location of
+     * the vhost-user socket(s).
+     * For vhost-cuse: Process "--cuse_dev_name" to set the custom name of the
+     * vhost-cuse character device.
      */
-    if (!strcmp(argv[1], "--cuse_dev_name") &&
-        (strlen(argv[2]) <= NAME_MAX)) {
+    if (!strcmp(argv[1], vhost_flag) &&
+        (strlen(argv[2]) <= flag_max)) {

-        cuse_dev_name = strdup(argv[2]);
+        vhost_flag_val = strdup(argv[2]);

-        /* Remove the cuse_dev_name configuration parameters from the argument
+        /* Remove the vhost flag configuration parameters from the argument
          * list, so that the correct elements are passed to the DPDK
          * initialization function
          */
         argc -= 2;
-        argv += 2;    /* Increment by two to bypass the cuse_dev_name arguments */
+        argv += 2;    /* Increment by two to bypass the vhost flag arguments */
         base = 2;

-        VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
-    } else {
-        cuse_dev_name = "vhost-net";
-        VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
+        VLOG_INFO("User-provided %s in use: %s", vhost_flag, vhost_flag_val);
+    } else {
+        vhost_flag_val = vhost_flag_default_val;
+        VLOG_INFO("No %s provided - defaulting to %s", vhost_flag, vhost_flag_val);
     }

+    vhost_dev_or_sock = (char*)vhost_flag_val;
+
     /* Keep the program name argument as this is needed for call to
      * rte_eal_init()
      */
@@ -1946,7 +2012,11 @@ const struct netdev_class dpdk_ring_class =

 const struct netdev_class dpdk_vhost_class =
     NETDEV_DPDK_CLASS(
+#ifdef VHOST_CUSE
         "dpdkvhost",
+#else
+        "dpdkvhostuser",
+#endif
         dpdk_vhost_class_init,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
diff --git a/lib/netdev.c b/lib/netdev.c
index 45f7f29..77513fa 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -111,7 +111,11 @@ netdev_is_pmd(const struct netdev *netdev)
 {
     return (!strcmp(netdev->netdev_class->type, "dpdk") ||
             !strcmp(netdev->netdev_class->type, "dpdkr") ||
+#ifdef VHOST_CUSE
             !strcmp(netdev->netdev_class->type, "dpdkvhost"));
+#else
+            !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
+#endif
 }

 static void
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a1b33da..ea9560b 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -252,9 +252,14 @@ usage(void)
     daemon_usage();
     vlog_usage();
     printf("\nDPDK options:\n"
-           "  --dpdk options            Initialize DPDK datapath.\n"
-           "  --cuse_dev_name BASENAME  override default character device name\n"
+           "  --dpdk options            Initialize DPDK datapath.\n");
+#ifdef VHOST_CUSE
+    printf("  --cuse_dev_name BASENAME  override default character device name\n"
            "                            for use with userspace vHost.\n");
+#else
+    printf("  --vhost_sock_dir DIR      override default directory where\n"
+           "                            vhost-user sockets are created.\n");
+#endif
     printf("\nOther options:\n"
            "  --unixctl=SOCKET          override default control socket name\n"
            "  -h, --help                display this help message\n"
--
1.9.3

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
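
The socket path step in netdev_dpdk_init() above formats
"<socket dir>/<port name>" with snprintf() but does not inspect the return
value. A minimal sketch of the same step with an explicit truncation check
might look like the following. `vhost_socket_path()` is a hypothetical
helper written for this note; it is not a function from OVS or DPDK, and
the caller is assumed to supply the destination buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: build "<sock_dir>/<port_name>"
 * into 'out'. Returns 0 on success, -1 on bad arguments or if the result
 * would not fit in 'out_len' bytes (in which case 'out' is left empty). */
int
vhost_socket_path(char *out, size_t out_len,
                  const char *sock_dir, const char *port_name)
{
    int n;

    if (!out || out_len == 0 || !sock_dir || !port_name) {
        return -1;
    }

    /* snprintf() returns the length the full string would have had,
     * so a value >= out_len means the output was truncated. */
    n = snprintf(out, out_len, "%s/%s", sock_dir, port_name);
    if (n < 0 || (size_t) n >= out_len) {
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```

With a helper along these lines, the registration call could be skipped and
an error logged whenever the configured directory plus port name does not
fit, rather than registering a silently truncated path.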