> From: ovs-dev-boun...@openvswitch.org > [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ciara Loftus > Sent: Friday, 15 December, 2017 14:14 > To: d...@openvswitch.org > Subject: [ovs-dev] [PATCH v6 2/2] netdev-dpdk: Enable optional dequeue zero > copy for vHost User > > Enabled per port like so: > ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true > > The feature is disabled by default and can only be enabled/disabled when > a vHost port is down. > > When packets from a vHost device with zero copy enabled are destined for > a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be > set to a smaller value. 128 is recommended. This can be achieved like > so:
We should clarify here that the sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. In case of a bond over two physical ports in balance-tcp mode, you would, for example have to divide 128 by the number of links in the bond. > > ovs-vsctl set Interface dpdkport options:n_txq_desc=128 > > Due to the requirement above, the feature is considered 'experimental'. > > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com> > --- > v6: > * Note the feature is experimental. > * Mention bug in DPDK & temporary requirement on QEMU < v2.7 > > Documentation/howto/dpdk.rst | 33 ++++++++++++ > Documentation/topics/dpdk/vhost-user.rst | 68 ++++++++++++++++++++++++ > NEWS | 2 + > lib/netdev-dpdk.c | 89 > +++++++++++++++++++++++++++++++- > vswitchd/vswitch.xml | 11 ++++ > 5 files changed, 202 insertions(+), 1 deletion(-) > > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst > index d123819..3e1b8f8 100644 > --- a/Documentation/howto/dpdk.rst > +++ b/Documentation/howto/dpdk.rst > @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the > below steps: > Check traffic on multiple queues:: > > $ cat /proc/interrupts | grep virtio > + > +PHY-VM-PHY (vHost Dequeue Zero Copy) > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +vHost dequeue zero copy functionality can be validated using the > +PHY-VM-PHY configuration. To begin, follow the steps described in > +:ref:`dpdk-phy-phy` to create and initialize the database, start > +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices > +and flows to bridge ``br0``. Once complete, follow the below steps: > + > +1. Enable dequeue zero copy on the vHost devices. > + > + $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true > + $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true > + > +The following log should be observed for each device: > + > + netdev_dpdk|INFO|Zero copy enabled for vHost socket <name> > + > +2. Reduce the number of txq descriptors on the phy ports. > + > + $ ovs-vsctl set Interface phy0 options:n_txq_desc=128 > + $ ovs-vsctl set Interface phy1 options:n_txq_desc=128 > + > +3. Proceed with the test by launching the VM and configuring guest > +forwarding, be it via the vHost loopback method or kernel forwarding > +method, and sending traffic. The following log should be oberved for > +each device as it becomes active during VM boot: > + > + VHOST_CONFIG: dequeue zero copy is enabled > + > +It is essential that step 1 is performed before booting the VM, otherwise > +the feature will not be enabled. > diff --git a/Documentation/topics/dpdk/vhost-user.rst > b/Documentation/topics/dpdk/vhost-user.rst > index 8447e2d..5a65360 100644 > --- a/Documentation/topics/dpdk/vhost-user.rst > +++ b/Documentation/topics/dpdk/vhost-user.rst > @@ -458,3 +458,71 @@ Sample XML > </domain> > > .. _QEMU documentation: > http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD > + > +vhost-user Dequeue Zero Copy (experimental) > +------------------------------------------- > + > +Normally when dequeuing a packet from a vHost User device, a memcpy operation > +must be used to copy that packet from guest address space to host address > +space. This memcpy can be removed by enabling dequeue zero-copy like so: > + > + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true > + > +With this feature enabled, a reference (pointer) to the packet is passed to > +the host, instead of a copy of the packet. Removing this memcpy can give a > +performance improvement for some use cases, for example switching large > packets > +between different VMs. However additional packet loss may be observed. > + > +Note that the feature is disabled by default and must be explicitly enabled > +by using the command above. > + > +The feature cannot be enabled when the device is active (ie. VM booted). If > +you wish to enable the feature after the VM has booted, you must shutdown > +the VM and bring it back up. > + > +The same logic applies for disabling the feature - it must be disabled when > +the device is inactive, for example before VM boot. To disable the feature: > + > + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false > + > +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port > +types. > + > +A limitation exists whereby if packets from a vHost port with > dq-zero-copy=true > +are destined for a 'dpdk' type port, the number of tx descriptors > (n_txq_desc) > +for that port must be reduced to a smaller number, 128 being the recommended > +value. This can be achieved by issuing the following command: > + > + $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128 > + > +More information on the n_txq_desc option can be found in the "DPDK Physical > +Port Queue Sizes" section of the `intro/install/dpdk.rst` guide. > + > +The reason for this limitation is due to how the zero copy functionality is > +implemented. The vHost device's 'tx used vring', a virtio structure used for > +tracking used ie. sent descriptors, will only be updated when the NIC frees > +the corresponding mbuf. If we don't free the mbufs frequently enough, that > +vring will be starved and packets will no longer be processed. One way to > +ensure we don't encounter this scenario, is to configure n_txq_desc to a > small > +enough number such that the 'mbuf free threshold' for the NIC will be hit > more > +often and thus free mbufs more frequently. The value of 128 is suggested, but > +values of 64 and 256 have been tested and verified to work too, with > differing > +performance characteristics. A value of 512 can be used too, if the virtio > +queue size in the guest is increased to 1024 (available to configure in QEMU > +versions v2.10 and greater). This value can be set like so: > + > + $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server > + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce > + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1, > + tx_queue_size=1024 > + > +Because of this limitation, this feature is condsidered 'experimental'. > + > +The feature currently does not fully work with QEMU >= v2.7 due to a bug in > +DPDK which will be addressed in an upcoming release. The patch to fix this > +issue can be found here: > +http://dpdk.org/dev/patchwork/patch/32198/ > + > +Further information can be found in the > +`DPDK documentation > +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__ > diff --git a/NEWS b/NEWS > index 49d2fa5..5921b50 100644 > --- a/NEWS > +++ b/NEWS > @@ -23,6 +23,8 @@ Post-v2.8.0 > - DPDK: > * Add support for DPDK v17.11 > * Add support for vHost IOMMU > + * Optional dequeue zero copy feature for vHost ports (experimental) can > be > + enabled per port via the boolean 'dq-zero-copy' option. Shorten to * Add support for vHost deque zero copy (experimental) > > v2.8.0 - 31 Aug 2017 > -------------------- > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 5068d4e..b52e993 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -379,6 +379,9 @@ struct netdev_dpdk { > /* True if vHost device is 'up' and has been reconfigured at least > once */ > bool vhost_reconfigured; > /* 3 pad bytes here. */ > + > + /* True if dq-zero-copy feature has successfully been enabled */ > + bool dq_zc_enabled; > ); > > PADDED_MEMBERS(CACHE_LINE_SIZE, > @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t > port_no, > dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > ovsrcu_index_init(&dev->vid, -1); > dev->vhost_reconfigured = false; > + dev->dq_zc_enabled = false; > dev->attached = false; > > ovsrcu_init(&dev->qos_conf, NULL); > @@ -1400,6 +1404,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, > const struct smap *args, > return 0; > } > > +static void > +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev, > + const struct smap *args) > +{ > + bool needs_reconfigure = false; > + bool zc_requested = smap_get_bool(args, "dq-zero-copy", false); > + > + if (zc_requested && > + !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) { > + dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + needs_reconfigure = true; > + } else if (!zc_requested && > + (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) { > + dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + needs_reconfigure = true; > + } > + > + /* Only try to change ZC mode when device is down */ > + if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) { > + netdev_request_reconfigure(&dev->up); > + } > +} > + > static int > netdev_dpdk_vhost_client_set_config(struct netdev *netdev, > const struct smap *args, > @@ -1416,6 +1443,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev > *netdev, > netdev_request_reconfigure(netdev); > } > } > + > + dpdk_vhost_set_config_helper(dev, args); > + > + ovs_mutex_unlock(&dev->mutex); > + > + return 0; > +} > + > +static int > +netdev_dpdk_vhost_set_config(struct netdev *netdev, > + const struct smap *args, > + char **errp OVS_UNUSED) > +{ > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + > + ovs_mutex_lock(&dev->mutex); > + dpdk_vhost_set_config_helper(dev, args); > ovs_mutex_unlock(&dev->mutex); > > return 0; > @@ -2748,6 +2792,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) > } > } > > +static void > +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode, > + bool enable) > +{ > + int err = rte_vhost_driver_unregister(dev->vhost_id); > + > + if (err) { > + VLOG_ERR("Error unregistering vHost socket %s; can't change zero > copy " > + "mode", dev->vhost_id); > + } else { > + err = dpdk_setup_vhost_device(dev, client_mode); > + if (err) { > + VLOG_ERR("Error changing zero copy mode for vHost socket %s", > + dev->vhost_id); > + } else if (enable) { > + dev->dq_zc_enabled = true; > + VLOG_INFO("Zero copy enabled for vHost socket %s", > dev->vhost_id); > + } else { > + dev->dq_zc_enabled = false; > + VLOG_INFO("Zero copy disabled for vHost socket %s", > dev->vhost_id); > + } > + } > +} > + > +static void > +vhost_check_zero_copy_status(struct netdev_dpdk *dev) > +{ > + bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT; > + > + if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY) > + && !dev->dq_zc_enabled) { > + /* ZC disabled but requested to be enabled, enable it. */ > + vhost_change_zero_copy_mode(dev, mode, true); > + } else if (!(dev->vhost_driver_flags & > + RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) { > + /* ZC enabled but requested to be disabled, disable it. */ > + vhost_change_zero_copy_mode(dev, mode, false); > + } > +} > + > /* > * Remove a virtio-net device from the specific vhost port. Use dev->remove > * flag to stop any more packets from being sent or received to/from a VM and > @@ -2793,6 +2877,7 @@ destroy_device(int vid) > */ > ovsrcu_quiesce_start(); > VLOG_INFO("vHost Device '%s' has been removed", ifname); > + netdev_request_reconfigure(&dev->up); > } else { > VLOG_INFO("vHost Device '%s' not found", ifname); > } > @@ -3284,6 +3369,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) > /* Carrier status may need updating. */ > netdev_change_seq_changed(&dev->up); > } > + } else { > + vhost_check_zero_copy_status(dev); > } > > return 0; > @@ -3446,7 +3533,7 @@ static const struct netdev_class dpdk_vhost_class = > NULL, > netdev_dpdk_vhost_construct, > netdev_dpdk_vhost_destruct, > - NULL, > + netdev_dpdk_vhost_set_config, > NULL, > netdev_dpdk_vhost_send, > netdev_dpdk_vhost_get_carrier, > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index 21ffaf5..330b028 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 > type=patch options:peer=p1 \ > </p> > </column> > > + <column name="options" key="dq-zero-copy" > + type='{"type": "boolean"}'> > + <p> > + The value specifies whether or not to enable dequeue zero copy on > + the given interface. The feature is considered experimental. > + The port must be in an inactive state in order to enable or disable > + this feature. > + Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces. > + </p> > + </column> > + > <column name="options" key="n_rxq_desc" > type='{"type": "integer", "minInteger": 1, "maxInteger": > 4096}'> > <p> > -- > 2.7.5 > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev