On 18.12.2017 16:20, Loftus, Ciara wrote:
>>
>> On 18.12.2017 15:28, Loftus, Ciara wrote:
>>>>
>>>> Not a full review.
>>>
>>> Thanks for your feedback.
>>>
>>>>
>>>> General thoughts:
>>>>
>>>> If the following conditions are true:
>>>>
>>>> 1. We don't need to add the new feature to the deprecated vhostuser port.
>>>
>>> Agree.
>>>
>>>>
>>>> 2. We don't actually need the ability to change the ZC config if
>>>>    vhost-server-path is already configured.
>>>>
>>>> Let me explain this condition:
>>>> To change 'dq-zero-copy' if the VM is already started, we need to stop
>>>> the VM, change 'dq-zero-copy' and start the VM back. That's not much
>>>> simpler than stopping the VM, re-adding the port with a different value
>>>> for 'dq-zero-copy' and starting the VM back. I'm assuming here that a VM
>>>> restart is a much more complex and unlikely operation than removing a
>>>> port from OVS and adding it back. This will require documenting, but we
>>>> already have a bunch of docs about how to modify this config option in
>>>> the current version.
>>>
>>> Don't fully agree. I can't say whether your assumption is correct.
>>> Port deletion & re-addition has repercussions, e.g. loss of statistics
>>> from an interface. Retaining those stats might be important for some.
>>> Why not leave it up to the user to choose their preferred method of
>>> enabling the feature, i.e. reboot the VM or delete & add the port.
>>>
>>>>
>>>> we may drop patch #1 from the series and use just something like the
>>>> following code instead of the code changes in this patch (not tested,
>>>> just for reference):
>>>>
>>>> ---------------
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index 2325f0e..c4cbf7c 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -379,6 +379,9 @@ struct netdev_dpdk {
>>>>          /* True if vHost device is 'up' and has been reconfigured at least once */
>>>>          bool vhost_reconfigured;
>>>>          /* 3 pad bytes here. */
>>>> +
>>>> +        /* True if dq-zero-copy feature requested for vhost device. */
>>>> +        bool vhost_dq_zc;
>>>>      );
>>>>
>>>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>>>> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>>>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>>>      ovsrcu_index_init(&dev->vid, -1);
>>>>      dev->vhost_reconfigured = false;
>>>> +    dev->vhost_dq_zc = false;
>>>>      dev->attached = false;
>>>>
>>>>      ovsrcu_init(&dev->qos_conf, NULL);
>>>> @@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>>>      path = smap_get(args, "vhost-server-path");
>>>>      if (path && strcmp(path, dev->vhost_id)) {
>>>>          strcpy(dev->vhost_id, path);
>>>> +        dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
>>>>          netdev_request_reconfigure(netdev);
>>>>      }
>>>>  }
>>>> @@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>>>>      if (dpdk_vhost_iommu_enabled()) {
>>>>          vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
>>>>      }
>>>> +
>>>> +    /* Enable Zero Copy mode if configured. */
>>>> +    if (dev->vhost_dq_zc) {
>>>> +        vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>>> +    }
>>>>      err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
>>>>      if (err) {
>>>>          VLOG_ERR("vhost-user device setup failure for device %s\n",
>>>>
>>>> ---------------
>>>>
>>>> Thoughts?
>>>
>>> The above is a lot shorter of course, which is nice, but I don't think we
>>> should sacrifice the ability to enable the feature post-boot for this.
>>
>> But you wrote in the documentation below:
>>
>> +The feature cannot be enabled when the device is active (ie. VM booted). If
>> +you wish to enable the feature after the VM has booted, you must shutdown
>> +the VM and bring it back up.
>>
>> What do you mean by "sacrifice the ability to enable the feature
>> post-boot"? Am I missing something?
>
> My understanding is that your suggestion means you sacrifice the ability to
> enable the feature post *first* VM boot. Am I correct?
> My suggestion (and how the patch is implemented now) allows you to enable
> the feature post *first* VM boot, by means of a re-boot.
Your approach:

  * ovs-vsctl add-port br0 vhost0 -- set interface vhost0 \
        type=dpdkvhostuserclient options:vhost-server-path=./vhost0.sock
  * ./boot_vm.sh
  * ovs-vsctl set interface vhost0 options:dq-zero-copy=true
  * ./stop_vm.sh  (stop virtio-net driver)
  * ./boot_vm.sh  (start virtio-net driver)

My approach:

  * ovs-vsctl add-port br0 vhost0 -- set interface vhost0 \
        type=dpdkvhostuserclient options:vhost-server-path=./vhost0.sock
  * ./boot_vm.sh
  * ovs-vsctl del-port vhost0
  * ovs-vsctl add-port br0 vhost0 -- set interface vhost0 \
        type=dpdkvhostuserclient options:vhost-server-path=./vhost0.sock \
        options:dq-zero-copy=true
  * ./stop_vm.sh  (stop virtio-net driver)
  * ./boot_vm.sh  (start virtio-net driver)

The only difference is that, as you already mentioned, statistics for that
interface will be dropped. But I'm not sure anyone really needs to change
that parameter while the VM is running. It sounds strange that someone would
want to enable/disable this feature a few times in a VM's lifetime. It looks
more reasonable that this feature will be enabled or disabled for the whole
life of a single VM, especially because changing it costs a VM/driver restart.

The most common scenario, IMHO, is passing all the required parameters at
once in the 'add-port' command. In that case there will be no difference
between our approaches from the user's point of view, but a big difference
from the developer's.

------

Beside the above discussion: I guess that with vhostuserclient ports we don't
really need to reboot the VM. We only need to reload the virtio driver in the
guest. If the guest uses DPDK, this means that we only need to restart the
DPDK application inside the guest to enable/disable zero copy. Am I right?
If so, I think the documentation could be modified to describe driver
reloading instead of VM rebooting.

------

One more general comment on the documentation: it's not full rST now, it's
half rST / half plain text. Please add '::' for command blocks and ':ref:'
for referencing other sections (Physical Port Queue Sizes).
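To illustrate the rST point, a fragment along these lines would render the
commands as literal blocks and link the other section properly (the label
name ``dpdk-queues-sizes`` here is only a placeholder for whatever anchor
actually exists in the tree):

```rst
To disable the feature::

    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false

More information on the ``n_txq_desc`` option can be found in the
:ref:`dpdk-queues-sizes` section of the installation guide.
```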
Best regards, Ilya Maximets.

>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> P.S. A few comments about the patch itself are inline.
>>>>
>>>> On 08.12.2017 18:26, Ciara Loftus wrote:
>>>>> Enabled per port like so:
>>>>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>>>>>
>>>>> The feature is disabled by default and can only be enabled/disabled when
>>>>> a vHost port is down.
>>>>>
>>>>> When packets from a vHost device with zero copy enabled are destined for
>>>>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
>>>>> set to a smaller value. 128 is recommended. This can be achieved like
>>>>> so:
>>>>>
>>>>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>>>>
>>>>> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
>>>>> ---
>>>>> v5:
>>>>> * Rebase
>>>>> * Update docs with warning of potential packet loss
>>>>> * Update docs with info on NIC & Virtio descriptor values and working
>>>>>   combinations
>>>>>
>>>>>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
>>>>>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
>>>>>  NEWS                                     |  2 +
>>>>>  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
>>>>>  vswitchd/vswitch.xml                     | 11 ++++
>>>>>  5 files changed, 195 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>>>>> index d123819..3e1b8f8 100644
>>>>> --- a/Documentation/howto/dpdk.rst
>>>>> +++ b/Documentation/howto/dpdk.rst
>>>>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
>>>>>  Check traffic on multiple queues::
>>>>>
>>>>>    $ cat /proc/interrupts | grep virtio
>>>>> +
>>>>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> +
>>>>> +vHost dequeue zero copy functionality can be validated using the
>>>>> +PHY-VM-PHY configuration. To begin, follow the steps described in
>>>>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
>>>>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
>>>>> +and flows to bridge ``br0``. Once complete, follow the below steps:
>>>>> +
>>>>> +1. Enable dequeue zero copy on the vHost devices.
>>>>> +
>>>>> +   $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
>>>>> +   $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
>>>>> +
>>>>> +The following log should be observed for each device:
>>>>> +
>>>>> +   netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
>>>>> +
>>>>> +2. Reduce the number of txq descriptors on the phy ports.
>>>>> +
>>>>> +   $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
>>>>> +   $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
>>>>> +
>>>>> +3. Proceed with the test by launching the VM and configuring guest
>>>>> +forwarding, be it via the vHost loopback method or kernel forwarding
>>>>> +method, and sending traffic. The following log should be observed for
>>>>> +each device as it becomes active during VM boot:
>>>>> +
>>>>> +   VHOST_CONFIG: dequeue zero copy is enabled
>>>>> +
>>>>> +It is essential that step 1 is performed before booting the VM, otherwise
>>>>> +the feature will not be enabled.
>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>>>>> index 8b1c671..8587370 100644
>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>> @@ -441,3 +441,64 @@ Sample XML
>>>>>  </domain>
>>>>>
>>>>>  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
>>>>> +
>>>>> +vhost-user Dequeue Zero Copy
>>>>> +-------------------------------------
>>>>> +
>>>>> +Normally when dequeuing a packet from a vHost User device, a memcpy
>>>>> +operation must be used to copy that packet from guest address space to
>>>>> +host address space. This memcpy can be removed by enabling dequeue
>>>>> +zero-copy like so:
>>>>> +
>>>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>>>>> +
>>>>> +With this feature enabled, a reference (pointer) to the packet is passed to
>>>>> +the host, instead of a copy of the packet. Removing this memcpy can give a
>>>>> +performance improvement for some use cases, for example switching large
>>>>> +packets between different VMs. However additional packet loss may be
>>>>> +observed.
>>>>> +
>>>>> +Note that the feature is disabled by default and must be explicitly enabled
>>>>> +by using the command above.
>>>>> +
>>>>> +The feature cannot be enabled when the device is active (ie. VM booted). If
>>>>> +you wish to enable the feature after the VM has booted, you must shutdown
>>>>> +the VM and bring it back up.
>>>>> +
>>>>> +The same logic applies for disabling the feature - it must be disabled when
>>>>> +the device is inactive, for example before VM boot. To disable the feature:
>>>>> +
>>>>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
>>>>> +
>>>>> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
>>>>> +types.
>>>>> +
>>>>> +A limitation exists whereby if packets from a vHost port with
>>>>> +dq-zero-copy=true are destined for a 'dpdk' type port, the number of tx
>>>>> +descriptors (n_txq_desc) for that port must be reduced to a smaller
>>>>> +number, 128 being the recommended value. This can be achieved by issuing
>>>>> +the following command:
>>>>> +
>>>>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>>>> +
>>>>> +More information on the n_txq_desc option can be found in the "DPDK
>>>>> +Physical Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
>>>>
>>>> I think some :ref: link should be here.
>>>>
>>>>> +
>>>>> +The reason for this limitation is due to how the zero copy functionality is
>>>>> +implemented. The vHost device's 'tx used vring', a virtio structure used for
>>>>> +tracking used ie. sent descriptors, will only be updated when the NIC frees
>>>>> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
>>>>> +vring will be starved and packets will no longer be processed. One way to
>>>>> +ensure we don't encounter this scenario is to configure n_txq_desc to a
>>>>> +small enough number such that the 'mbuf free threshold' for the NIC will be
>>>>> +hit more often and thus free mbufs more frequently. The value of 128 is
>>>>> +suggested, but values of 64 and 256 have been tested and verified to work
>>>>> +too, with differing performance characteristics. A value of 512 can be used
>>>>> +too, if the virtio queue size in the guest is increased to 1024 (available
>>>>> +to configure in QEMU versions v2.10 and greater). This value can be set
>>>>> +like so:
>>>>> +
>>>>> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
>>>>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
>>>>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
>>>>> +      tx_queue_size=1024
>>>>> +
>>>>> +Further information can be found in the
>>>>> +`DPDK documentation
>>>>> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
>>>>> diff --git a/NEWS b/NEWS
>>>>> index d45904e..50630f8 100644
>>>>> --- a/NEWS
>>>>> +++ b/NEWS
>>>>> @@ -18,6 +18,8 @@ Post-v2.8.0
>>>>>     - DPDK:
>>>>>       * Add support for DPDK v17.11
>>>>>       * Add support for vHost IOMMU
>>>>> +     * Optional dequeue zero copy feature for vHost ports can be enabled
>>>>> +       per port via the boolean 'dq-zero-copy' option.
>>>>>
>>>>>  v2.8.0 - 31 Aug 2017
>>>>>  --------------------
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>>> index 879019e..03cc6cb 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
>>>>>          /* True if vHost device is 'up' and has been reconfigured at least once */
>>>>>          bool vhost_reconfigured;
>>>>>          /* 3 pad bytes here. */
>>>>> +
>>>>> +        /* True if dq-zero-copy feature has successfully been enabled */
>>>>> +        bool dq_zc_enabled;
>>>>
>>>> "pad bytes" should be fixed above or removed entirely, or we may just
>>>> revert the padding here too because it's incorrect, just like in
>>>> dp_netdev_pmd_thread. The only difference is that struct netdev_dpdk is
>>>> aligned to a cacheline.
>>>>
>>>>>      );
>>>>>
>>>>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>>>>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>>>>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>>>>      ovsrcu_index_init(&dev->vid, -1);
>>>>>      dev->vhost_reconfigured = false;
>>>>> +    dev->dq_zc_enabled = false;
>>>>>      dev->attached = false;
>>>>>
>>>>>      ovsrcu_init(&dev->qos_conf, NULL);
>>>>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
>>>>>      return 0;
>>>>>  }
>>>>>
>>>>> +static void
>>>>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
>>>>> +                             const struct smap *args)
>>>>> +{
>>>>> +    bool needs_reconfigure = false;
>>>>> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
>>>>> +
>>>>> +    if (zc_requested &&
>>>>> +        !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>>>>> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>>>> +        needs_reconfigure = true;
>>>>> +    } else if (!zc_requested &&
>>>>> +               (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>>>>> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>>>>> +        needs_reconfigure = true;
>>>>> +    }
>>>>> +
>>>>> +    /* Only try to change ZC mode when device is down */
>>>>> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
>>>>> +        netdev_request_reconfigure(&dev->up);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>  static int
>>>>>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>>>>                                      const struct smap *args,
>>>>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>>>>              netdev_request_reconfigure(netdev);
>>>>>          }
>>>>>      }
>>>>> +
>>>>> +    dpdk_vhost_set_config_helper(dev, args);
>>>>> +
>>>>> +    ovs_mutex_unlock(&dev->mutex);
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
>>>>> +                             const struct smap *args,
>>>>> +                             char **errp OVS_UNUSED)
>>>>> +{
>>>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>> +
>>>>> +    ovs_mutex_lock(&dev->mutex);
>>>>> +    dpdk_vhost_set_config_helper(dev, args);
>>>>>      ovs_mutex_unlock(&dev->mutex);
>>>>>
>>>>>      return 0;
>>>>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +static void
>>>>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
>>>>> +                            bool enable)
>>>>> +{
>>>>> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
>>>>> +
>>>>> +    if (err) {
>>>>> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
>>>>> +                 "mode", dev->vhost_id);
>>>>> +    } else {
>>>>> +        err = dpdk_setup_vhost_device(dev, client_mode);
>>>>> +        if (err) {
>>>>> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
>>>>> +                     dev->vhost_id);
>>>>> +        } else if (enable) {
>>>>> +            dev->dq_zc_enabled = true;
>>>>> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
>>>>> +        } else {
>>>>> +            dev->dq_zc_enabled = false;
>>>>> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
>>>>> +{
>>>>> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
>>>>> +
>>>>> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
>>>>> +        && !dev->dq_zc_enabled) {
>>>>> +        /* ZC disabled but requested to be enabled, enable it. */
>>>>> +        vhost_change_zero_copy_mode(dev, mode, true);
>>>>> +    } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
>>>>> +               && dev->dq_zc_enabled) {
>>>>> +        /* ZC enabled but requested to be disabled, disable it. */
>>>>> +        vhost_change_zero_copy_mode(dev, mode, false);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * Remove a virtio-net device from the specific vhost port. Use dev->remove
>>>>>   * flag to stop any more packets from being sent or received to/from a VM and
>>>>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
>>>>>           */
>>>>>          ovsrcu_quiesce_start();
>>>>>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
>>>>> +        netdev_request_reconfigure(&dev->up);
>>>>>      } else {
>>>>>          VLOG_INFO("vHost Device '%s' not found", ifname);
>>>>>      }
>>>>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>>>>>              /* Carrier status may need updating. */
>>>>>              netdev_change_seq_changed(&dev->up);
>>>>>          }
>>>>> +    } else {
>>>>> +        vhost_check_zero_copy_status(dev);
>>>>>      }
>>>>>
>>>>>      return 0;
>>>>> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
>>>>>      NULL,
>>>>>      netdev_dpdk_vhost_construct,
>>>>>      netdev_dpdk_vhost_destruct,
>>>>> -    NULL,
>>>>> +    netdev_dpdk_vhost_set_config,
>>>>>      NULL,
>>>>>      netdev_dpdk_vhost_send,
>>>>>      netdev_dpdk_vhost_get_carrier,
>>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>>> index 4c317d0..e8409b4 100644
>>>>> --- a/vswitchd/vswitch.xml
>>>>> +++ b/vswitchd/vswitch.xml
>>>>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>>>        </p>
>>>>>      </column>
>>>>>
>>>>> +    <column name="options" key="dq-zero-copy"
>>>>> +            type='{"type": "boolean"}'>
>>>>> +      <p>
>>>>> +        The value specifies whether or not to enable dequeue zero copy on
>>>>> +        the given interface.
>>>>> +        The port must be in an inactive state in order to enable or disable
>>>>> +        this feature.
>>>>> +        Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
>>>>> +      </p>
>>>>> +    </column>
>>>>> +
>>>>>      <column name="options" key="n_rxq_desc"
>>>>>              type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>>>>        <p>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev