Not a full review.

General thoughts:

If following conditions are true:

1. We don't need to add new feature to deprecated vhostuser port.

2. We actually don't need to have ability to change ZC config if 
vhost-server-path
   already configured.

   Let me explain this condition:
   To change 'dq-zero-copy' if VM already started we need to stop VM, change the
   'dq-zero-copy' and start the VM back. It's not much simpler than stop VM, 
re-add
   port with different value for 'dq-zero-copy' and start VM back. I'm assuming 
here
   that VM restart is much more complex and unlikely operation than port 
removing
   from OVS and adding back. This will require documenting, but we already have 
a
   bunch of docs about how to modify this config option in current version.

we may drop patch #1 from the series and use just something like following
code instead of code changes in this patch (not tested, just for a reference):

---------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2325f0e..c4cbf7c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -379,6 +379,9 @@ struct netdev_dpdk {
         /* True if vHost device is 'up' and has been reconfigured at least 
once */
         bool vhost_reconfigured;
         /* 3 pad bytes here. */
+
+        /* True if dq-zero-copy feature requested for vhost device. */
+        bool vhost_dq_zc;
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
+    dev->vhost_dq_zc = false;
     dev->attached = false;
 
     ovsrcu_init(&dev->qos_conf, NULL);
@@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
         path = smap_get(args, "vhost-server-path");
         if (path && strcmp(path, dev->vhost_id)) {
             strcpy(dev->vhost_id, path);
+            dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
             netdev_request_reconfigure(netdev);
         }
     }
@@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev 
*netdev)
         if (dpdk_vhost_iommu_enabled()) {
             vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
         }
+
+        /* Enable Zero Copy mode if configured. */
+        if (dev->vhost_dq_zc) {
+            vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        }
         err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
         if (err) {
             VLOG_ERR("vhost-user device setup failure for device %s\n",

---------------

Thoughts?


Best regards, Ilya Maximets.

P.S. Few comments about patch itself are inline.

On 08.12.2017 18:26, Ciara Loftus wrote:
> Enabled per port like so:
> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> 
> The feature is disabled by default and can only be enabled/disabled when
> a vHost port is down.
> 
> When packets from a vHost device with zero copy enabled are destined for
> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
> set to a smaller value. 128 is recommended. This can be achieved like
> so:
> 
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> 
> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> ---
> v5:
> * Rebase
> * Update docs with warning of potential packet loss
> * Update docs with info on NIC & Virtio descriptor values and working
> combinations
> 
>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
>  NEWS                                     |  2 +
>  lib/netdev-dpdk.c                        | 89 
> +++++++++++++++++++++++++++++++-
>  vswitchd/vswitch.xml                     | 11 ++++
>  5 files changed, 195 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d123819..3e1b8f8 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the 
> below steps:
>     Check traffic on multiple queues::
>  
>         $ cat /proc/interrupts | grep virtio
> +
> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +vHost dequeue zero copy functionality can  be validated using the
> +PHY-VM-PHY configuration. To begin, follow the steps described in
> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> +and flows to bridge ``br0``. Once complete, follow the below steps:
> +
> +1. Enable dequeue zero copy on the vHost devices.
> +
> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
> +
> +The following log should be observed for each device:
> +
> +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> +
> +2. Reduce the number of txq descriptors on the phy ports.
> +
> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> +
> +3. Proceed with the test by launching the VM and configuring guest
> +forwarding, be it via the vHost loopback method or kernel forwarding
> +method, and sending traffic. The following log should be oberved for
> +each device as it becomes active during VM boot:
> +
> +       VHOST_CONFIG: dequeue zero copy is enabled
> +
> +It is essential that step 1 is performed before booting the VM, otherwise
> +the feature will not be enabled.
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 8b1c671..8587370 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -441,3 +441,64 @@ Sample XML
>      </domain>
>  
>  .. _QEMU documentation: 
> http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy
> +-------------------------------------
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> +must be used to copy that packet from guest address space to host address
> +space. This memcpy can be removed by enabling dequeue zero-copy like so:
> +
> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is passed to
> +the host, instead of a copy of the packet. Removing this memcpy can give a
> +performance improvement for some use cases, for example switching large 
> packets
> +between different VMs. However additional packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly enabled
> +by using the command above.
> +
> +The feature cannot be enabled when the device is active (ie. VM booted). If
> +you wish to enable the feature after the VM has booted, you must shutdown
> +the VM and bring it back up.
> +
> +The same logic applies for disabling the feature - it must be disabled when
> +the device is inactive, for example before VM boot. To disable the feature:
> +
> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> +
> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
> +types.
> +
> +A limitation exists whereby if packets from a vHost port with 
> dq-zero-copy=true
> +are destined for a 'dpdk' type port, the number of tx descriptors 
> (n_txq_desc)
> +for that port must be reduced to a smaller number, 128 being the recommended
> +value. This can be achieved by issuing the following command:
> +
> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +More information on the n_txq_desc option can be found in the "DPDK Physical
> +Port Queue Sizes" section of the  `intro/install/dpdk.rst` guide.

I think some :ref: link should be here.

> +
> +The reason for this limitation is due to how the zero copy functionality is
> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> +tracking used ie. sent descriptors, will only be updated when the NIC frees
> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> +vring will be starved and packets will no longer be processed. One way to
> +ensure we don't encounter this scenario, is to configure n_txq_desc to a 
> small
> +enough number such that the 'mbuf free threshold' for the NIC will be hit 
> more
> +often and thus free mbufs more frequently. The value of 128 is suggested, but
> +values of 64 and 256 have been tested and verified to work too, with 
> differing
> +performance characteristics. A value of 512 can be used too, if the virtio
> +queue size in the guest is increased to 1024 (available to configure in QEMU
> +versions v2.10 and greater). This value can be set like so:
> +
> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> +      tx_queue_size=1024
> +
> +Further information can be found in the
> +`DPDK documentation
> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> diff --git a/NEWS b/NEWS
> index d45904e..50630f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -18,6 +18,8 @@ Post-v2.8.0
>     - DPDK:
>       * Add support for DPDK v17.11
>       * Add support for vHost IOMMU
> +     * Optional dequeue zero copy feature for vHost ports can be enabled per
> +       port via the boolean 'dq-zero-copy' option.
>  
>  v2.8.0 - 31 Aug 2017
>  --------------------
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 879019e..03cc6cb 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -361,6 +361,9 @@ struct netdev_dpdk {
>          /* True if vHost device is 'up' and has been reconfigured at least 
> once */
>          bool vhost_reconfigured;
>          /* 3 pad bytes here. */
> +
> +        /* True if dq-zero-copy feature has successfully been enabled */
> +        bool dq_zc_enabled;

"pad bytes" should be fixed above or removed entirely, or we may just revert 
padding
here too because it's incorrect just like in dp_netdev_pmd_thread. The only 
difference
is that struct netdev_dpdk aligned to cacheline.

>      );
>  
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t 
> port_no,
>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
> +    dev->dq_zc_enabled = false;
>      dev->attached = false;
>  
>      ovsrcu_init(&dev->qos_conf, NULL);
> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, 
> const struct smap *args,
>      return 0;
>  }
>  
> +static void
> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> +                             const struct smap *args)
> +{
> +    bool needs_reconfigure = false;
> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> +
> +    if (zc_requested &&
> +            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    } else if (!zc_requested &&
> +            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    }
> +
> +    /* Only try to change ZC mode when device is down */
> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> +        netdev_request_reconfigure(&dev->up);
> +    }
> +}
> +
>  static int
>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>                                      const struct smap *args,
> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev 
> *netdev,
>              netdev_request_reconfigure(netdev);
>          }
>      }
> +
> +    dpdk_vhost_set_config_helper(dev, args);
> +
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return 0;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> +                             const struct smap *args,
> +                             char **errp OVS_UNUSED)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    dpdk_vhost_set_config_helper(dev, args);
>      ovs_mutex_unlock(&dev->mutex);
>  
>      return 0;
> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>      }
>  }
>  
> +static void
> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> +                            bool enable)
> +{
> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> +
> +    if (err) {
> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero 
> copy "
> +                "mode", dev->vhost_id);
> +    } else {
> +        err = dpdk_setup_vhost_device(dev, client_mode);
> +        if (err) {
> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> +                    dev->vhost_id);
> +        } else if (enable) {
> +            dev->dq_zc_enabled = true;
> +            VLOG_INFO("Zero copy enabled for vHost socket %s", 
> dev->vhost_id);
> +        } else {
> +            dev->dq_zc_enabled = false;
> +            VLOG_INFO("Zero copy disabled for vHost socket %s", 
> dev->vhost_id);
> +        }
> +    }
> +}
> +
> +static void
> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> +{
> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> +
> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> +                && !dev->dq_zc_enabled) {
> +        /* ZC disabled but requested to be enabled, enable it. */
> +        vhost_change_zero_copy_mode(dev, mode, true);
> +    } else if (!(dev->vhost_driver_flags &
> +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> +        /* ZC enabled but requested to be disabled, disable it. */
> +        vhost_change_zero_copy_mode(dev, mode, false);
> +    }
> +}
> +
>  /*
>   * Remove a virtio-net device from the specific vhost port.  Use dev->remove
>   * flag to stop any more packets from being sent or received to/from a VM and
> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
>           */
>          ovsrcu_quiesce_start();
>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> +        netdev_request_reconfigure(&dev->up);
>      } else {
>          VLOG_INFO("vHost Device '%s' not found", ifname);
>      }
> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>              /* Carrier status may need updating. */
>              netdev_change_seq_changed(&dev->up);
>          }
> +    } else {
> +        vhost_check_zero_copy_status(dev);
>      }
>  
>      return 0;
> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
>          NULL,
>          netdev_dpdk_vhost_construct,
>          netdev_dpdk_vhost_destruct,
> -        NULL,
> +        netdev_dpdk_vhost_set_config,
>          NULL,
>          netdev_dpdk_vhost_send,
>          netdev_dpdk_vhost_get_carrier,
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 4c317d0..e8409b4 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>          </p>
>        </column>
>  
> +      <column name="options" key="dq-zero-copy"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          The value specifies whether or not to enable dequeue zero copy on
> +          the given interface.
> +          The port must be in an inactive state in order to enable or disable
> +          this feature.
> +          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
> +        </p>
> +      </column>
> +
>        <column name="options" key="n_rxq_desc"
>                type='{"type": "integer", "minInteger": 1, "maxInteger": 
> 4096}'>
>          <p>
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to