On 21.09.2016 20:37, Loftus, Ciara wrote: >> >> Few comments inline. > > Thanks for the feedback Ilya. > >> >>> The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx >>> and tx descriptors for dpdk ports to be modified. By default the values >>> are set to 2048, but can be modified to an integer between 1 and 4096 >>> that is a power of two. The values can be modified at runtime, however >>> require the NIC to restart when changed. >>> >>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com> >>> >>> --- >>> v3: >>> * Make queue sizes per-port rather than global >>> * Check if queue size is power of 2 - fail if so. >>> >>> v2: >>> * Rebase >>> >>> INSTALL.DPDK-ADVANCED.md | 16 ++++++++++++++-- >>> NEWS | 2 ++ >>> lib/netdev-dpdk.c | 48 >> +++++++++++++++++++++++++++++++++++++++++++----- >>> vswitchd/vswitch.xml | 22 ++++++++++++++++++++++ >>> 4 files changed, 81 insertions(+), 7 deletions(-) >>> >>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md >>> index d7b9873..488e84f 100644 >>> --- a/INSTALL.DPDK-ADVANCED.md >>> +++ b/INSTALL.DPDK-ADVANCED.md >>> @@ -257,7 +257,19 @@ needs to be affinitized accordingly. >>> The rx queues are assigned to pmd threads on the same NUMA node in a >>> round-robin fashion. >>> >>> -### 4.4 Exact Match Cache >>> +### 4.4 DPDK Physical Port Queue Sizes >>> + `ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>` >>> + `ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>` >>> + >>> + The command above sets the number of rx/tx descriptors that the NIC >>> + associated with dpdk0 will be initialised with. >>> + >>> + Different 'n_rxq_desc' and 'n_txq_desc' configurations yield different >>> + benefits in terms of throughput and latency for different scenarios. >>> + Generally, smaller queue sizes can have a positive impact for latency at >> the >>> + expense of throughput. The opposite is often true for larger queue sizes. >> >> Here we can mention that increasing the number of rx descriptors may lead >> to performance degradation because of using non-vectorized rx functions. >> At least this is true for i40e and maybe ixgbe dpdk drivers. Setting >> 'n_rxq_desc=4096' for them will lead to disabling of vectorized rx. > > It seems the same applies for ixgbe (IXGBE_MAX_RING_DESC=4096) > http://dpdk.org/doc/guides/nics/ixgbe.html#rx-constraints > I will include this info in the next version. > >> >>> + >>> +### 4.5 Exact Match Cache >>> >>> Each pmd thread contains one EMC. After initial flow setup in the >>> datapath, the EMC contains a single table and provides the lowest level >>> @@ -274,7 +286,7 @@ needs to be affinitized accordingly. >>> avoiding datapath classifier lookups is to have multiple pmd threads >>> running. This can be done as described in section 4.2. >>> >>> -### 4.5 Rx Mergeable buffers >>> +### 4.6 Rx Mergeable buffers >>> >>> Rx Mergeable buffers is a virtio feature that allows chaining of multiple >>> virtio descriptors to handle large packet sizes. As such, large packets >>> diff --git a/NEWS b/NEWS >>> index 21ab538..901886d 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -125,6 +125,8 @@ v2.6.0 - xx xxx xxxx >>> * Remove dpdkvhostcuse port type. >>> * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7) >>> * 'dpdkvhostuserclient' port type. >>> + * New option 'n_rxq_desc' and 'n_txq_desc' fields for DPDK interfaces >>> + which set the number of rx and tx descriptors to use for the given >> port. >>> - Increase number of registers to 16. >>> - ovs-benchmark: This utility has been removed due to lack of use and >>> bitrot. >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>> index 89bdc4d..228993f 100644 >>> --- a/lib/netdev-dpdk.c >>> +++ b/lib/netdev-dpdk.c >>> @@ -132,8 +132,9 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / >> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF)) >>> >>> #define SOCKET0 0 >>> >>> -#define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, >> Max (n+32<=4096)*/ >>> -#define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, >> Max (n+32<=4096)*/ >>> +#define NIC_PORT_DEFAULT_RXQ_SIZE 2048 /* Default size of Physical >> NIC RXQ */ >>> +#define NIC_PORT_DEFAULT_TXQ_SIZE 2048 /* Default size of Physical >> NIC TXQ */ >>> +#define NIC_PORT_MAX_Q_SIZE 4096 /* Maximum size of Physical >> NIC Queue */ >>> >>> #define OVS_VHOST_MAX_QUEUE_NUM 1024 /* Maximum number of >> vHost TX queues. */ >>> #define OVS_VHOST_QUEUE_MAP_UNKNOWN (-1) /* Mapping not >> initialized. */ >>> @@ -372,6 +373,12 @@ struct netdev_dpdk { >>> int requested_mtu; >>> int requested_n_txq; >>> int requested_n_rxq; >>> + int requested_rxq_size; >>> + int requested_txq_size; >>> + >>> + /* Number of rx/tx descriptors for physical devices */ >>> + int rxq_size; >>> + int txq_size; >>> >>> /* Socket ID detected when vHost device is brought up */ >>> int requested_socket_id; >>> @@ -646,7 +653,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk >> *dev, int n_rxq, int n_txq) >>> } >>> >>> for (i = 0; i < n_txq; i++) { >>> - diag = rte_eth_tx_queue_setup(dev->port_id, i, >> NIC_PORT_TX_Q_SIZE, >>> + diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, >>> dev->socket_id, NULL); >>> if (diag) { >>> VLOG_INFO("Interface %s txq(%d) setup error: %s", >>> @@ -662,7 +669,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk >> *dev, int n_rxq, int n_txq) >>> } >>> >>> for (i = 0; i < n_rxq; i++) { >>> - diag = rte_eth_rx_queue_setup(dev->port_id, i, >> NIC_PORT_RX_Q_SIZE, >>> + diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size, >>> dev->socket_id, NULL, >>> dev->dpdk_mp->mp); >>> if (diag) { >>> @@ -837,6 +844,10 @@ netdev_dpdk_init(struct netdev *netdev, >> unsigned int port_no, >>> netdev->n_txq = NR_QUEUE; >>> dev->requested_n_rxq = netdev->n_rxq; >>> dev->requested_n_txq = netdev->n_txq; >>> + dev->rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; >>> + dev->txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; >>> + dev->requested_rxq_size = dev->rxq_size; >>> + dev->requested_txq_size = dev->txq_size; >>> >>> /* Initialize the flow control to NULL */ >>> memset(&dev->fc_conf, 0, sizeof dev->fc_conf); >>> @@ -1051,6 +1062,8 @@ netdev_dpdk_get_config(const struct netdev >> *netdev, struct smap *args) >>> smap_add_format(args, "configured_rx_queues", "%d", netdev- >>> n_rxq); >>> smap_add_format(args, "requested_tx_queues", "%d", dev- >>> requested_n_txq); >>> smap_add_format(args, "configured_tx_queues", "%d", netdev- >>> n_txq); >>> + smap_add_format(args, "rxq_descriptors", "%d", dev->rxq_size); >>> + smap_add_format(args, "txq_descriptors", "%d", dev->txq_size); >>> smap_add_format(args, "mtu", "%d", dev->mtu); >>> ovs_mutex_unlock(&dev->mutex); >>> >>> @@ -1069,6 +1082,21 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, >> const struct smap *args) >>> } >>> } >>> >>> +static void >>> +dpdk_process_queue_size(struct netdev *netdev, const struct smap >> *args, >>> + char *flag, int *new_size) >>> +{ >>> + int queue_size; >>> + >>> + queue_size = smap_get_int(args, flag, 0); >>> + if (queue_size > 0 && queue_size <= NIC_PORT_MAX_Q_SIZE >>> + && rte_is_power_of_2(queue_size) >> >> I'm suggesting to use OVS internal functions to do the check: 'IS_POW2()' >> macro or function 'is_pow2()'. >> >>> + && queue_size != *new_size) { >>> + *new_size = queue_size; >>> + netdev_request_reconfigure(netdev); >>> + } >> >> Also, some sane error message would be nice in case of wrong value. > > This is generally avoided for set_config since it is called so often. An > error isn't printed in when setting n_rxq for example. > If the user sets a bogus value you can end up with many prints of the same > log. And these logs again later when you set another value eg. n_rxq. > For this reason I don't plan to make this change. > > Thanks, > Ciara
Yes, you're right. My bad. >> >>> +} >>> + >>> static int >>> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args) >>> { >>> @@ -1078,6 +1106,11 @@ netdev_dpdk_set_config(struct netdev >> *netdev, const struct smap *args) >>> >>> dpdk_set_rxq_config(dev, args); >>> >>> + dpdk_process_queue_size(netdev, args, "n_rxq_desc", >>> + &dev->requested_rxq_size); >>> + dpdk_process_queue_size(netdev, args, "n_txq_desc", >>> + &dev->requested_txq_size); >>> + >>> /* Flow control support is only available for DPDK Ethernet ports. */ >>> bool rx_fc_en = false; >>> bool tx_fc_en = false; >>> @@ -2923,7 +2956,9 @@ netdev_dpdk_reconfigure(struct netdev >> *netdev) >>> >>> if (netdev->n_txq == dev->requested_n_txq >>> && netdev->n_rxq == dev->requested_n_rxq >>> - && dev->mtu == dev->requested_mtu) { >>> + && dev->mtu == dev->requested_mtu >>> + && dev->rxq_size == dev->requested_rxq_size >>> + && dev->txq_size == dev->requested_txq_size) { >>> /* Reconfiguration is unnecessary */ >>> >>> goto out; >>> @@ -2938,6 +2973,9 @@ netdev_dpdk_reconfigure(struct netdev >> *netdev) >>> netdev->n_txq = dev->requested_n_txq; >>> netdev->n_rxq = dev->requested_n_rxq; >>> >>> + dev->rxq_size = dev->requested_rxq_size; >>> + dev->txq_size = dev->requested_txq_size; >>> + >>> rte_free(dev->tx_q); >>> err = dpdk_eth_dev_init(dev); >>> netdev_dpdk_alloc_txq(dev, netdev->n_txq); >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >>> index e73023d..1e96ada 100644 >>> --- a/vswitchd/vswitch.xml >>> +++ b/vswitchd/vswitch.xml >>> @@ -2375,6 +2375,28 @@ >>> Only supported by dpdkvhostuserclient interfaces. >>> </p> >>> </column> >>> + >>> + <column name="options" key="n_rxq_desc" >>> + type='{"type": "integer", "minInteger": 1, "maxInteger": >>> 4096}'> >>> + <p> >>> + Specifies the rx queue size (number rx descriptors) for dpdk >>> ports. >>> + The value must be a multiple of 2, less than 4096 and supported >> >> s/multiple/power/ >> >>> + by the hardware of the device being configured. >>> + If not specified or an incorrect value is specified, 2048 rx >>> + descriptors will be used by default. >>> + </p> >>> + </column> >>> + >>> + <column name="options" key="n_txq_desc" >>> + type='{"type": "integer", "minInteger": 1, "maxInteger": >>> 4096}'> >>> + <p> >>> + Specifies the tx queue size (number tx descriptors) for dpdk >>> ports. >>> + The value must be a multiple of 2, less than 4096 and supported >> >> s/multiple/power/ >> >>> + by the hardware of the device being configured. >>> + If not specified or an incorrect value is specified, 2048 tx >>> + descriptors will be used by default. >>> + </p> >>> + </column> >>> </group> >>> >>> <group title="MTU"> >>> -- >>> 2.4.3 > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev