On Mon, 2018-04-09 at 15:16 +0000, Stokes, Ian wrote:
> > This continues the breakup of the huge DPDK "howto" into smaller
> > components. There are a couple of related changes included, such as using
> > "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> >
> > We enable the TODO directive, so we can actually start calling out some
> > TODOs.
> >
> > Signed-off-by: Stephen Finucane <step...@that.guru>
> > ---
> >  Documentation/conf.py                    |   2 +-
> >  Documentation/howto/dpdk.rst             |  86 -------------------
> >  Documentation/topics/dpdk/index.rst      |   1 +
> >  Documentation/topics/dpdk/phy.rst        |  10 +++
> >  Documentation/topics/dpdk/pmd.rst        | 139 +++++++++++++++++++++++++++++++
> >  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
> >  6 files changed, 159 insertions(+), 96 deletions(-)
> >  create mode 100644 Documentation/topics/dpdk/pmd.rst
> >
> > diff --git a/Documentation/conf.py b/Documentation/conf.py
> > index 6ab144c5d..babda21de 100644
> > --- a/Documentation/conf.py
> > +++ b/Documentation/conf.py
> > @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
> >  # Add any Sphinx extension module names here, as strings. They can be
> >  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
> >  # ones.
> > -extensions = []
> > +extensions = ['sphinx.ext.todo']
> >
> >  # Add any paths that contain templates here, relative to this directory.
> >  templates_path = ['_templates']
> > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > index d717d2ebe..c2324118d 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
> >      $ ovs-appctl -t ovsdb-server exit
> >      $ ovs-vsctl del-br br0
> >
> > -PMD Thread Statistics
> > ----------------------
> > -
> > -To show current stats::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-stats-show
> > -
> > -To clear previous stats::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > -
> > -Port/RXQ Assigment to PMD Threads
> > ----------------------------------
> > -
> > -To show port/rxq assignment::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > -
> > -To change default rxq assignment to pmd threads, rxqs may be manually pinned to
> > -desired cores using::
> > -
> > -    $ ovs-vsctl set Interface <iface> \
> > -        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > -
> > -where:
> > -
> > -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
> > -
> > -For example::
> > -
> > -    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > -        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > -
> > -This will ensure:
> > -
> > -- Queue #0 pinned to core 3
> > -- Queue #1 pinned to core 7
> > -- Queue #2 not pinned
> > -- Queue #3 pinned to core 8
> > -
> > -After that PMD threads on cores where RX queues was pinned will become
> > -``isolated``. This means that this thread will poll only pinned RX queues.
> > -
> > -.. warning::
> > -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
> > -  not be polled. Also, if provided ``core_id`` is not available (ex. this
> > -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
> > -  thread.
> > -
> > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
> > -automatically. The processing cycles that have been stored for each rxq
> > -will be used where known to assign rxqs to pmd based on a round robin of the
> > -sorted rxqs.
> > -
> > -For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
> > -available, and the measured usage of core cycles per rxq over the last
> > -interval is seen to be:
> > -
> > -- Queue #0: 30%
> > -- Queue #1: 80%
> > -- Queue #3: 60%
> > -- Queue #4: 70%
> > -- Queue #5: 10%
> > -
> > -The rxqs will be assigned to cores 3,7,8 in the following order:
> > -
> > -Core 3: Q1 (80%) |
> > -Core 7: Q4 (70%) | Q5 (10%)
> > -core 8: Q3 (60%) | Q0 (30%)
> > -
> > -To see the current measured usage history of pmd core cycles for each rxq::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > -
> > -.. note::
> > -
> > -  A history of one minute is recorded and shown for each rxq to allow for
> > -  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
> > -  pattern or reconfig changes will take one minute before they are fully
> > -  reflected in the stats.
> > -
> > -Rxq to pmds assignment takes place whenever there are configuration changes
> > -or can be triggered by using::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> > -
> >  QoS
> >  ---
> >
> > diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
> > index 5f836a6e9..dfde88377 100644
> > --- a/Documentation/topics/dpdk/index.rst
> > +++ b/Documentation/topics/dpdk/index.rst
> > @@ -31,3 +31,4 @@ The DPDK Datapath
> >     phy
> >     vhost-user
> >     ring
> > +   pmd
> > diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> > index 1c18e4e3d..222fa3e9f 100644
> > --- a/Documentation/topics/dpdk/phy.rst
> > +++ b/Documentation/topics/dpdk/phy.rst
> > @@ -109,3 +109,13 @@ tool::
> >  For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
> >
> >  .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
> > +
> > +Multiqueue
> > +----------
> > +
> > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > +lifting for the DPDK datapath. Correct configuration of PMD threads and
> > +the Rx queues they utilize is a requirement in order to deliver the
> > +high-performance possible with the DPDK datapath. It is possible to
> > +configure multiple Rx queues for ``dpdk`` ports, thus ensuring this is
> > +not a bottleneck for performance. For information on configuring PMD threads, refer to :doc:`pmd`.
> > diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> > new file mode 100644
> > index 000000000..e15e8cc3b
> > --- /dev/null
> > +++ b/Documentation/topics/dpdk/pmd.rst
> > @@ -0,0 +1,139 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> > +      not use this file except in compliance with the License. You may obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> > +      License for the specific language governing permissions and limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +===========
> > +PMD Threads
> > +===========
> > +
> > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > +lifting for the DPDK datapath and perform tasks such as continuous
> > +polling of input ports for packets, classifying packets once received,
> > +and executing actions on the packets once they are classified.
> > +
> > +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly
> > +known as *rxq*\s and *txq*\s. While Tx queue configuration happens
> > +automatically, Rx queues can be configured by the user. This can happen in one of two ways:
>
> Just on above, could be a to-do but it's a good opportunity to add a
> note on the "automatic" behavior of tx queues, number created and how
> it relates to the number of PMDs etc. Could be a separate section in
> the PMD doc.

Yeah, if it's OK with you I'll add this as a TODO and then work with
you to write this additional section.

> > +
> > +- For physical interfaces, configuration is done using the
> > +  :program:`ovs-appctl` utility.
> > +
> > +- For virtual interfaces, configuration is done using the :program:`ovs-appctl`
> > +  utility, but this configuration must be reflected in the guest configuration
> > +  (e.g. QEMU command line arguments).
> > +
> > +The :program:`ovs-appctl` utility also provides a number of commands
> > +for querying PMD threads and their respective queues. This, and all of
> > +the above, is discussed here.
> > +
> > +PMD Thread Statistics
> > +---------------------
> > +
> > +To show current stats::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-stats-show
> > +
> > +To clear previous stats::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > +
> > +Port/Rx Queue Assigment to PMD Threads
> > +--------------------------------------
> > +
> > +.. todo::
> > +
> > +   This needs a more detailed overview of *why* this should be done, along with
> > +   the impact on things like NUMA affinity.
> > +
> > +To show port/RX queue assignment::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > +
> > +Rx queues may be manually pinned to cores. This will change the default
> > +Rx queue assignment to PMD threads::
> > +
> > +    $ ovs-vsctl set Interface <iface> \
> > +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > +
> > +where:
> > +
> > +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
> > +
> > +For example::
> > +
> > +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > +
> > +This will ensure there are *4* Rx queues and that these queues are
> > +configured like so:
> > +
> > +- Queue #0 pinned to core 3
> > +- Queue #1 pinned to core 7
> > +- Queue #2 not pinned
> > +- Queue #3 pinned to core 8
> > +
> > +PMD threads on cores where Rx queues are *pinned* will become
> > +*isolated*. This means that this thread will only poll the *pinned* Rx queues.
> > +
> > +.. warning::
> > +
> > +   If there are no *non-isolated* PMD threads, *non-pinned* RX queues
> > +   will not be polled. Also, if the provided ``<core-id>`` is not
> > +   available (e.g. the ``<core-id>`` is not in ``pmd-cpu-mask``), the RX
> > +   queue will not be polled by any PMD thread.
> > +
> > +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
> > +(cores) automatically. Where known, the processing cycles that have
> > +been stored for each Rx queue will be used to assign Rx queue to PMDs
> > +based on a round robin of the sorted Rx queues. For example, take the
> > +following example, where there are five Rx queues and three cores - 3,
> > +7, and 8 - available and the measured usage of core cycles per Rx queue
> > +over the last interval is seen to be:
> > +
> > +- Queue #0: 30%
> > +- Queue #1: 80%
> > +- Queue #3: 60%
> > +- Queue #4: 70%
> > +- Queue #5: 10%
> > +
> > +The Rx queues will be assigned to the cores in the following order:
> > +
> > +Core 3: Q1 (80%) |
> > +Core 7: Q4 (70%) | Q5 (10%)
> > +core 8: Q3 (60%) | Q0 (30%)
> > +
>
> This functionality was introduced in OVS 2.8. Do we need to warn the
> user with a versionchanged:: 2.8.0 and that it's unavailable prior to
> this? The behavior in that case was round robin without taking
> processing cycles into consideration. There would also be no history
> tracking for the stats and no pmd rebalance command.

Yes, I'll add this.

> > +To see the current measured usage history of PMD core cycles for each Rx
> > +queue::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > +
> > +.. note::
> > +
> > +   A history of one minute is recorded and shown for each Rx queue to
> > +   allow for traffic pattern spikes. Any changes in the Rx queue's PMD
> > +   core cycles usage, due to traffic pattern or reconfig changes, will
> > +   take one minute to be fully reflected in the stats.
> > +
> > +Rx queue to PMD assignment takes place whenever there are configuration
> > +changes or can be triggered by using::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
>
> We should probably flag to users considerations for PMD and multi queue
> specific to phy and vhost ports.
>
> Perhaps a link to the specific documents below along with the heads up:
>
> Documentation/topics/dpdk/vhost-user.rst
> Documentation/topics/dpdk/phy.rst

Yup, good call. Done.

Stephen

> Ian
>
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> > index 95517a676..d84d99246 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -127,11 +127,10 @@ an additional set of parameters::
> >      -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> >      -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> >
> > -In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
> > -ports access a virtio-net device's virtual rings and packet buffers mapping the
> > -VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
> > -memory into their process address space, pass the following parameters to
> > -QEMU::
> > +In addition, QEMU must allocate the VM's memory on hugetlbfs.
> > +vhost-user ports access a virtio-net device's virtual rings and packet
> > +buffers mapping the VM's physical memory on hugetlbfs. To enable
> > +vhost-user ports to map the VM's memory into their process address space, pass the following parameters to QEMU::
> >
> >      -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
> >      -numa node,memdev=mem -mem-prealloc
> > @@ -151,18 +150,18 @@ where:
> >      The number of vectors, which is ``$q`` * 2 + 2
> >
> >  The vhost-user interface will be automatically reconfigured with required
> > -number of rx and tx queues after connection of virtio device. Manual
> > +number of Rx and Tx queues after connection of virtio device. Manual
> >  configuration of ``n_rxq`` is not supported because OVS will work properly only
> >  if ``n_rxq`` will match number of queues configured in QEMU.
> >
> > -A least 2 PMDs should be configured for the vswitch when using multiqueue.
> > +A least two PMDs should be configured for the vswitch when using multiqueue.
> >  Using a single PMD will cause traffic to be enqueued to the same vhost queue
> >  rather than being distributed among different vhost queues for a vhost-user
> >  interface.
> >
> >  If traffic destined for a VM configured with multiqueue arrives to the vswitch
> > -via a physical DPDK port, then the number of rxqs should also be set to at
> > -least 2 for that physical DPDK port. This is required to increase the
> > +via a physical DPDK port, then the number of Rx queues should also be
> > +set to at least two for that physical DPDK port. This is required to
> > +increase the
> >  probability that a different PMD will handle the multiqueue transmission to the
> >  guest using a different vhost queue.
> >
> > --
> > 2.14.3
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
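
For reference while wording the versionchanged note: the pinning format and
the automatic assignment example from the pmd.rst hunk can be sketched in a
few lines of Python. This is only an illustration, not the OVS
implementation. The function names are hypothetical, and the back-and-forth
walk over the cores is an assumption, chosen because it reproduces the
Core 3/7/8 layout shown in the patch.

```python
def parse_rxq_affinity(spec):
    """Parse a CSV list of <queue-id>:<core-id> pairs, e.g. "0:3,1:7,3:8"."""
    pinned = {}
    for item in spec.split(","):
        queue_id, core_id = item.split(":")
        pinned[int(queue_id)] = int(core_id)
    return pinned


def assign_rxqs(usage, cores):
    """Assign unpinned Rx queues to cores, busiest queue first.

    Assumption: cores are walked back and forth (3, 7, 8, 8, 7, 3, ...),
    which is the order that matches the example in the documentation.
    """
    assignment = {core: [] for core in cores}
    # Sort queue ids by measured cycle usage, highest first.
    ordered = sorted(usage, key=usage.get, reverse=True)
    idx, step = 0, 1
    for queue in ordered:
        assignment[cores[idx]].append(queue)
        nxt = idx + step
        if nxt < 0 or nxt >= len(cores):
            step = -step  # turn around; the end core takes two in a row
        else:
            idx = nxt
    return assignment


if __name__ == "__main__":
    # Values taken from the examples in the patch.
    print(parse_rxq_affinity("0:3,1:7,3:8"))
    # {0: 3, 1: 7, 3: 8}
    print(assign_rxqs({0: 30, 1: 80, 3: 60, 4: 70, 5: 10}, [3, 7, 8]))
    # {3: [1], 7: [4, 5], 8: [3, 0]}
```

Queue #2 is absent from the parsed mapping, which corresponds to "not pinned"
in the documented example.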