Thanks Herbert for the review, please see my reply inline.

> -----Original Message-----
> From: Thomas F Herbert [mailto:thomasfherb...@gmail.com]
> Sent: Thursday, May 12, 2016 6:56 PM
> To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com>;
> dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add
> ADVANCED doc
> 
> On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote:
> > Add INSTALL.DPDK-ADVANCED document that is forked off from original
> > INSTALL.DPDK guide. This document is targeted at users looking for
> > optimum performance on OVS using dpdk datapath.
> Thanks for this effort.
> >
> > Signed-off-by: Bhanuprakash Bodireddy
> > <bhanuprakash.bodire...@intel.com>
> > ---
> >   INSTALL.DPDK-ADVANCED.md | 809
> +++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 809 insertions(+)
> >   create mode 100644 INSTALL.DPDK-ADVANCED.md
> >
> > diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> new
> > file mode 100644 index 0000000..dd09d36
> > --- /dev/null
> > +++ b/INSTALL.DPDK-ADVANCED.md
> > @@ -0,0 +1,809 @@
> > +OVS DPDK ADVANCED INSTALL GUIDE
> > +=================================
> > +
> > +## Contents
> > +
> > +1. [Overview](#overview)
> > +2. [Building Shared Library](#build)
> > +3. [System configuration](#sysconf)
> > +4. [Performance Tuning](#perftune)
> > +5. [OVS Testcases](#ovstc)
> > +6. [Vhost Walkthrough](#vhost)
> > +7. [QOS](#qos)
> > +8. [Static Code Analysis](#staticanalyzer)
> > +9. [Vsperf](#vsperf)
> > +
> > +## <a name="overview"></a> 1. Overview
> > +
> > +The Advanced Install Guide explains how to improve OVS performance
> > +using DPDK datapath. This guide also provides information on tuning,
> > +system configuration, troubleshooting, static code analysis and testcases.
> > +
> > +## <a name="build"></a> 2. Building Shared Library
> > +
> > +DPDK can be built as a static or a shared library and is linked by
> > +applications using the DPDK datapath. This section lists the steps to build
> > +DPDK as a shared library and dynamically link it against OVS.
> > +
> > +Note: A minor performance loss is seen with OVS when using the shared DPDK
> > +library as compared to the static library.
> > +
> > +Check sections 2.2 and 2.3 of INSTALL.DPDK for download instructions for
> > +DPDK and OVS.
> > +
> > +  * Configure the DPDK library
> > +
> > +  Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base` to
> > +  generate the shared DPDK library (a one-line sketch of this edit is
> > +  shown at the end of this section).
> > +
> > +
> > +  * Build and install DPDK
> > +
> > +    For the default install (without IVSHMEM), set `export DPDK_TARGET=x86_64-native-linuxapp-gcc`
> > +    For the IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
> > +
> > +    ```
> > +    export DPDK_DIR=/usr/src/dpdk-16.04
> > +    export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
> > +    make install T=$DPDK_TARGET DESTDIR=install
> > +    ```
> > +
> > +  * Build, Install and Setup OVS.
> > +
> > +  Export the DPDK shared library location and set up OVS as listed in
> > +  section 3.3 of INSTALL.DPDK.
> > +
> > +  `export LD_LIBRARY_PATH=$DPDK_BUILD/lib`
> > +
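For the "Configure the DPDK library" step above, a minimal sketch of the edit (assuming a pristine config/common_base where the option still defaults to 'n'):

```
# Flip the shared-library switch in the DPDK config, or edit the file by hand
sed -i 's/CONFIG_RTE_BUILD_SHARED_LIB=n/CONFIG_RTE_BUILD_SHARED_LIB=y/' \
    $DPDK_DIR/config/common_base
```
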
> > +## <a name="sysconf"></a> 3. System Configuration
> > +
> > +To achieve optimal OVS performance, the system should be configured,
> > +which includes BIOS tweaks, Grub cmdline additions, an understanding of
> > +the NUMA topology and careful selection of the PCIe slots used for NIC
> > +placement.
> > +
> > +### 3.1 Recommended BIOS settings
> > +
> > +  ```
> > +  | Settings                  | Values      | Comments
> > +  |---------------------------|-------------|----------
> > +  | C3 power state            | Disabled    | -
> > +  | C6 power state            | Disabled    | -
> > +  | MLC Streamer              | Enabled     | -
> > +  | MLC Spatial prefetcher    | Enabled     | -
> > +  | DCU Data prefetcher       | Enabled     | -
> > +  | DCA                       | Enabled     | -
> > +  | CPU power and performance | Performance | -
> > +  | Memory RAS and perf       |             | -
> > +  |   config-> NUMA optimized | Enabled     | -
> > +  ```
> > +
> > +### 3.2 PCIe Slot Selection
> > +
> > +The fastpath performance also depends on factors like the NIC placement,
> > +the channel speed between the PCIe slot and the CPU, and the proximity of
> > +the PCIe slot to the CPU cores running the DPDK application. Listed below
> > +are the steps to identify the right PCIe slot.
> > +
> > +- Retrieve host details using the cmd `dmidecode -t baseboard | grep "Product Name"`
> > +- Download the technical specification for the product listed, e.g. S2600WT2.
> > +- Check the Product Architecture Overview for the riser slot placement,
> > +  CPU sharing info and PCIe channel speeds.
> > +
> > +  Example: On S2600WT, CPU1 and CPU2 share Riser Slot 1, with the channel
> > +  speed between CPU1 and Riser Slot 1 at 32GB/s and between CPU2 and
> > +  Riser Slot 1 at 16GB/s. Running the DPDK app on CPU1 cores with the NIC
> > +  inserted in the riser card slots will optimize OVS performance in this case.
> > +
> > +- Check the Riser Card #1 - Root Port mapping information for the available
> > +  slots and individual bus speeds. On S2600WT, slot 1 and slot 2 have high
> > +  bus speeds and are potential slots for NIC placement.
> > +
> > +### 3.3 Setup Hugepages
> Advanced Hugepage setup.
Agree.

> > +
> Basic huge page setup for 2MB huge pages is covered in INSTALL.DPDK.md.
Agree, but I have to mention 2MB huge pages here as well, since I am
distinguishing persistent vs. run-time allocation and, more importantly,
for the sake of completeness.

> This section
> > +  1. Allocate Huge pages
> > +
> > +     For persistent allocation of huge pages, add the following options to
> the kernel bootline
> > +     - 2MB huge pages:
> > +
> > +       Add `hugepages=N`
> > +
> > +     - 1G huge pages:
> > +
> > +       Add `default_hugepagesz=1GB hugepagesz=1G hugepages=N`
> > +
> > +       For platforms supporting multiple huge page sizes, Add options
> > +
> > +       `default_hugepagesz=<size> hugepagesz=<size> hugepages=N`
> > +       where 'N' = Number of huge pages requested, 'size' = huge page size,
> > +       optional suffix [kKmMgG]
> > +
> > +    For run-time allocation of huge pages
> > +
> > +     - 2MB huge pages:
> > +
> > +       `echo N > /proc/sys/vm/nr_hugepages`
> > +
> > +     - 1G huge pages:
> > +
> > +       `echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-
> 1048576kB/nr_hugepages`
> > +       where 'N' = Number of huge pages requested, 'X' = NUMA Node
> > +
> > +       Note: For run-time allocation of 1G huge pages, the Contiguous Memory
> > +       Allocator (CONFIG_CMA) has to be supported by the kernel; check with
> > +       your Linux distro.
> > +
> > +  2. Mount huge pages
> > +     - 2MB huge pages:
> > +
> > +       `mount -t hugetlbfs none /dev/hugepages`
> > +
> > +     - 1G huge pages:
> > +
> > +       `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
> > +
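For the persistent 1G huge page allocation described in step 1 above, a hedged example of the GRUB workflow (the file locations and the 16-page count are illustrative; adjust for your distro and memory size):

```
# /etc/default/grub -- append to the existing kernel command line
GRUB_CMDLINE_LINUX="... default_hugepagesz=1G hugepagesz=1G hugepages=16"

# Regenerate the grub config and reboot (the path differs between distros)
grub2-mkconfig -o /boot/grub2/grub.cfg
```
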
> > +### 3.4 Enable Hyperthreading
> > +
> > +  Requires BIOS changes
> > +
> > +  With HT/SMT enabled, a physical core appears as two logical cores.
> > +  SMT can be utilized to spawn worker threads on logical cores of the
> > +  same physical core, thereby saving additional cores.
> > +
> > +  With DPDK, when pinning pmd threads to logical cores, care must be
> > +  taken to set the correct bits in the pmd-cpu-mask to ensure that
> > +  the pmd threads are pinned to SMT siblings.
> > +
> > +  Example System configuration:
> > +  Dual socket Machine, 2x 10 core processors, HT enabled, 40 logical
> > + cores
> > +
> > +  To use two logical cores which share the same physical core for pmd
> > +  threads, the following command can be used to identify a pair of
> > +  logical cores.
> > +
> > +  `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`,
> > +  where N is the logical core number.
> > +
> > +  In this example, it would show that cores 1 and 21 share the same
> > +  physical core. The pmd-cpu-mask to enable two pmd threads running
> > +  on these two logical cores (one physical core) is:
> > +
> > +  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=200002`
> > +
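To double-check a mask for a given pair of SMT siblings, the bit positions can simply be OR'd together; for the cores 1 and 21 example above:

```
# Each bit N in pmd-cpu-mask selects logical core N
python -c 'print(hex((1 << 1) | (1 << 21)))'    # 0x200002
```
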
> > +### 3.5 Isolate cores
> > +
> > +  The 'isolcpus' option can be used to isolate cores from the Linux scheduler.
> > +  The isolated cores can then be dedicated to running HPC applications/threads.
> > +  This helps application performance due to zero context switching and
> > +  minimal cache thrashing. To run platform logic on core 0 and isolate
> > +  cores 1 to 19 from the scheduler, add `isolcpus=1-19` to the GRUB cmdline
> > +  (a verification sketch follows this section).
> > +
> > +  Note: It has been verified that in some circumstances core isolation has
> > +  minimal advantage due to the mature Linux scheduler.
> > +
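After rebooting with the new cmdline, the isolation can be verified. A hedged check (the sysfs node may be absent on older kernels):

```
# Confirm the kernel saw the option and which CPUs are isolated
grep -o 'isolcpus=[0-9,-]*' /proc/cmdline
cat /sys/devices/system/cpu/isolated    # expected: 1-19
```
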
> > +### 3.6 NUMA/Cluster on Die
> > +
> > +  Ideally inter NUMA datapaths should be avoided where possible as
> > + packets  will go across QPI and there may be a slight performance
> > + penalty when  compared with intra NUMA datapaths. On Intel Xeon
> > + Processor E5 v3,  Cluster On Die is introduced on models that have 10
> cores or more.
> > +  This makes it possible to logically split a socket into two NUMA
> > + regions  and again it is preferred where possible to keep critical
> > + datapaths  within the one cluster.
> > +
> > +  It is good practice to ensure that threads that are in the datapath
> > + are  pinned to cores in the same NUMA area. e.g. pmd threads and
> > + QEMU vCPUs  responsible for forwarding.
> > +
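A quick way to see which cores and NICs belong to which NUMA node before pinning anything (numactl may need to be installed separately; eth1 is an illustrative interface name):

```
numactl --hardware                          # cores and memory per NUMA node
lscpu | grep -i numa                        # NUMA node to CPU mapping
cat /sys/class/net/eth1/device/numa_node    # NUMA node of a given NIC
```
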
> > +### 3.7 Compiler Optimizations
> > +
> > +  The default compiler optimization level is '-O2'. Changing this to
> > +  more aggressive compiler optimizations such as '-O3' or '-Ofast
> > +  -march=native' with gcc (verified on 5.3.1) can produce performance
> > +  gains, though not significant. '-march=native' will produce code
> > +  optimized for the local machine and should be used only when the
> > +  software is compiled on the testbed it will run on.
> > +
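A hedged example of passing such flags when building OVS (supplying CFLAGS at configure time is one common approach; $DPDK_BUILD is the build directory from section 2, verify against your own build setup):

```
cd $OVS_DIR
./configure --with-dpdk=$DPDK_BUILD CFLAGS="-Ofast -march=native"
make -j
```
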
> > +## <a name="perftune"></a> 4. Performance Tuning
> > +
> > +### 4.1 Affinity
> > +
> > +For superior performance, DPDK pmd threads and Qemu vCPU threads
> > +need to be affinitized accordingly.
> > +
> > +  * PMD thread Affinity
> > +
> > +    A poll mode driver (pmd) thread handles the I/O of all DPDK
> > +    interfaces assigned to it. A pmd thread shall poll the ports
> > +    for incoming packets, switch the packets and send them to the tx port.
> > +    A pmd thread is CPU bound, and needs to be affinitized to isolated
> > +    cores for optimum performance.
> > +
> > +    By setting a bit in the mask, a pmd thread is created and pinned
> > +    to the corresponding CPU core. e.g. to run a pmd thread on core 2
> > +
> > +    `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
> > +
> > +    Note: pmd thread on a NUMA node is only created if there is
> > +    at least one DPDK interface from that NUMA node added to OVS.
> > +
> > +  * Qemu vCPU thread Affinity
> > +
> > +    A VM performing simple packet forwarding or running complex packet
> > +    pipelines has to ensure that the vCPU threads performing the work have
> > +    as much CPU occupancy as possible.
> > +
> > +    Example: On a multicore VM, multiple QEMU vCPU threads shall be spawned.
> > +    When the DPDK 'testpmd' application that does packet forwarding
> > +    is invoked, the 'taskset' cmd should be used to affinitize the vCPU threads
> > +    to dedicated isolated cores on the host system (see the sketch below).
> > +
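A sketch of the taskset approach for the vCPU threads (the PIDs/TIDs and core numbers are hypothetical; query the real thread IDs first):

```
# Find the QEMU process and list its threads; the vCPU threads are usually
# named "CPU 0/KVM", "CPU 1/KVM", ...
pgrep -f qemu-system-x86_64
ps -T -p <qemu_pid> -o tid,comm

# Pin each vCPU thread to a dedicated isolated host core
taskset -pc 4 <vcpu0_tid>
taskset -pc 5 <vcpu1_tid>
```
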
> > +### 4.2 Multiple poll mode driver threads
> > +
> > +  With pmd multi-threading support, OVS creates one pmd thread  for
> > + each NUMA node by default. However, it can be seen that in cases
> > + where there are multiple ports/rxq's producing traffic, performance
> > + can be improved by creating multiple pmd threads running on separate
> > + cores. These pmd threads can then share the workload by each being
> > + responsible for different ports/rxq's. Assignment of ports/rxq's to
> > + pmd threads is done automatically.
> > +
> > +  A set bit in the mask means a pmd thread is created and pinned  to
> > + the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
> > +
> > +  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> > +
> > +  For example, when using dpdk and dpdkvhostuser ports in a
> > + bi-directional  VM loopback as shown below, spreading the workload
> > + over 2 or 4 pmd  threads shows significant improvements as there
> > + will be more total CPU  occupancy available.
> > +
> > +  NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> > +
> > +### 4.3 DPDK port Rx Queues
> > +
> > +  `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
> > +
> > +  The command above sets the number of rx queues for DPDK interface.
> > +  The rx queues are assigned to pmd threads on the same NUMA node in
> > + a  round-robin fashion.  For more information, please refer to the
> > + Open_vSwitch TABLE section in
> > +
> > +  `man ovs-vswitchd.conf.db`
> > +
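For instance, to give a DPDK port named 'dpdk0' (name is illustrative) two rx queues and then confirm the resulting distribution:

```
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-appctl dpif-netdev/pmd-rxq-show    # if your build provides it, shows which pmd polls which rxq
```
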
> > +### 4.4 Exact Match Cache
> > +
> > +  Each pmd thread contains one EMC. After initial flow setup in the
> > + datapath, the EMC contains a single table and provides the lowest
> > + level
> > +  (fastest) switching for DPDK ports. If there is a miss in the EMC
> > + then  the next level where switching will occur is the datapath 
> > classifier.
> > +  Missing in the EMC and looking up in the datapath classifier incurs
> > + a  significant performance penalty. If lookup misses occur in the
> > + EMC  because it is too small to handle the number of flows, its size
> > + can  be increased. The EMC size can be modified by editing the
> > + define  EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
> > +
> > +  As mentioned above an EMC is per pmd thread. So an alternative way
> > + of  increasing the aggregate amount of possible flow entries in EMC
> > + and  avoiding datapath classifier lookups is to have multiple pmd
> > + threads  running. This can be done as described in section 4.2.
> > +
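A hedged sketch of the EMC resize (the default shift value of 13 is what recent sources show, but verify it in your tree before editing; OVS must be rebuilt afterwards):

```
# Double the per-pmd EMC from 2^13 to 2^14 entries, then rebuild OVS
sed -i 's/define EM_FLOW_HASH_SHIFT 13/define EM_FLOW_HASH_SHIFT 14/' lib/dpif-netdev.c
make && make install
```
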
> > +### 4.5 Rx Mergeable buffers
> > +
> > +  Rx Mergeable buffers is a virtio feature that allows chaining of
> > + multiple  virtio descriptors to handle large packet sizes. As such,
> > + large packets  are handled by reserving and chaining multiple free
> > + descriptors  together. Mergeable buffer support is negotiated
> > + between the virtio  driver and virtio device and is supported by the DPDK
> vhost library.
> > +  This behavior is typically supported and enabled by default,
> > + however  in the case where the user knows that rx mergeable buffers
> > + are not needed  i.e. jumbo frames are not needed, it can be forced
> > + off by adding  mrg_rxbuf=off to the QEMU command line options. By
> > + not reserving multiple  chains of descriptors it will make more
> > + individual virtio descriptors  available for rx to the guest using
> > + dpdkvhost ports and this can improve  performance.
> > +
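For example, reusing the virtio-net-pci device line from the vhost-user section later in this guide, the feature is turned off per device on the QEMU command line:

```
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off
```
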
> > +## <a name="ovstc"></a> 5. OVS Testcases
> > +
> > +### 5.1 PHY-VM-PHY [VHOST LOOPBACK]
> > +
> > +Section 5.2 of the INSTALL.DPDK guide lists the steps for the PVP loopback
> > +testcase and packet forwarding using the DPDK testpmd application in the
> > +guest VM. For users wanting to do packet forwarding using the kernel stack,
> > +the steps are listed below.
> > +
> > +  ```
> > +  ifconfig eth1 1.1.1.2/24
> > +  ifconfig eth2 1.1.2.2/24
> > +  systemctl stop firewalld.service
> > +  systemctl stop iptables.service
> > +  sysctl -w net.ipv4.ip_forward=1
> > +  sysctl -w net.ipv4.conf.all.rp_filter=0
> > +  sysctl -w net.ipv4.conf.eth1.rp_filter=0
> > +  sysctl -w net.ipv4.conf.eth2.rp_filter=0
> > +  route add -net 1.1.2.0/24 eth2
> > +  route add -net 1.1.1.0/24 eth1
> > +  arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
> > +  arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE
> > +  ```
> > +
> > +### 5.2 PHY-VM-PHY [IVSHMEM]
> > +
> > +IVSHMEM works only with 1GB huge pages.
> IVSHMEM will not work with 2MB huge pages. It will work only...
Agree, will correct this line. 

> > +
> > +  The steps (1-5) in section 3.3 of the INSTALL.DPDK guide will create and
> > +  initialize the DB, start vswitchd and add DPDK devices to bridge br0.
> > +
> > +  1. Add DPDK ring port to the bridge
> > +
> > +       ```
> > +       ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr
> > +       ```
> > +
> > +  2. Copy the runtime configuration to the VM. To achieve this, copy the files
> > +     to a temporary directory, say /tmp/rte_config, and export the directory
> > +     to the VM
> > +
> > +       ```
> > +       mkdir /tmp/rte_config
> > +       chmod 644 /tmp/rte_config
> > +       cp -a /run/.rte_config /run/.rte_hugepage_info /tmp/rte_config
> > +       ```
> > +
> > +  3. Build modified Qemu
> > +
> > +      ```
> > +      cd /usr/src/
> > +      wget https://github.com/01org/dpdk-ovs/archive/development.zip
> > +      unzip development.zip
> > +      cd dpdk-ovs-development/qemu
> > +      ./configure --target-list=x86_64-softmmu --enable-debug --extra-
> cflags='-g'
> > +      make -j 4
> > +      ```
> > +
> > +  4. Start the guest VM
> > +
> > +       ```
> > +       export VM_NAME=ivshmem-vm
> > +       export QCOW2_IMAGE=CentOS7_x86_64.qcow2
> > +       export QEMU_BIN=/usr/src/dpdk-ovs-development/qemu/x86_64-softmmu/qemu-system-x86_64
> > +
> > +       taskset 0x20 $QEMU_BIN -cpu host -smp 2,cores=2 -hda $QCOW2_IMAGE \
> > +         -drive file=fat:rw:/tmp/rte_config,snapshot=off -m 4096M --enable-kvm \
> > +         -name $VM_NAME -nographic -vnc :2 -pidfile /tmp/vm1.pid \
> > +         -mem-path /dev/hugepages -mem-prealloc \
> > +         -device ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000
> > +       ```
> > +
> > +  5. Running sample "dpdk ring" app in VM
> > +
> > +       ```
> > +       umount /dev/hugepages
> > +       mount -t hugetlbfs hugetlbfs /mnt/hugepages
> > +       ln -s /sys/devices/pci0000:00/0000:00:04.0/resource2 /dev/hugepages/rtemap_0
> > +       mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs
> > +       cp /mnt/ovs/.rte_config /run/.
> > +       cp /mnt/ovs/.rte_hugepage_info /run/.
> > +
> > +       # Build the DPDK ring application in the VM
> > +       export RTE_SDK=/root/dpdk-16.04
> > +       export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
> > +       make
> > +
> > +       # Run dpdkring application
> > +       ./build/dpdkr -c 1 -n 4 --proc-type=secondary -- -n 0
> > +       where "-n 0" refers to ring '0' i.e dpdkr0
> > +       ```
> > +
> > +## <a name="vhost"></a> 6. Vhost Walkthrough
> > +
> > +DPDK 16.04 supports two types of vhost:
> > +1. vhost-user - enabled by default
> > +2. vhost-cuse - legacy, disabled by default
> > +
> > +### 6.1 vhost-user
> > +
> > +  - Prerequisites:
> > +
> > +    QEMU version >= 2.2
> > +
> > +  - Adding vhost-user ports to Switch
> > +
> > +    Unlike DPDK ring ports, DPDK vhost-user ports can have arbitrary names,
> > +    except that forward and backward slashes are prohibited in the names.
> > +
> > +    For vhost-user, the name of the port type is `dpdkvhostuser`
> > +
> > +    ```
> > +    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
> > +    type=dpdkvhostuser
> > +    ```
> > +
> > +    This action creates a socket located at
> > +    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
> > +    to your VM on the QEMU command line. More instructions on this can
> be
> > +    found in the next section "Adding vhost-user ports to VM"
> > +
> > +    Note: If you wish for the vhost-user sockets to be created in a
> > +    sub-directory of `/usr/local/var/run/openvswitch`, you may specify
> > +    this directory in the ovsdb like so:
> > +
> > +    `./utilities/ovs-vsctl --no-wait \
> > +      set Open_vSwitch . other_config:vhost-sock-dir=subdir`
> > +
> > +  - Adding vhost-user ports to VM
> > +
> > +    1. Configure sockets
> > +
> > +       Pass the following parameters to QEMU to attach a vhost-user device:
> > +
> > +       ```
> > +       -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
> > +       -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > +       ```
> > +
> > +       where vhost-user-1 is the name of the vhost-user port added
> > +       to the switch.
> > +       Repeat the above parameters for multiple devices, changing the
> > +       chardev path and id as necessary. Note that a separate and different
> > +       chardev path needs to be specified for each vhost-user device. For
> > +       example, if you have a second vhost-user port named 'vhost-user-2',
> > +       append your QEMU command line with an additional set of parameters:
> > +
> > +       ```
> > +       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> > +       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> > +       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> > +       ```
> > +
> > +    2. Configure huge pages.
> > +
> > +       QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
> > +       a virtio-net device's virtual rings and packet buffers mapping the VM's
> > +       physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
> > +       memory into their process address space, pass the following parameters
> > +       to QEMU:
> > +
> > +       ```
> > +       -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
> > +       share=on -numa node,memdev=mem -mem-prealloc
> > +       ```
> > +
> > +    3. Enable multiqueue support(OPTIONAL)
> > +
> > +       The vhost-user interface must be configured in Open vSwitch with the
> > +       desired amount of queues with:
> > +
> > +       ```
> > +       ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues>
> > +       ```
> > +
> > +       QEMU needs to be configured as well.
> > +       The $q below should match the queues requested in OVS (if $q is more,
> > +       packets will not be received).
> > +       The $v is the number of vectors, which is '$q x 2 + 2'.
> > +
> > +       ```
> > +       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> > +       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
> > +       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
> > +       ```
> > +
> > +       If one wishes to use multiple queues for an interface in the guest, the
> > +       driver in the guest operating system must be configured to do so. It is
> > +       recommended that the number of queues configured be equal to '$q'.
> > +
> > +       For example, this can be done for the Linux kernel virtio-net driver with:
> > +
> > +       ```
> > +       ethtool -L <DEV> combined <$q>
> > +       ```
> > +       where `-L` changes the number of channels of the specified network device
> > +       and `combined` changes the number of multi-purpose channels.
> > +
> > +### 6.2 vhost-cuse
> > +
> > +  - Prerequisites:
> > +
> > +    QEMU version >= 2.2
> > +
> > +  - Enable vhost-cuse support
> > +
> > +    1. Enable vhost cuse support in DPDK
> > +
> > +       Set `CONFIG_RTE_LIBRTE_VHOST_USER=n` in config/common_linuxapp and
> > +       follow the steps in section 2.2 of the INSTALL.DPDK guide to build DPDK
> > +       with cuse support. OVS will detect that DPDK has the vhost-cuse libraries
> > +       compiled and in turn will enable support for it in the switch and disable
> > +       vhost-user support.
> > +
> > +    2. Insert the Cuse module
> > +
> > +       `modprobe cuse`
> > +
> > +    3. Build and insert the `eventfd_link` module
> > +
> > +       ```
> > +       cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
> > +       make
> > +       insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
> > +       ```
> > +
> > +  - Adding vhost-cuse ports to Switch
> > +
> > +    Unlike DPDK ring ports, DPDK vhost-cuse ports can have arbitrary
> names.
> > +    For vhost-cuse, the name of the port type is `dpdkvhostcuse`
> > +
> > +    ```
> > +    ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
> > +    type=dpdkvhostcuse
> > +    ```
> > +
> > +    When attaching vhost-cuse ports to QEMU, the name provided during the
> > +    add-port operation must match the ifname parameter on the QEMU cmd line.
> > +
> > +  - Adding vhost-cuse ports to VM
> > +
> > +    vhost-cuse ports use a Linux* character device to communicate with
> QEMU.
> > +    By default it is set to `/dev/vhost-net`. It is possible to reuse this
> > +    standard device for DPDK vhost, which makes setup a little simpler but 
> > it
> > +    is better practice to specify an alternative character device in order 
> > to
> > +    avoid any conflicts if kernel vhost is to be used in parallel.
> > +
> > +    1. This step is only needed if using an alternative character device.
> > +
> > +       ```
> > +       ./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
> > +            other_config:cuse-dev-name=my-vhost-net
> > +       ```
> > +
> > +       In the example above, the character device to be used will be
> > +       `/dev/my-vhost-net`.
> > +
> > +    2. In case of reusing the kernel vhost character device, there would be a
> > +       conflict; the user should remove it.
> > +
> > +       `rm -rf /dev/vhost-net`
> > +
> > +    3. Configure virtio-net adapters
> > +
> > +       The following parameters must be passed to the QEMU binary, repeat
> > +       the below parameters for multiple devices.
> > +
> > +       ```
> > +       -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > +       -device virtio-net-pci,netdev=net1,mac=<mac>
> > +       ```
> > +
> > +       The DPDK vhost library will negotiate its own features, so they
> > +       need not be passed in as command line params. Note that as offloads
> > +       are disabled this is the equivalent of setting
> > +
> > +       `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
> > +
> > +       When using an alternative character device, it must be explicitly
> > +       passed to QEMU using the `vhostfd` argument
> > +
> > +       ```
> > +       -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > +       vhostfd=<open_fd> -device virtio-net-pci,netdev=net1,mac=<mac>
> > +       ```
> > +
> > +       The open file descriptor must be passed to QEMU running as a child
> > +       process. This could be done with a simple python script.
> > +
> > +       ```
> > +       #!/usr/bin/python
> > +       import os
> > +       import subprocess
> > +       fd = os.open("/dev/usvhost", os.O_RDWR)
> > +       # pass the open fd number to QEMU via the vhostfd argument
> > +       subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,"
> > +                       "vhost=on,vhostfd=" + str(fd) + " ...", shell=True)
> > +       ```
> > +
> > +    4. Configure huge pages
> > +
> > +       QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > +       virtio-net device's virtual rings and packet buffers mapping the VM's
> > +       physical memory on hugetlbfs. To enable vhost ports to map the VM's
> > +       memory into their process address space, pass the following parameters
> > +       to QEMU:
> > +
> > +       `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
> > +       share=on -numa node,memdev=mem -mem-prealloc`
> > +
> > +  - VM Configuration with QEMU wrapper
> > +
> > +    The QEMU wrapper script automatically detects and calls QEMU with the
> > +    necessary parameters. It performs the following actions:
> > +
> > +    * Automatically detects the location of the hugetlbfs and inserts this
> > +      into the command line parameters.
> > +    * Automatically opens file descriptors for each virtio-net device and
> > +      inserts these into the command line parameters.
> > +    * Calls QEMU passing both the command line parameters passed to the
> > +      script itself and those it has auto-detected.
> > +
> > +    Before use, you **must** edit the configuration parameters section of the
> > +    script to point to the correct emulator location and set additional
> > +    settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
> > +    set. All other settings are optional.
> > +
> > +    To use directly from the command line simply pass the wrapper some of the
> > +    QEMU parameters: it will configure the rest. For example:
> > +
> > +    ```
> > +    qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
> > +    --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
> > +    script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
> > +    netdev=net1,mac=00:00:00:00:00:01
> > +    ```
> > +
> > +  - VM Configuration with libvirt
> > +
> > +    If you are using libvirt, you must enable libvirt to access the character
> > +    device by adding it to the controllers cgroup for libvirtd using the
> > +    following steps.
> > +
> > +    1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > +
> > +       ```
> > +       clear_emulator_capabilities = 0
> > +       user = "root"
> > +       group = "root"
> > +       cgroup_device_acl = [
> > +             "/dev/null", "/dev/full", "/dev/zero",
> > +             "/dev/random", "/dev/urandom",
> > +             "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > +             "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > +             "/dev/<my-vhost-device>",
> > +             "/dev/hugepages"]
> > +       ```
> > +
> > +       <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
> > +       device. If you have specified a different name in the database
> > +       using the "other_config:cuse-dev-name" parameter, please specify that
> > +       filename instead.
> > +
> > +    2. Disable SELinux or set to permissive mode
> > +
> > +    3. Restart the libvirtd process
> > +       For example, on Fedora:
> > +
> > +       `systemctl restart libvirtd.service`
> > +
> > +    After successfully editing the configuration, you may launch your
> > +    vhost-enabled VM. The XML describing the VM can be configured like so
> > +    within the <qemu:commandline> section:
> > +
> > +    1. Set up shared hugepages:
> > +
> > +       ```
> > +       <qemu:arg value='-object'/>
> > +       <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
> > +       <qemu:arg value='-numa'/>
> > +       <qemu:arg value='node,memdev=mem'/>
> > +       <qemu:arg value='-mem-prealloc'/>
> > +       ```
> > +
> > +    2. Set up your tap devices:
> > +
> > +       ```
> > +       <qemu:arg value='-netdev'/>
> > +       <qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
> > +       <qemu:arg value='-device'/>
> > +       <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
> > +       ```
> > +
> > +    Repeat for as many devices as are desired, modifying the id, ifname
> > +    and mac as necessary.
> > +
> > +    Again, if you are using an alternative character device (other than
> > +    `/dev/vhost-net`), please specify the file descriptor like so:
> > +
> > +    `<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`
> > +
> > +    Where <open_fd> refers to the open file descriptor of the character device.
> > +    Instructions on how to retrieve the file descriptor can be found in the
> > +    "DPDK vhost VM configuration" section.
> > +    Alternatively, the process is automated with the qemu-wrap.py script,
> > +    detailed in the next section.
> > +
> > +    Now you may launch your VM using virt-manager, or like so:
> > +
> > +   `virsh create my_vhost_vm.xml`
> > +
> > +  - VM Configuration with libvirt & QEMU wrapper
> > +
> > +    To use the qemu-wrapper script in conjunction with libvirt, follow the
> > +    steps in the previous section before proceeding with the following steps:
> > +
> > +    1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH),
> > +       ideally in the same directory as the QEMU binary.
> > +
> > +    2. Ensure that the script has the same owner/group and file permissions
> > +       as the QEMU binary.
> > +
> > +    3. Update the VM xml file using "virsh edit VM.xml"
> > +
> > +       Set the VM to use the launch script.
> > +       Set the emulator path contained in the `<emulator></emulator>` tags.
> > +       For example, replace `<emulator>/usr/bin/qemu-kvm</emulator>` with
> > +       `<emulator>/usr/bin/qemu-wrap.py</emulator>`
> > +
> > +    4. Edit the Configuration Parameters section of the script to point to
> > +       the correct emulator location and set any additional options. If you are
> > +       using an alternative character device name, please set "us_vhost_path" to
> > +       the location of that device. The script will automatically detect and
> > +       insert the correct "vhostfd" value in the QEMU command line arguments.
> > +       the correct "vhostfd" value in the QEMU command line arguments.
> > +
> > +    5. Use virt-manager to launch the VM
> > +
> > +### 6.3 DPDK backend inside VM
> > +
> > +  Please note that additional configuration is required if you want to run
> > +  ovs-vswitchd with the DPDK backend inside a QEMU virtual machine.
> > +  Ovs-vswitchd creates separate DPDK TX queues for each CPU core available.
> > +  This operation fails inside a QEMU virtual machine because, by default, the
> > +  VirtIO NIC provided to the guest is configured to support only a single TX
> > +  queue and a single RX queue. To change this behavior, you need to turn on the
> > +  'mq' (multiqueue) property of all virtio-net-pci devices emulated by QEMU
> > +  and used by DPDK.
> Add the following comment.
> May not work with some old versions of Qemu found in some distros.
> Requires Qemu version >= 2.x
Point taken, will add this.

> > +  You may do it manually (by changing the QEMU command line) or, if you use
> > +  Libvirt, by adding the following string:
> > +
> > +  `<driver name='vhost' queues='N'/>`
> > +
> > +  to the <interface> sections of all network devices used by DPDK. Parameter
> > +  'N' determines how many queues can be used by the guest.
> > +
> > +## <a name="qos"></a> 7. QOS
> > +
> > +Here is an example of QoS usage.
> > +Assuming you have a vhost-user port transmitting traffic consisting
> > +of packets of size 64 bytes, the following command would limit the
> > +egress transmission rate of the port to ~1,000,000 packets per second:
> > +
> > +`ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create
> > +qos type=egress-policer other-config:cir=46000000
> > +other-config:cbs=2048`
> > +
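The cir value appears to follow from the packet rate multiplied by the bytes the policer accounts for per 64-byte frame; a hedged breakdown (assuming the usual egress-policer accounting, which excludes the 14-byte Ethernet header and 4-byte CRC):

```
# 1,000,000 pkt/s x 46 bytes = 46,000,000 bytes/s  (cir)
# where 46 = 64 - 14 (Ethernet header) - 4 (CRC)
```
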
> > +To examine the QoS configuration of the port:
> > +
> > +`ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > +
> > +To clear the QoS configuration from the port and ovsdb use the following:
> > +
> > +`ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`
> > +
> > +For more details regarding egress-policer parameters please refer to
> > +the vswitch.xml.
> > +
> > +## <a name="staticanalyzer"></a> 8. Static Code Analysis
> > +
> > +Static analysis is a method of debugging software by examining the code
> > +rather than actually executing it. Many third-party tools are available to
> > +carry out static analysis, a few being open source and the rest commercial.
> > +
> > +Below are the steps to run clang static analyzer on OVS codebase.
> > +
> > +  ```
> > +  apt-get install clang                [On Ubuntu]
> > +  dnf install clang clang-analyzer -y  [On Fedora]
> > +
> > +  cd $OVS_DIR
> > +  ./boot.sh
> > +  ./configure --with-dpdk
> > +  make clean
> > +  scan-build make CFLAGS="-std=gnu99"
> > +  scan-view --host=<ip address> --port 8183 \
> > +    /tmp/scan-build-yyyy-mm-dd-114251-1027-1 --allow-all-hosts
> > +  ```
> > +
> > +  The results can be viewed in the browser using the IP address and port number.
> > +
> > +  `http://<ip address>:8183/`
> > +
> > +## <a name="vsperf"></a> 9. Vsperf
> > +
> > +The Vsperf project's goal is to develop a vSwitch test framework that can be
> > +used to validate the suitability of different vSwitch implementations
> > +in a Telco deployment environment. More information can be found at the
> > +link below.
> > +
> > +https://wiki.opnfv.org/display/vsperf/VSperf+Home
> > +
> > +
> > +Bug Reporting:
> > +--------------
> > +
> > +Please report problems to b...@openvswitch.org.
> > +
> > +
> > +[INSTALL.userspace.md]:INSTALL.userspace.md
> > +[INSTALL.md]:INSTALL.md
> > +[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
> > +[DPDK Docs]: http://dpdk.org/doc
> >

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
