On 11/24/2016 12:58 PM, Kevin Traynor wrote:
> On 11/23/2016 09:00 PM, Maxime Coquelin wrote:
>> Having reference benchmarks is important in order to obtain
>> reproducible performance figures.
>>
>> This patch describes the required steps to configure a PVP setup
>> using testpmd in both host and guest.
>>
>> Not relying on an external vSwitch eases integration in a CI loop by
>> not being impacted by DPDK API changes.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>
> A short template/hint of the main things to report after running could
> be useful to help ML discussions about results, e.g.
>
> Traffic Generator: IXIA
> Acceptable Loss: 100% (i.e. raw throughput test)
> DPDK version/commit: v16.11
> QEMU version/commit: v2.7.0
> Patches applied: <link to patchwork>
> CPU: E5-2680 v3, 2.8GHz
> Result: x mpps
> NIC: ixgbe 82599
Good idea, I'll add a section at the end providing this template.

>
>> ---
>>  doc/guides/howto/img/pvp_2nics.svg           | 556 +++++++++++++++++++++++++++
>>  doc/guides/howto/index.rst                   |   1 +
>>  doc/guides/howto/pvp_reference_benchmark.rst | 389 +++++++++++++++++++
>>  3 files changed, 946 insertions(+)
>>  create mode 100644 doc/guides/howto/img/pvp_2nics.svg
>>  create mode 100644 doc/guides/howto/pvp_reference_benchmark.rst
>>
>
> <snip>
>
>> +Host tuning
>> +~~~~~~~~~~~
>
> I would add turbo boost = disabled in the BIOS.

+1, will be in the next revision.

>> +
>> +#. Append these options to the kernel command line:
>> +
>> +   .. code-block:: console
>> +
>> +      intel_pstate=disable mce=ignore_ce default_hugepagesz=1G hugepagesz=1G hugepages=6 isolcpus=2-7 rcu_nocbs=2-7 nohz_full=2-7 iommu=pt intel_iommu=on
>> +
>> +#. Disable hyper-threads at runtime, if necessary and the BIOS is not accessible:
>> +
>> +   .. code-block:: console
>> +
>> +      cat /sys/devices/system/cpu/cpu*[0-9]/topology/thread_siblings_list \
>> +          | sort | uniq \
>> +          | awk -F, '{system("echo 0 > /sys/devices/system/cpu/cpu"$2"/online")}'
>> +
>> +#. Disable NMIs:
>> +
>> +   .. code-block:: console
>> +
>> +      echo 0 > /proc/sys/kernel/nmi_watchdog
>> +
>> +#. Exclude isolated CPUs from the writeback cpumask:
>> +
>> +   .. code-block:: console
>> +
>> +      echo ffffff03 > /sys/bus/workqueue/devices/writeback/cpumask
>> +
>> +#. Isolate CPUs from IRQs:
>> +
>> +   .. code-block:: console
>> +
>> +      clear_mask=0xfc  # Isolate CPU2 to CPU7 from IRQs
>> +      for i in /proc/irq/*/smp_affinity
>> +      do
>> +          echo "obase=16;$(( 0x$(cat $i) & ~$clear_mask ))" | bc > $i
>> +      done
>> +
>> +Qemu build
>> +~~~~~~~~~~
>> +
>> +   .. code-block:: console
>> +
>> +      git clone git://git.qemu.org/qemu.git
>> +      cd qemu
>> +      mkdir bin
>> +      cd bin
>> +      ../configure --target-list=x86_64-softmmu
>> +      make
>> +
>> +DPDK build
>> +~~~~~~~~~~
>> +
>> +   .. code-block:: console
>> +
>> +      git clone git://dpdk.org/dpdk
>> +      cd dpdk
>> +      export RTE_SDK=$PWD
>> +      make install T=x86_64-native-linuxapp-gcc DESTDIR=install
>> +
>> +Testpmd launch
>> +~~~~~~~~~~~~~~
>> +
>> +#. Assign NICs to DPDK:
>> +
>> +   .. code-block:: console
>> +
>> +      modprobe vfio-pci
>> +      $RTE_SDK/install/sbin/dpdk-devbind -b vfio-pci 0000:11:00.0 0000:11:00.1
>> +
>> +   *Note: the Sandy Bridge family seems to have some limitations wrt its IOMMU,
>> +   giving poor performance results. To achieve good performance on these
>> +   machines, consider using UIO instead.*
>> +
>> +#. Launch the testpmd application:
>> +
>> +   .. code-block:: console
>> +
>> +      $RTE_SDK/install/bin/testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \
>> +          --vdev 'net_vhost0,iface=/tmp/vhost-user1' \
>> +          --vdev 'net_vhost1,iface=/tmp/vhost-user2' -- \
>> +          --portmask=f --disable-hw-vlan -i --rxq=1 --txq=1 \
>> +          --nb-cores=4 --forward-mode=io
>> +
>> +#. In testpmd interactive mode, set the portlist to obtain the right chaining:
>> +
>> +   .. code-block:: console
>> +
>> +      set portlist 0,2,1,3
>> +      start
>> +
>> +VM launch
>> +~~~~~~~~~
>> +
>> +The VM may be launched ezither by calling directly QEMU, or by using libvirt.
>
> s/ezither/either
>
>> +
>> +#. Qemu way:
>> +
>> +Launch QEMU with two Virtio-net devices paired to the vhost-user sockets
>> created by testpmd:
>> +
>> +   .. code-block:: console
>> +
>> +      <QEMU path>/bin/x86_64-softmmu/qemu-system-x86_64 \
>> +          -enable-kvm -cpu host -m 3072 -smp 3 \
>> +          -chardev socket,id=char0,path=/tmp/vhost-user1 \
>> +          -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>> +          -device virtio-net-pci,netdev=mynet1,mac=52:54:00:02:d9:01,addr=0x10 \
>> +          -chardev socket,id=char1,path=/tmp/vhost-user2 \
>> +          -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
>> +          -device virtio-net-pci,netdev=mynet2,mac=52:54:00:02:d9:02,addr=0x11 \
>> +          -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
>> +          -numa node,memdev=mem -mem-prealloc \
>> +          -net user,hostfwd=tcp::1002$1-:22 -net nic \
>> +          -qmp unix:/tmp/qmp.socket,server,nowait \
>> +          -monitor stdio <vm_image>.qcow2
>
> Probably mergeable rx data path =off would want to be tested also when
> evaluating any performance improvements/regressions.

Right, I'll add a few lines about this.

>
>> +
>> +You can use this qmp-vcpu-pin script to pin vCPUs:
>> +
>> +   .. code-block:: python
>> +
>> +      #!/usr/bin/python
>> +      # QEMU vCPU pinning tool
>> +      #
>> +      # Copyright (C) 2016 Red Hat Inc.
>> +      #
>> +      # Authors:
>> +      #  Maxime Coquelin <maxime.coquelin at redhat.com>
>> +      #
>> +      # This work is licensed under the terms of the GNU GPL, version 2.
>> +      # See the COPYING file in the top-level directory
>> +
>> +      import argparse
>> +      import json
>> +      import os
>> +
>> +      from subprocess import call
>> +      from qmp import QEMUMonitorProtocol
>> +
>> +      pinned = []
>> +
>> +      parser = argparse.ArgumentParser(description='Pin QEMU vCPUs to physical CPUs')
>> +      parser.add_argument('-s', '--server', type=str, required=True,
>> +                          help='QMP server path or address:port')
>> +      parser.add_argument('cpu', type=int, nargs='+',
>> +                          help='Physical CPUs IDs')
>> +      args = parser.parse_args()
>> +
>> +      devnull = open(os.devnull, 'w')
>> +
>> +      srv = QEMUMonitorProtocol(args.server)
>> +      srv.connect()
>> +
>> +      for vcpu in srv.command('query-cpus'):
>> +          vcpuid = vcpu['CPU']
>> +          tid = vcpu['thread_id']
>> +          if tid in pinned:
>> +              print 'vCPU{}\'s tid {} already pinned, skipping'.format(vcpuid, tid)
>> +              continue
>> +
>> +          cpuid = args.cpu[vcpuid % len(args.cpu)]
>> +          print 'Pin vCPU {} (tid {}) to physical CPU {}'.format(vcpuid, tid, cpuid)
>> +          try:
>> +              call(['taskset', '-pc', str(cpuid), str(tid)], stdout=devnull)
>> +              pinned.append(tid)
>> +          except OSError:
>> +              print 'Failed to pin vCPU{} to CPU{}'.format(vcpuid, cpuid)
>> +
>> +
>> +That can be used this way, for example to pin 3 vCPUs to CPUs 1, 6 and 7:
>
> I think it would be good to explicitly explain the link you've made on
> core numbers in this case between isolcpus, the vCPU pinning above and
> the core list in the testpmd cmd line later.

Yes. vCPU0 doesn't run testpmd lcores, so it doesn't need to be pinned to an
isolated CPU. vCPU1 & vCPU2 are used as lcores, so they are pinned to isolated
CPUs. I'll add this information to the next version.

Thanks,
Maxime
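On the mergeable RX buffers point above: for the record, they can be disabled
per device directly on the QEMU command line with the virtio-net-pci
mrg_rxbuf property. A sketch, reusing the netdev ids from the command above
(command-line fragment only, not a full invocation):

```console
# Same -device arguments as above, with mergeable RX buffers disabled:
-device virtio-net-pci,netdev=mynet1,mac=52:54:00:02:d9:01,addr=0x10,mrg_rxbuf=off \
-device virtio-net-pci,netdev=mynet2,mac=52:54:00:02:d9:02,addr=0x11,mrg_rxbuf=off \
```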
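For clarity, the pinning policy the script applies is a plain round-robin over
the CPU list passed on the command line: vCPU i goes to cpu[i % len(cpu)]. A
minimal Python sketch of just that mapping (pin_map is an illustrative helper,
not part of the script):

```python
def pin_map(n_vcpus, cpu_list):
    """Round-robin vCPU -> physical CPU mapping, as done by qmp-vcpu-pin."""
    return {vcpu: cpu_list[vcpu % len(cpu_list)] for vcpu in range(n_vcpus)}

# 3 vCPUs pinned to CPUs 1, 6 and 7, as in the example above:
print(pin_map(3, [1, 6, 7]))  # -> {0: 1, 1: 6, 2: 7}
```

Assuming the script is saved as qmp-vcpu-pin and QMP is exposed on
/tmp/qmp.socket as in the QEMU command line above, the invocation would look
like `./qmp-vcpu-pin -s /tmp/qmp.socket 1 6 7` (illustrative paths).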