On 4/26/2024 4:48 PM, Stephen Hemminger wrote:
> The driver support of flows has changed and the wording in
> the guide was awkward.
>
> Drop references to DPDK pktgen in this documentation since
> it is not required and confusing.
>
> Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
> ---
>  doc/guides/linux_gsg/sys_reqs.rst |   3 +
>  doc/guides/nics/tap.rst           | 274 +++++++++--------------------
>  2 files changed, 80 insertions(+), 197 deletions(-)
>
> diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
> index 13be715933..0254568517 100644
> --- a/doc/guides/linux_gsg/sys_reqs.rst
> +++ b/doc/guides/linux_gsg/sys_reqs.rst
> @@ -101,6 +101,9 @@ Running DPDK Applications
>
> To run a DPDK application, some customization may be required on the target machine.
>
> +.. _linux_gsg_kernel_version:
> +
> +
>
I guess the intention was to use this document as a reference, but I don't see
the link. Perhaps the anchor is no longer required?

> System Software
> ~~~~~~~~~~~~~~~
>
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> index d4f45c02a1..55e38fb25b 100644
> --- a/doc/guides/nics/tap.rst
> +++ b/doc/guides/nics/tap.rst
> @@ -1,47 +1,51 @@
> .. SPDX-License-Identifier: BSD-3-Clause
> Copyright(c) 2016 Intel Corporation.
>
> -Tun|Tap Poll Mode Driver
> -========================
> +TAP Poll Mode Driver
> +====================
>
> -The ``rte_eth_tap.c`` PMD creates a device using TAP interfaces on the
> -local host. The PMD allows for DPDK and the host to communicate using a raw
> -device interface on the host and in the DPDK application.
> +The TAP Poll Mode Driver (PMD) is a virtual device for injecting packets to be processed
> +by the Linux kernel. This PMD is useful when writing DPDK application
> +for offloading network functionality (such as tunneling) from the kernel.
>
> -The device created is a TAP device, which sends/receives packet in a raw
> -format with a L2 header. The usage for a TAP PMD is for connectivity to the
> -local host using a TAP interface. When the TAP PMD is initialized it will
> -create a number of tap devices in the host accessed via ``ifconfig -a`` or
> -``ip`` command. The commands can be used to assign and query the virtual like
> -device.
> +From the kernel point of view, the TAP device looks like a regular network interface.
> +The network device can be managed by standard tools such as ``ip`` and ``ethtool`` commands.
> +It is also possible to use existing packet tools such as ``wireshark`` or ``tcpdump``.
>
> -These TAP interfaces can be used with Wireshark or tcpdump or Pktgen-DPDK
> -along with being able to be used as a network connection to the DPDK
> -application. The method enable one or more interfaces is to use the
> -``--vdev=net_tap0`` option on the DPDK application command line. Each
> -``--vdev=net_tap1`` option given will create an interface named dtap0, dtap1,
> -and so on.
> +From the DPDK application, the TAP device looks like a DPDK ethdev.
> +Packets are sent and received in L2 (Ethernet) format. The standare DPDK
> +API's to query for information, statistics and send and receive packets
> +work as expected.
>
> -The interface name can be changed by adding the ``iface=foo0``, for example::
> +Requirements
> +~~~~~~~~~~~~
> +
> +The TAP PMD requires kernel support for multiple queues in TAP device as
> +well as the multi-queue ``multiq`` and incoming ``ingress`` queue disciplines.
> +These are standard kernel features in most Linux distributions.
> +
> +Arguments
> +---------
> +
> +TAP devices are created with the command line
> +``--vdev=net_tap0`` option. This option maybe specified more the once by repeating
> +with a different ``net_tapX`` device.
> +
> +By default, the Linux interfaces are named ``dtap0``, ``dtap1``, etc.
> +The interface name can be specified by adding the ``iface=foo0``, for example::
>
>     --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...
>
> -Normally the PMD will generate a random MAC address, but when testing or with
> -a static configuration the developer may need a fixed MAC address style.
> -Using the option ``mac=fixed`` you can create a fixed known MAC address::
> +Normally the PMD will generate a random MAC address.
> +If a static address is desired instead, the ``mac=fixed`` can be used.
>
>     --vdev=net_tap0,mac=fixed
>
> -The MAC address will have a fixed value with the last octet incrementing by one
> -for each interface string containing ``mac=fixed``. The MAC address is formatted
> -as 02:'d':'t':'a':'p':[00-FF]. Convert the characters to hex and you get the
> -actual MAC address: ``02:64:74:61:70:[00-FF]``.
> -
> -    --vdev=net_tap0,mac="02:64:74:61:70:11"
> +With the fixed option, the MAC address will have the first octets:
> +as 02:'d':'t':'a':'p':[00-FF] and the last octets are the interface number.
>
> -The MAC address will have a user value passed as string. The MAC address is in
> -format with delimiter ``:``. The string is byte converted to hex and you get
> -the actual MAC address: ``02:64:74:61:70:11``.
> +To specify a specific MAC address use the conventional representation.
> +The string is byte converted to hex, the result is MAC address: ``02:64:74:61:70:11``.
>
> It is possible to specify a remote netdevice to capture packets from by adding
> ``remote=foo1``, for example::
> @@ -59,40 +63,20 @@ netdevice that has no support in the DPDK. It is possible to add explicit
> rte_flow rules on the tap PMD to capture specific traffic (see next section for
> examples).
>
> -After the DPDK application is started you can send and receive packets on the
> -interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
> -point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
> -and others to communicate with the DPDK application. The DPDK application may
> -not understand network protocols like IPv4/6, UDP or TCP unless the
> -application has been written to understand these protocols.
> -
> -If you need the interface as a real network interface meaning running and has
> -a valid IP address then you can do this with the following commands::
> -
> -    sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
> -    sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
> -
> -Please change the IP addresses as you see fit.
> -
> -If routing is enabled on the host you can also communicate with the DPDK App
> -over the internet via a standard socket layer application as long as you
> -account for the protocol handling in the application.
> -
> -If you have a Network Stack in your DPDK application or something like it you
> -can utilize that stack to handle the network protocols. Plus you would be able
> -to address the interface using an IP address assigned to the internal
> -interface.
> -
> Normally, when the DPDK application exits,
> the TAP device is marked down and is removed.
> -But this behaviour can be overridden by the use of the persist flag, example::
> +But this behavior can be overridden by the use of the persist flag, example::
>
>     --vdev=net_tap0,iface=tap0,persist ...
>
> -The TUN PMD allows user to create a TUN device on host. The PMD allows user
> -to transmit and receive packets via DPDK API calls with L3 header and payload.
> -The devices in host can be accessed via ``ifconfig`` or ``ip`` command. TUN
> -interfaces are passed to DPDK ``rte_eal_init`` arguments as ``--vdev=net_tunX``,
> +TUN devices
> +-----------
> +
> +The TAP device can be used an L3 tunnel only device (TUN).
> +This type of device does not include the Ethernet (L2) header; all packets
> +are sent and received as IP packets.
> +
> +TUN devices are created with the command line arguments ``--vdev=net_tunX``,
> where X stands for unique id, example::
>
>     --vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...
> @@ -103,27 +87,33 @@ options. Default interface name is ``dtunX``, where X stands for unique id.
> Flow API support
> ----------------
>
> -The tap PMD supports major flow API pattern items and actions, when running on
> -linux kernels above 4.2 ("Flower" classifier required).
> -The kernel support can be checked with this command::
> +The TAP PMD supports major flow API pattern items and actions.
> +
> +Requirements
> +~~~~~~~~~~~~
>
> -    zcat /proc/config.gz | ( grep 'CLS_FLOWER=' || echo 'not supported' ) |
> -    tee -a /dev/stderr | grep -q '=m' &&
> -    lsmod | ( grep cls_flower || echo 'try modprobe cls_flower' )
> +Flow support in TAP driver requires the Linux kernel support of flow based
> +traffic control filter ``flower``. This was added in Linux 4.3 kernel.
>
> -Supported items:
> +The implementation of RSS action uses an eBPF module that requires additional
> +libraries and tools. Building the RSS support requires the ``clang``
> +compiler to compile the C code to BPF target; ``bpftool`` to convert the
> +compiled BPF object to a header file; and ``libbpf`` to load the eBPF
> +action into the kernel.
>
> -- eth: src and dst (with variable masks), and eth_type (0xffff mask).
> -- vlan: vid, pcp, but not eid. (requires kernel 4.9)
> -- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
> -- udp/tcp: src and dst port (0xffff) mask.
> +Supported match items:
> +
> + - eth: src and dst (with variable masks), and eth_type (0xffff mask).
> + - vlan: vid, pcp, but not eid. (requires kernel 4.9)
> + - ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
> + - udp/tcp: src and dst port (0xffff) mask.
>
> Supported actions:
>
> - DROP
> - QUEUE
> - PASSTHRU
> -- RSS (requires kernel 4.9)
> +- RSS
>
> It is generally not possible to provide a "last" item. However, if the "last"
> item, once masked, is identical to the masked spec, then it is supported.
> @@ -133,7 +123,7 @@ full mask (exact match).
>
> As rules are translated to TC, it is possible to show them with something like::
>
> -    tc -s filter show dev tap1 parent 1:
> +    tc -s filter show dev dtap1 parent 1:
>
> Examples of testpmd flow rules
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> @@ -174,135 +164,25 @@ The IPC synchronization of Rx/Tx queues is currently limited:
> - Maximum 8 queues shared
> - Synchronized on probing, but not on later port update
>
> -Example
> --------
> -
> -The following is a simple example of using the TAP PMD with the Pktgen
> -packet generator. It requires that the ``socat`` utility is installed on the
> -test system.
> -
> -Build DPDK, then pull down Pktgen and build pktgen using the DPDK SDK/Target
> -used to build the dpdk you pulled down.
> -
> -Run pktgen from the pktgen directory in a terminal with a commandline like the
> -following::
> -
> -    sudo ./app/app/x86_64-native-linux-gcc/app/pktgen -l 1-5 -n 4 \
> -    --proc-type auto --log-level debug --socket-mem 512,512 --file-prefix pg \
> -    --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1 \
> -    -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3 \
> -    -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3 \
> -    -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1 \
> -    -f themes/black-yellow.theme
> -
> -.. Note:
> -
> -    Change the ``-b`` options to exclude all of your physical ports. The
> -    following command line is all one line.
> -
> -    Also, ``-f themes/black-yellow.theme`` is optional if the default colors
> -    work on your system configuration. See the Pktgen docs for more
> -    information.
> -
> -Verify with ``ifconfig -a`` command in a different xterm window, should have a
> -``dtap0`` and ``dtap1`` interfaces created.
> -
> -Next set the links for the two interfaces to up via the commands below::
> -
> -    sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
> -    sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
> -
> -Then use socat to create a loopback for the two interfaces::
> -
> -    sudo socat interface:dtap0 interface:dtap1
> -
> -Then on the Pktgen command line interface you can start sending packets using
> -the commands ``start 0`` and ``start 1`` or you can start both at the same
> -time with ``start all``. The command ``str`` is an alias for ``start all`` and
> -``stp`` is an alias for ``stop all``.
> -
> -While running you should see the 64 byte counters increasing to verify the
> -traffic is being looped back. You can use ``set all size XXX`` to change the
> -size of the packets after you stop the traffic. Use pktgen ``help``
> -command to see a list of all commands. You can also use the ``-f`` option to
> -load commands at startup in command line or Lua script in pktgen.
>
> RSS specifics
> -------------
> -Packet distribution in TAP is done by the kernel which has a default
> -distribution. This feature is adding RSS distribution based on eBPF code.
> -The default eBPF code calculates RSS hash based on Toeplitz algorithm for
> -a fixed RSS key. It is calculated on fixed packet offsets. For IPv4 and IPv6 it
> -is calculated over src/dst addresses (8 or 32 bytes for IPv4 or IPv6
> -respectively) and src/dst TCP/UDP ports (4 bytes).
> -
> -The RSS algorithm is written in file ``tap_bpf_program.c`` which
> -does not take part in TAP PMD compilation. Instead this file is compiled
> -in advance to eBPF object file. The eBPF object file is then parsed and
> -translated into eBPF byte code in the format of C arrays of eBPF
> -instructions. The C array of eBPF instructions is part of TAP PMD tree and
> -is taking part in TAP PMD compilation. At run time the C arrays are uploaded to
> -the kernel via BPF system calls and the RSS hash is calculated by the
> -kernel.
> -
> -It is possible to support different RSS hash algorithms by updating file
> -``tap_bpf_program.c`` In order to add a new RSS hash algorithm follow these
> -steps:
> -
> -#. Write the new RSS implementation in file ``tap_bpf_program.c``
> -
> -   BPF programs which are uploaded to the kernel correspond to
> -   C functions under different ELF sections.
> -
> -#. Install ``LLVM`` library and ``clang`` compiler versions 3.7 and above
> -
> -#. Use make to compile `tap_bpf_program.c`` via ``LLVM`` into an object file
> -   and extract the resulting instructions into ``tap_bpf_insn.h``::
> -
> -       cd bpf; make
> -
> -#. Recompile the TAP PMD.
> -
> -The C arrays are uploaded to the kernel using BPF system calls.
> -
> -``tc`` (traffic control) is a well known user space utility program used to
> -configure the Linux kernel packet scheduler. It is usually packaged as
> -part of the ``iproute2`` package.
> -Since commit 11c39b5e9 ("tc: add eBPF support to f_bpf") ``tc`` can be used
> -to uploads eBPF code to the kernel and can be patched in order to print the
> -C arrays of eBPF instructions just before calling the BPF system call.
> -Please refer to ``iproute2`` package file ``lib/bpf.c`` function
> -``bpf_prog_load()``.
> -
> -An example utility for eBPF instruction generation in the format of C arrays will
> -be added in next releases
> -
> -TAP reports on supported RSS functions as part of dev_infos_get callback:
> -``RTE_ETH_RSS_IP``, ``RTE_ETH_RSS_UDP`` and ``RTE_ETH_RSS_TCP``.
> -**Known limitation:** TAP supports all of the above hash functions together
> -and not in partial combinations.
> -
> -Systems supporting flow API
> ----------------------------
> -
> -- "tc flower" classifier requires linux kernel above 4.2
> -- eBPF/RSS requires linux kernel above 4.9
> -
> -+--------------------+-----------------------+
> -| RH7.3              | No flow rule support  |
> -+--------------------+-----------------------+
> -| RH7.4              | No RSS action support |
> -+--------------------+-----------------------+
> -| RH7.5              | No RSS action support |
> -+--------------------+-----------------------+
> -| SLES 15,           | No limitation         |
> -| kernel 4.12        |                       |
> -+--------------------+-----------------------+
> -| Azure Ubuntu 16.04,| No limitation         |
> -| kernel 4.13        |                       |
> -+--------------------+-----------------------+
> +The default packet distribution in TAP without flow rules is done by the
> +kernel which has a default flow based distribution.
> +When flow rules are used to distribute packets across a set of queues
> +an eBPF program is used to calculate the RSS based on Toeplitz algorithm for
> +with the given key.
> +
> +The hash is calculated for IPv4 and IPv6, over src/dst addresses
> +(8 or 32 bytes for IPv4 or IPv6 respectively) and
> +optionally the src/dst TCP/UDP ports (4 bytes).
> +
>
> Limitations
> -----------
>
> -* Rx/Tx must have the same number of queues.
> +- Since TAP device uses a file descriptors to talk to the kernel.
> +  The same number of queues must be specified for receive and transmit.
> +
> +- The RSS algorithm only support L3 or L4 functions. It does not support
> +  finer grain selections (for example: only IPV6 packets with extension headers).

The section headers in the document seem mixed up. The current structure is as
follows, which I don't think was the intention:

  TAP
  Requirements
  Arguments
  TUN devices
  Flow API support
  Requirements
  Examples of testpmd flow rules
  Multi-process sharing
  RSS specifics
  Limitations
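
A likely cause, assuming the usual reStructuredText rule that section levels are assigned in the order each adornment style first appears: the new first "Requirements" heading uses ``~`` before any ``-`` heading has been seen, so ``~`` becomes the second level and ``-`` the third. "Arguments", "TUN devices" and "Flow API support" then nest under "Requirements". A minimal sketch of one possible ordering follows; switching the first "Requirements" to ``-`` is only a suggestion and is not taken from the patch:

```rst
TAP Poll Mode Driver
====================

Requirements
------------

Arguments
---------

TUN devices
-----------

Flow API support
----------------

Requirements
~~~~~~~~~~~~

Examples of testpmd flow rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

With ``-`` introduced first, it becomes the second level and ``~`` the third, so only the flow-specific "Requirements" and "Examples of testpmd flow rules" nest under "Flow API support".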