On 22/02/2023 15:43, Robin Jarry wrote:
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the RTE flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. The RSS redirection table is re-programmed to
only use the other Rx queues. The RSS table size is stored in the
netdev_dpdk structure at port initialization to avoid requesting the
information again when changing the port configuration.
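To illustrate the RSS re-programming described above, here is a minimal
sketch in plain C of how a redirection table could be filled so that hashed
traffic never lands on the last Rx queue. It is not the code from the patch:
the function name and the choice of the last queue as the control plane queue
are assumptions made for the example, and a real implementation would push
the table to the NIC through the DPDK ethdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * Fill an RSS redirection table so that hashed traffic is spread over the
 * data queues 0..n_rxq-2 only. The last queue (n_rxq-1) is assumed to be
 * the one reserved for control plane packets and never appears in the table.
 */
static void
fill_reta_excluding_last(uint16_t *reta, size_t reta_size, uint16_t n_rxq)
{
    uint16_t n_data_rxq;

    /* At least one data queue plus the control plane queue is required. */
    assert(n_rxq >= 2);

    n_data_rxq = n_rxq - 1;
    for (size_t i = 0; i < reta_size; i++) {
        /* Round-robin over the data queues only. */
        reta[i] = (uint16_t) (i % n_data_rxq);
    }
}
```

With 4 Rx queues, every table entry is 0, 1 or 2, so queue 3 only receives
packets that a dedicated flow rule steers to it.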

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty in exchange for preventing link flapping.

This feature must be enabled per port on specific protocols via the
cp-protection option. This option takes a comma-separated list of
protocol names. It is only supported on ethernet ports. This feature is
experimental.
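As an illustration of the option format, the following sketch parses
a comma-separated list of protocol names into a bitmask. It is only an
assumption of how such parsing could look: the flag name and the function
are hypothetical, empty tokens are skipped, and whitespace is not trimmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CP_PROTECT_LACP (1u << 0)   /* Hypothetical flag name. */

/*
 * Hypothetical helper, not taken from the patch.
 *
 * Parse a comma-separated list of protocol names. Returns 0 and stores the
 * resulting bitmask in '*flags' on success. Returns -1 and leaves '*flags'
 * untouched if an unknown protocol name is found.
 */
static int
parse_cp_protection(const char *value, uint32_t *flags)
{
    const char *p = value;
    uint32_t result = 0;

    assert(value && flags);

    while (*p) {
        size_t len = strcspn(p, ",");   /* Length of the current token. */

        if (len == 4 && !strncmp(p, "lacp", 4)) {
            result |= CP_PROTECT_LACP;
        } else if (len != 0) {
            return -1;                  /* Unknown protocol name. */
        }
        p += len;
        if (*p == ',') {
            p++;                        /* Skip the separator. */
        }
    }
    *flags = result;
    return 0;
}
```

Rejecting unknown names instead of ignoring them keeps a typo in the option
from silently disabling the protection.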

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control plane packets. If the
hardware cannot satisfy the requested number of Rx queues, the
last Rx queue will be assigned for control plane. If only one Rx queue
is available, the cp-protection feature will be disabled. If the
hardware does not support the RTE flow matchers/actions, the feature
will be disabled.
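The fallback rules above can be sketched as a small decision function. This
is a simplified model, not the code from the patch: the structure, the
function name and the parameters are hypothetical, and it assumes that
a port configured with a single Rx queue also gets an additional one when the
hardware allows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, not taken from the patch. */
struct rxq_plan {
    bool cp_enabled;        /* Is a control plane queue in use? */
    uint16_t n_rxq;         /* Total Rx queues to configure on the port. */
    uint16_t n_data_rxq;    /* Queues left for RSS-distributed traffic. */
};

/*
 * 'requested' is the number of Rx queues configured by the user, 'hw_max'
 * the number the hardware can provide and 'flow_supported' tells whether
 * the required RTE flow matchers/actions are available.
 */
static struct rxq_plan
plan_rxqs(uint16_t requested, uint16_t hw_max, bool flow_supported)
{
    struct rxq_plan plan;

    assert(requested >= 1 && hw_max >= 1);

    if (!flow_supported || hw_max < 2) {
        /* Feature disabled: all available queues carry regular traffic. */
        plan.cp_enabled = false;
        plan.n_rxq = requested < hw_max ? requested : hw_max;
        plan.n_data_rxq = plan.n_rxq;
    } else if (requested + 1 <= hw_max) {
        /* Allocate one additional queue for control plane packets. */
        plan.cp_enabled = true;
        plan.n_rxq = requested + 1;
        plan.n_data_rxq = requested;
    } else {
        /* Hardware limit reached: the last queue is taken for control
         * plane packets, leaving one fewer queue for regular traffic. */
        plan.cp_enabled = true;
        plan.n_rxq = hw_max;
        plan.n_data_rxq = hw_max - 1;
    }
    return plan;
}
```

For example, requesting 4 queues on hardware that supports 8 yields 5 queues
in total, while requesting 4 on hardware limited to 4 leaves 3 data queues.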

It cannot be enabled when other_config:hw-offload=true as it may
conflict with the offloaded RTE flows. Similarly, if hw-offload is
enabled while some ports already have cp-protection enabled, RTE flow
offloading will be disabled on these ports.

Example use:

  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
    set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
    set interface phy0 options:cp-protection=lacp -- \
    set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
    set interface phy1 options:cp-protection=lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two-port bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

    +----------------------+
    |         DUT          |
    |+--------------------+|
    ||       br-int       || in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
    ||                    || in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
    || patch10    patch11 ||
    |+---|-----------|----+|
    |    |           |     |
    |+---|-----------|----+|
    || patch00    patch01 ||
    ||  tag:10    tag:20  ||
    ||                    ||
    ||       br-phy       || default flow, action=NORMAL
    ||                    ||
    ||       bond0        || balance-slb, lacp=passive, lacp-time=fast
    ||    phy0   phy1     ||
    |+------|-----|-------+|
    +-------|-----|--------+
            |     |
    +-------|-----|--------+
    |     port0  port1     | balance L3/L4, lacp=active, lacp-time=fast
    |         lag          | mode trunk VLANs 10, 20
    |                      |
    |        switch        |
    |                      |
    |  vlan 10    vlan 20  |  mode access
    |   port2      port3   |
    +-----|----------|-----+
          |          |
    +-----|----------|-----+
    |   tgen0      tgen1   |  Random traffic that is properly balanced
    |                      |  across the bond ports in both directions.
    |  traffic generator   |
    +----------------------+

Without cp-protection, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link which causes above line rate traffic to be dropped.

  ~# ovs-appctl lacp/show-stats bond0
  ---- bond0 statistics ----
  member: phy0:
    TX PDUs: 347246
    RX PDUs: 14865
    RX Bad PDUs: 0
    RX Marker Request PDUs: 0
    Link Expired: 168
    Link Defaulted: 0
    Carrier Status Changed: 0
  member: phy1:
    TX PDUs: 347245
    RX PDUs: 14919
    RX Bad PDUs: 0
    RX Marker Request PDUs: 0
    Link Expired: 147
    Link Defaulted: 1
    Carrier Status Changed: 0

When cp-protection is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.

This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.

The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future but it would expose users to more potential issues if they are
accidentally redirecting all traffic in the control plane queue.

Cc: Anthony Harivel <ahari...@redhat.com>
Cc: Christophe Fontaine <cfont...@redhat.com>
Cc: David Marchand <david.march...@redhat.com>
Cc: Kevin Traynor <ktray...@redhat.com>
Signed-off-by: Robin Jarry <rja...@redhat.com>
---
v8 -> v9:

* Rebased on cf288fdfe2bf ("AUTHORS: Add Liang Mancang and Viacheslav
   Galaktionov.")
* Reset rte_flow_error struct before passing it to functions.
* Refined some comments.
* Updated check for hw-offload on a per-port basis. That way, if a port
   already has cp-protection enabled, hw-offload will not be enabled on
   it but cp-protection will continue to work until next restart.
   However, on next restart, hw-offload will be checked first and
   therefore cp-protection will be disabled on all ports.


Hi Robin,

Regarding having both features enabled, I think it's an issue that the outcome depends on the order in which they are enabled while running. It introduces another variable that might confuse things.

For example, the operation could be changed from cp-protection to hw-offload on a port by restarting OVS, which would probably be unexpected by a user. I mentioned it while chatting to Ilya and he agreed that the same state in ovsdb should mean the same state in ovs-vswitchd.

So that would mean having a binary priority between the two features and removing one if the higher priority one was later enabled (either globally or per-port?).

Whatever the co-existence (or not) is, I think it's better to resolve it in mail first to avoid you having to rework the code over again. I don't think it needs to be super-smart as these are experimental features, it just needs to be consistent and clearly documented for the user.

Code-wise, I've tested previous versions and I think the code is in pretty good shape overall. I'll do another pass of review/testing when the hw-offload/cp-protection priority is resolved.

thanks,
Kevin.

Unless there are significant reservations about this patch, would it be ok
to include it for 3.2?

Thanks!

  Documentation/topics/dpdk/phy.rst |  77 ++++++++
  NEWS                              |   4 +
  lib/netdev-dpdk.c                 | 310 +++++++++++++++++++++++++++++-
  vswitchd/vswitch.xml              |  26 +++
  4 files changed, 414 insertions(+), 3 deletions(-)


<snip>



