On Tue, Mar 7, 2023 at 7:58 PM Aaron Conole <acon...@redhat.com> wrote: > > Robin Jarry <rja...@redhat.com> writes: > > > Some control protocols are used to maintain link status between > > forwarding engines (e.g. LACP). When the system is not sized properly, > > the PMD threads may not be able to process all incoming traffic from the > > configured Rx queues. When a signaling packet of such protocols is > > dropped, it can cause link flapping, worsening the situation. > > > > Use the RTE flow API to redirect these protocols into a dedicated Rx > > queue. The assumption is made that the ratio between control protocol > > traffic and user data traffic is very low and thus this dedicated Rx > > queue will never get full. The RSS redirection table is re-programmed to > > only use the other Rx queues. The RSS table size is stored in the > > netdev_dpdk structure at port initialization to avoid requesting the > > information again when changing the port configuration. > > > > The additional Rx queue will be assigned a PMD core like any other Rx > > queue. Polling that extra queue may introduce increased latency and > > a slight performance penalty, with the benefit of preventing link flapping. > > > > This feature must be enabled per port on specific protocols via the > > cp-protection option. This option takes a comma-separated list of > > protocol names. It is only supported on ethernet ports. This feature is > > experimental. > > > > If the user has already configured multiple Rx queues on the port, an > > additional one will be allocated for control plane packets. If the > > hardware cannot satisfy the requested number of Rx queues, the > > last Rx queue will be assigned for control plane. If only one Rx queue > > is available, the cp-protection feature will be disabled. If the > > hardware does not support the RTE flow matchers/actions, the feature > > will be disabled.
> > > > It cannot be enabled when other_config:hw-offload=true as it may > > conflict with the offloaded RTE flows. Similarly, if hw-offload is > > enabled while some ports already have cp-protection enabled, RTE flow > > offloading will be disabled on these ports. > > I'm concerned about this - this is a negative interference with rte_flow > offload. And rte_flow offload would also just alleviate these problems, > yes? Maybe we should encourage a user to just turn on flow offloading?
I agree that *in the near future* rte_flow is a great solution, but having a dedicated queue to handle control plane packets is a great addition. Yet, I don't think rte_flow offload will "just" alleviate these problems:

- rte_flow offload, in its current state, isn't a working solution for all NICs (Niantic);
- with a dedicated queue for LACP (and, in the future, other CP packets such as BFD), we can maintain the link status even when the system is over capacity, whereas we don't have a way to express this constraint with rte_flow.

If I'm not mistaken, rte_flow priorities are not supported by Niantic NICs, but with other NICs, could we have:

- a high priority flow for the CP protection;
- lower priorities dedicated to rte_flow offload?

So, what do you think about getting that improvement now, and keeping it in mind as a requirement for rte_flow offload?

Christophe

> > Additionally, it doesn't seem to have a good solution for the kernel side. > And I worry about adding flows into the system that the ofproto layer > doesn't know about but will interact with - but maybe I'm just being > paranoid. > > > Example use: > > > > ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \ > > set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \ > > set interface phy0 options:cp-protection=lacp -- \ > > set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \ > > set interface phy1 options:cp-protection=lacp > > > > As a starting point, only one protocol is supported: LACP. Other > > protocols can be added in the future. NIC compatibility should be > > checked. > > > > To validate that this works as intended, I used a traffic generator to > > generate random traffic slightly above the machine capacity at line rate > > on a two-port bond interface. OVS is configured to receive traffic on > > two VLANs and pop/push them in a br-int bridge based on tags set on > > patch ports.
> >
> >  +----------------------+
> >  |         DUT          |
> >  |+--------------------+|
> >  ||       br-int       ||   in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
> >  ||                    ||   in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
> >  || patch10    patch11 ||
> >  |+---|-----------|----+|
> >  |    |           |     |
> >  |+---|-----------|----+|
> >  || patch00    patch01 ||
> >  ||  tag:10     tag:20 ||
> >  ||                    ||
> >  ||       br-phy       ||   default flow, action=NORMAL
> >  ||                    ||
> >  ||        bond0       ||   balance-slb, lacp=passive, lacp-time=fast
> >  ||    phy0     phy1   ||
> >  |+------|-----|-------+|
> >  +-------|-----|--------+
> >          |     |
> >  +-------|-----|--------+
> >  |     port0 port1      |   balance L3/L4, lacp=active, lacp-time=fast
> >  |         lag          |   mode trunk VLANs 10, 20
> >  |                      |
> >  |        switch        |
> >  |                      |
> >  |  vlan 10    vlan 20  |   mode access
> >  |  port2      port3    |
> >  +-----|----------|-----+
> >        |          |
> >  +-----|----------|-----+
> >  |   tgen0      tgen1   |   Random traffic that is properly balanced
> >  |                      |   across the bond ports in both directions.
> >  |  traffic generator   |
> >  +----------------------+
> >
> > Without cp-protection, the bond0 links are randomly switching to
> > "defaulted" when one of the LACP packets sent by the switch is dropped
> > because the RX queues are full and the PMD threads did not process them
> > fast enough. When that happens, all traffic must go through a single
> > link which causes above line rate traffic to be dropped.
> >
> > ~# ovs-appctl lacp/show-stats bond0
> >  ---- bond0 statistics ----
> > member: phy0:
> >   TX PDUs: 347246
> >   RX PDUs: 14865
> >   RX Bad PDUs: 0
> >   RX Marker Request PDUs: 0
> >   Link Expired: 168
> >   Link Defaulted: 0
> >   Carrier Status Changed: 0
> > member: phy1:
> >   TX PDUs: 347245
> >   RX PDUs: 14919
> >   RX Bad PDUs: 0
> >   RX Marker Request PDUs: 0
> >   Link Expired: 147
> >   Link Defaulted: 1
> >   Carrier Status Changed: 0
> >
> > When cp-protection is enabled, no LACP packet is dropped and the bond links remain enabled at all times, maximizing the throughput. Neither the "Link Expired" nor the "Link Defaulted" counters are incremented anymore.
> >
> > This feature may be considered as "QoS". However, it does not work by limiting the rate of traffic explicitly. It only guarantees that some protocols have a lower chance of being dropped because the PMD cores cannot keep up with regular traffic.
> >
> > The choice of protocols is limited on purpose. This is not meant to be configurable by users. Some limited configurability could be considered in the future, but it would expose users to more potential issues if they accidentally redirect all traffic to the control plane queue.
> >
> > Cc: Anthony Harivel <ahari...@redhat.com>
> > Cc: Christophe Fontaine <cfont...@redhat.com>
> > Cc: David Marchand <david.march...@redhat.com>
> > Cc: Kevin Traynor <ktray...@redhat.com>
> > Signed-off-by: Robin Jarry <rja...@redhat.com>
> > ---
> > v8 -> v9:
> >
> > * Rebased on cf288fdfe2bf ("AUTHORS: Add Liang Mancang and Viacheslav Galaktionov.")
> > * Reset rte_flow_error struct before passing it to functions.
> > * Refined some comments.
> > * Updated check for hw-offload on a per-port basis. That way, if a port already has cp-protection enabled, hw-offload will not be enabled on it but cp-protection will continue to work until next restart.
However, on next restart, hw-offload will be checked first and > > therefore cp-protection will be disabled on all ports. > > > > Unless there are significant reservations about this patch, would it be OK > > to include it in 3.2? > > > > Thanks! > > > > Documentation/topics/dpdk/phy.rst | 77 ++++++++ > > NEWS | 4 + > > lib/netdev-dpdk.c | 310 +++++++++++++++++++++++++++++- > > vswitchd/vswitch.xml | 26 +++ > > 4 files changed, 414 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/phy.rst > > b/Documentation/topics/dpdk/phy.rst > > index 4b0fe8dded3a..518b67134639 100644 > > --- a/Documentation/topics/dpdk/phy.rst > > +++ b/Documentation/topics/dpdk/phy.rst > > @@ -131,6 +131,83 @@ possible with DPDK acceleration. It is possible to > > configure multiple Rx queues > > for ``dpdk`` ports, thus ensuring this is not a bottleneck for > > performance. For > > information on configuring PMD threads, refer to :doc:`pmd`. > > > > +Control Plane Protection > > +------------------------ > > + > > +.. warning:: This feature is experimental. > > + > > +Some control protocols are used to maintain link status between forwarding > > +engines. In SDN environments, these packets share the same physical network > > +as the user data traffic. > > + > > +When the system is not sized properly, the PMD threads may not be able to > > +process all incoming traffic from the configured Rx queues. When a > > signaling > > +packet of such protocols is dropped, it can cause link flapping, worsening > > the > > +situation. > > + > > +Some physical NICs can be programmed to put these protocols in a dedicated > > +hardware Rx queue using the rte_flow__ API. > > + > > +__ https://doc.dpdk.org/guides-22.11/prog_guide/rte_flow.html > > + > > +The currently supported control plane protocols are: > > + > > +``lacp`` > > + `Link Aggregation Control Protocol`__. Ether type ``0x8809``. > > + > > + __ https://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf > > + > > +.. 
warning:: > > + > > + This feature is not compatible with all NICs. Refer to the DPDK > > + `compatibility matrix`__ and vendor documentation for more details. > > + > > + __ https://doc.dpdk.org/guides-22.11/nics/overview.html > > + > > +Control plane protection must be enabled on specific protocols per port. > > The > > +``cp-protection`` option requires a comma-separated list of protocol names:: > > + > > + $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \ > > + options:dpdk-devargs=0000:01:00.0 options:n_rxq=2 \ > > + options:cp-protection=lacp > > + > > +.. note:: > > + > > + If multiple Rx queues are already configured, regular RSS (Receive Side > > + Scaling) queue balancing is done on all but the extra control plane > > + protection queue. > > + > > +.. tip:: > > + > > + You can check if control plane protection is supported on a port with > > the > > + following command:: > > + > > + $ ovs-vsctl get interface dpdk-p0 status > > + {cp_protection_queue="2", driver_name=..., rss_queues="0-1"} > > + > > + This will also show in ``ovs-vswitchd.log``:: > > + > > + INFO|dpdk-p0: cp-protection: redirecting lacp traffic to queue 2 > > + INFO|dpdk-p0: cp-protection: applying rss on queues 0-1 > > + > > + If the hardware does not support redirecting control plane traffic to > > + a dedicated queue, it will be explicit:: > > + > > + $ ovs-vsctl get interface dpdk-p0 status > > + {cp_protection=unsupported, driver_name=...} > > + > > + More details can often be found in ``ovs-vswitchd.log``:: > > + > > + WARN|dpdk-p0: cp-protection: failed to add lacp flow: Unsupported > > pattern > > + > > +To disable control plane protection on a port, use the following command:: > > + > > + $ ovs-vsctl remove Interface dpdk-p0 options cp-protection > > + > > +You can see that it has been disabled in ``ovs-vswitchd.log``:: > > + > > + INFO|dpdk-p0: cp-protection: disabled > > + > > .. 
_dpdk-phy-flow-control: > > > > Flow Control > > diff --git a/NEWS b/NEWS > > index 85b34962145e..22505ea9d4ad 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -10,6 +10,10 @@ Post-v3.1.0 > > in order to create OVSDB sockets with access mode of 0770. > > - QoS: > > * Added new configuration option 'jitter' for a linux-netem QoS type. > > + - DPDK: > > + * New experimental "cp-protection=<protocol>" option to redirect > > certain > > + protocols (for now, only LACP) to a dedicated hardware queue using > > + RTE flow. > > > > > > v3.1.0 - 16 Feb 2023 > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index fb0dd43f75c5..549fc425edba 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -415,6 +415,10 @@ enum dpdk_hw_ol_features { > > NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4, > > }; > > > > +enum dpdk_cp_prot_flags { > > + DPDK_CP_PROT_LACP = 1 << 0, > > +}; > > + > > /* > > * In order to avoid confusion in variables names, following naming > > convention > > * should be used, if possible: > > @@ -504,11 +508,18 @@ struct netdev_dpdk { > > * so we remember the request and update them next time > > * netdev_dpdk*_reconfigure() is called */ > > int requested_mtu; > > + /* User input for n_rxq + an optional control plane protection > > queue > > + * (see netdev_dpdk_reconfigure). This field is different from the > > + * other requested_* fields as it may contain a different value > > than > > + * the user input. */ > > int requested_n_txq; > > int requested_n_rxq; > > int requested_rxq_size; > > int requested_txq_size; > > > > + /* User input for n_rxq (see dpdk_set_rxq_config). */ > > + int user_n_rxq; > > + > > /* Number of rx/tx descriptors for physical devices */ > > int rxq_size; > > int txq_size; > > @@ -534,6 +545,13 @@ struct netdev_dpdk { > > > > /* VF configuration. */ > > struct eth_addr requested_hwaddr; > > + > > + /* Requested control plane protection flags, > > + * from the enum set 'dpdk_cp_prot_flags'. 
*/ > > + uint64_t requested_cp_prot_flags; > > + uint64_t cp_prot_flags; > > + size_t cp_prot_flows_num; > > + struct rte_flow **cp_prot_flows; > > ); > > > > PADDED_MEMBERS(CACHE_LINE_SIZE, > > @@ -1310,9 +1328,14 @@ common_construct(struct netdev *netdev, dpdk_port_t > > port_no, > > netdev->n_rxq = 0; > > netdev->n_txq = 0; > > dev->requested_n_rxq = NR_QUEUE; > > + dev->user_n_rxq = NR_QUEUE; > > dev->requested_n_txq = NR_QUEUE; > > dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; > > dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; > > + dev->requested_cp_prot_flags = 0; > > + dev->cp_prot_flags = 0; > > + dev->cp_prot_flows_num = 0; > > + dev->cp_prot_flows = NULL; > > > > /* Initialize the flow control to NULL */ > > memset(&dev->fc_conf, 0, sizeof dev->fc_conf); > > @@ -1487,6 +1510,8 @@ common_destruct(struct netdev_dpdk *dev) > > ovs_mutex_destroy(&dev->mutex); > > } > > > > +static void dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev); > > + > > static void > > netdev_dpdk_destruct(struct netdev *netdev) > > { > > @@ -1494,6 +1519,9 @@ netdev_dpdk_destruct(struct netdev *netdev) > > > > ovs_mutex_lock(&dpdk_mutex); > > > > + /* Destroy any rte flows to allow RXQs to be removed. 
*/ > > + dpdk_cp_prot_unconfigure(dev); > > + > > rte_eth_dev_stop(dev->port_id); > > dev->started = false; > > > > @@ -1908,8 +1936,8 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const > > struct smap *args) > > int new_n_rxq; > > > > new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); > > - if (new_n_rxq != dev->requested_n_rxq) { > > - dev->requested_n_rxq = new_n_rxq; > > + if (new_n_rxq != dev->user_n_rxq) { > > + dev->user_n_rxq = new_n_rxq; > > netdev_request_reconfigure(&dev->up); > > } > > } > > @@ -1931,6 +1959,48 @@ dpdk_process_queue_size(struct netdev *netdev, const > > struct smap *args, > > } > > } > > > > +static void > > +dpdk_cp_prot_set_config(struct netdev *netdev, struct netdev_dpdk *dev, > > + const struct smap *args, char **errp) > > +{ > > + const char *arg = smap_get_def(args, "cp-protection", ""); > > + char *token, *saveptr, *buf; > > + uint64_t flags = 0; > > + > > + buf = xstrdup(arg); > > + token = strtok_r(buf, ",", &saveptr); > > + while (token) { > > + if (strcmp(token, "lacp") == 0) { > > + flags |= DPDK_CP_PROT_LACP; > > + } else { > > + VLOG_WARN_BUF(errp, "%s options:cp-protection " > > + "unknown protocol '%s'", > > + netdev_get_name(netdev), token); > > + } > > + token = strtok_r(NULL, ",", &saveptr); > > + } > > + free(buf); > > + > > + if (flags && dev->type != DPDK_DEV_ETH) { > > + VLOG_WARN_BUF(errp, "%s options:cp-protection " > > + "is only supported on ethernet ports", > > + netdev_get_name(netdev)); > > + flags = 0; > > + } > > + > > + if (flags && ovsrcu_get(void *, &netdev->hw_info.offload_data)) { > > + VLOG_WARN_BUF(errp, "%s options:cp-protection " > > + "is incompatible with hw-offload", > > + netdev_get_name(netdev)); > > + flags = 0; > > + } > > + > > + if (flags != dev->requested_cp_prot_flags) { > > + dev->requested_cp_prot_flags = flags; > > + netdev_request_reconfigure(netdev); > > + } > > +} > > + > > static int > > netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, > > char 
**errp) > > @@ -1950,6 +2020,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const > > struct smap *args, > > ovs_mutex_lock(&dpdk_mutex); > > ovs_mutex_lock(&dev->mutex); > > > > + dpdk_cp_prot_set_config(netdev, dev, args, errp); > > + > > dpdk_set_rxq_config(dev, args); > > > > dpdk_process_queue_size(netdev, args, "n_rxq_desc", > > @@ -3819,9 +3891,12 @@ netdev_dpdk_get_status(const struct netdev *netdev, > > struct smap *args) > > { > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > struct rte_eth_dev_info dev_info; > > + size_t cp_prot_flows_num; > > + uint64_t cp_prot_flags; > > const char *bus_info; > > uint32_t link_speed; > > uint32_t dev_flags; > > + int n_rxq; > > > > if (!rte_eth_dev_is_valid_port(dev->port_id)) { > > return ENODEV; > > @@ -3833,6 +3908,9 @@ netdev_dpdk_get_status(const struct netdev *netdev, > > struct smap *args) > > link_speed = dev->link.link_speed; > > dev_flags = *dev_info.dev_flags; > > bus_info = rte_dev_bus_info(dev_info.device); > > + cp_prot_flags = dev->cp_prot_flags; > > + cp_prot_flows_num = dev->cp_prot_flows_num; > > + n_rxq = netdev->n_rxq; > > ovs_mutex_unlock(&dev->mutex); > > ovs_mutex_unlock(&dpdk_mutex); > > > > @@ -3875,6 +3953,19 @@ netdev_dpdk_get_status(const struct netdev *netdev, > > struct smap *args) > > ETH_ADDR_ARGS(dev->hwaddr)); > > } > > > > + if (cp_prot_flags) { > > + if (!cp_prot_flows_num) { > > + smap_add(args, "cp_protection", "unsupported"); > > + } else { > > + smap_add_format(args, "cp_protection_queue", "%d", n_rxq - 1); > > + if (n_rxq > 2) { > > + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 2); > > + } else { > > + smap_add(args, "rss_queues", "0"); > > + } > > + } > > + } > > + > > return 0; > > } > > > > @@ -5178,16 +5269,203 @@ static const struct dpdk_qos_ops trtcm_policer_ops > > = { > > .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init > > }; > > > > +static int > > +dpdk_cp_prot_add_flow(struct netdev_dpdk *dev, > > + const struct rte_flow_item 
items[], > > + const char *desc) > > +{ > > + const struct rte_flow_attr attr = { .ingress = 1 }; > > + const struct rte_flow_action actions[] = { > > + { > > + .type = RTE_FLOW_ACTION_TYPE_QUEUE, > > + .conf = &(const struct rte_flow_action_queue) { > > + .index = dev->up.n_rxq - 1, > > + }, > > + }, > > + { .type = RTE_FLOW_ACTION_TYPE_END }, > > + }; > > + struct rte_flow_error error; > > + struct rte_flow *flow; > > + size_t num; > > + int err; > > + > > + set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); > > + err = rte_flow_validate(dev->port_id, &attr, items, actions, &error); > > + if (err) { > > + VLOG_WARN("%s: cp-protection: device does not support %s flow: %s", > > + netdev_get_name(&dev->up), desc, > > + error.message ? error.message : ""); > > + goto out; > > + } > > + > > + set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); > > + flow = rte_flow_create(dev->port_id, &attr, items, actions, &error); > > + if (flow == NULL) { > > + VLOG_WARN("%s: cp-protection: failed to add %s flow: %s", > > + netdev_get_name(&dev->up), desc, > > + error.message ? 
error.message : ""); > > + err = rte_errno; > > + goto out; > > + } > > + > > + num = dev->cp_prot_flows_num + 1; > > + dev->cp_prot_flows = xrealloc(dev->cp_prot_flows, sizeof(flow) * num); > > + dev->cp_prot_flows[dev->cp_prot_flows_num] = flow; > > + dev->cp_prot_flows_num = num; > > + > > + VLOG_INFO("%s: cp-protection: redirected %s traffic to rx queue %d", > > + netdev_get_name(&dev->up), desc, dev->up.n_rxq - 1); > > +out: > > + return err; > > +} > > + > > +#define RETA_CONF_SIZE (RTE_ETH_RSS_RETA_SIZE_512 / > > RTE_ETH_RETA_GROUP_SIZE) > > + > > +static int > > +dpdk_cp_prot_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq) > > +{ > > + struct rte_eth_rss_reta_entry64 reta_conf[RETA_CONF_SIZE]; > > + struct rte_eth_dev_info info; > > + int err; > > + > > + rte_eth_dev_info_get(dev->port_id, &info); > > + > > + if (info.reta_size % rss_n_rxq != 0 && > > + info.reta_size < RTE_ETH_RSS_RETA_SIZE_128) { > > + /* > > + * Some drivers set reta_size equal to the total number of rxqs > > that > > + * are configured when it is a power of two. Since we are actually > > + * reconfiguring the redirection table to exclude the last rxq, we > > may > > + * end up with an imbalanced redirection table. For example, such > > + * configuration: > > + * > > + * options:n_rxq=3 options:cp-protection=lacp > > + * > > + * Will actually configure 4 rxqs on the NIC, and the default reta > > to: > > + * > > + * [0, 1, 2, 3] > > + * > > + * And dpdk_cp_prot_rss_configure() will reconfigure reta to: > > + * > > + * [0, 1, 2, 0] > > + * > > + * Causing queue 0 to receive twice as much traffic as queues 1 > > and 2. > > + * > > + * Work around that corner case by forcing a bigger redirection > > table > > + * size to 128 entries when reta_size is not a multiple of > > rss_n_rxq > > + * and when reta_size is less than 128. This value seems to be > > + * supported by most of the drivers that also support rte flow. 
> > + */ > > + info.reta_size = RTE_ETH_RSS_RETA_SIZE_128; > > + } > > + > > + memset(reta_conf, 0, sizeof(reta_conf)); > > + for (uint16_t i = 0; i < info.reta_size; i++) { > > + uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE; > > + uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE; > > + reta_conf[idx].mask |= 1ULL << shift; > > + reta_conf[idx].reta[shift] = i % rss_n_rxq; > > + } > > + err = rte_eth_dev_rss_reta_update(dev->port_id, reta_conf, > > info.reta_size); > > + if (err < 0) { > > + VLOG_WARN("%s: failed to configure RSS redirection table: err=%d", > > + netdev_get_name(&dev->up), err); > > + } > > + > > + return err; > > +} > > + > > +static int > > +dpdk_cp_prot_configure(struct netdev_dpdk *dev) > > +{ > > + int err = 0; > > + > > + if (dev->up.n_rxq < 2) { > > + err = ENOTSUP; > > + VLOG_WARN("%s: cp-protection: not enough available rx queues", > > + netdev_get_name(&dev->up)); > > + goto out; > > + } > > + > > + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_LACP) { > > + const struct rte_flow_item items[] = { > > + { > > + .type = RTE_FLOW_ITEM_TYPE_ETH, > > + .spec = &(const struct rte_flow_item_eth){ > > + .type = htons(ETH_TYPE_LACP), > > + }, > > + .mask = &(const struct rte_flow_item_eth){ > > + .type = htons(0xffff), > > + }, > > + }, > > + { .type = RTE_FLOW_ITEM_TYPE_END }, > > + }; > > + err = dpdk_cp_prot_add_flow(dev, items, "lacp"); > > + if (err) { > > + goto out; > > + } > > + } > > + > > + if (dev->cp_prot_flows_num) { > > + /* Reconfigure RSS reta in all but the cp protection queue. 
*/ > > + err = dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq - 1); > > + if (!err) { > > + if (dev->up.n_rxq == 2) { > > + VLOG_INFO("%s: cp-protection: redirected other traffic to " > > + "rx queue 0", netdev_get_name(&dev->up)); > > + } else { > > + VLOG_INFO("%s: cp-protection: applied rss on rx queue > > 0-%u", > > + netdev_get_name(&dev->up), dev->up.n_rxq - 2); > > + } > > + } > > + } > > + > > +out: > > + return err; > > +} > > + > > +static void > > +dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev) > > +{ > > + struct rte_flow_error error; > > + > > + if (!dev->cp_prot_flows_num) { > > + return; > > + } > > + > > + VLOG_DBG("%s: cp-protection: reset flows", netdev_get_name(&dev->up)); > > + > > + for (int i = 0; i < dev->cp_prot_flows_num; i++) { > > + set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); > > + if (rte_flow_destroy(dev->port_id, dev->cp_prot_flows[i], &error)) > > { > > + VLOG_DBG("%s: cp-protection: failed to destroy flow: %s", > > + netdev_get_name(&dev->up), > > + error.message ? 
error.message : ""); > > + } > > + } > > + free(dev->cp_prot_flows); > > + dev->cp_prot_flows_num = 0; > > + dev->cp_prot_flows = NULL; > > +} > > + > > static int > > netdev_dpdk_reconfigure(struct netdev *netdev) > > { > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + bool try_cp_prot; > > int err = 0; > > > > ovs_mutex_lock(&dev->mutex); > > > > + try_cp_prot = dev->requested_cp_prot_flags != 0; > > + dev->requested_n_rxq = dev->user_n_rxq; > > + if (try_cp_prot) { > > + dev->requested_n_rxq += 1; > > + } > > + > > if (netdev->n_txq == dev->requested_n_txq > > && netdev->n_rxq == dev->requested_n_rxq > > + && dev->cp_prot_flags == dev->requested_cp_prot_flags > > && dev->mtu == dev->requested_mtu > > && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode > > && dev->rxq_size == dev->requested_rxq_size > > @@ -5200,6 +5478,9 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > > goto out; > > } > > > > +retry: > > + dpdk_cp_prot_unconfigure(dev); > > + > > if (dev->reset_needed) { > > rte_eth_dev_reset(dev->port_id); > > if_notifier_manual_report(); > > @@ -5224,6 +5505,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > > dev->txq_size = dev->requested_txq_size; > > > > rte_free(dev->tx_q); > > + dev->tx_q = NULL; > > > > if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) { > > err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr); > > @@ -5255,6 +5537,23 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > > */ > > dev->requested_hwaddr = dev->hwaddr; > > > > + if (try_cp_prot) { > > + err = dpdk_cp_prot_configure(dev); > > + if (err) { > > + /* No hw support, disable & recover gracefully. */ > > + try_cp_prot = false; > > + /* > > + * The extra queue must be explicitly removed here to ensure > > that > > + * it is unconfigured immediately. 
> > + */ > > + dev->requested_n_rxq = dev->user_n_rxq; > > + goto retry; > > + } > > + } else { > > + VLOG_INFO("%s: cp-protection: disabled", > > netdev_get_name(&dev->up)); > > + } > > + dev->cp_prot_flags = dev->requested_cp_prot_flags; > > + > > dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); > > if (!dev->tx_q) { > > err = ENOMEM; > > @@ -5468,7 +5767,12 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev) > > ovs_mutex_lock(&dev->mutex); > > if (dev->type == DPDK_DEV_ETH) { > > /* TODO: Check if we able to offload some minimal flow. */ > > - ret = true; > > + if (dev->requested_cp_prot_flags) { > > + VLOG_WARN("%s: hw-offload is mutually exclusive with " > > + "cp-protection", netdev_get_name(netdev)); > > + } else { > > + ret = true; > > + } > > } > > ovs_mutex_unlock(&dev->mutex); > > out: > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index 05ac1fbe5ef6..e6575f84eb3c 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -3453,6 +3453,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 > > type=patch options:peer=p1 \ > > <p>This option may only be used with dpdk VF representors.</p> > > </column> > > > > + <column name="options" key="cp-protection" > > + type='{"type": "string", "enum": ["set", ["lacp"]]}'> > > + <p> > > + Allocate an extra Rx queue for control plane packets of the > > specified > > + protocol(s). > > + </p> > > + <p> > > + If the user has already configured multiple > > + <code>options:n_rxq</code> on the port, an additional one will be > > + allocated for control plane packets. If the hardware cannot > > satisfy > > + the requested number of Rx queues, the last Rx queue > > will > > + be assigned for control plane. If only one Rx queue is available > > or > > + if the hardware does not support the RTE flow matchers/actions > > + required to redirect the selected protocols, > > + <code>cp-protection</code> will be disabled. 
> > + </p> > > + <p> > > + This feature is mutually exclusive with > > + <code>other_config:hw-offload</code> as it may conflict with the > > + offloaded RTE flows. > > + </p> > > + <p> > > + Disabled by default. > > + </p> > > + </column> > > + > > <column name="other_config" key="tx-steering" > > type='{"type": "string", > > "enum": ["set", ["thread", "hash"]]}'> > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev