Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-30 Thread Ilya Maximets
On 6/30/23 10:00, Robin Jarry wrote:
> Hi Ilya,
> 
> Ilya Maximets, Jun 29, 2023 at 23:57:
>>> +    if (flags && ovsrcu_get(void *, &netdev->hw_info.offload_data)) {
>>
>> We probably shouldn't use the offload_data outside of netdev-offload
>> modules.  Simply checking netdev_is_flow_api_enabled() should be
>> enough.  Since both features require rte_flow offload, it's unlikely
>> that rx-steering can work if flow api is enabled globally.  And
>> netdev-offload-dpdk doesn't really check device capabilities.
> 
> OK, I'll change that for v13.
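
For the record, a minimal sketch of the simpler check being suggested; the
warning text and the surrounding variable names are illustrative, not the
patch's final code:

    /* Sketch: rx-steering and hw-offload are mutually exclusive, so bail
     * out of rx-steering configuration when the flow API is enabled. */
    if (flags && netdev_is_flow_api_enabled()) {
        VLOG_WARN("%s: rx-steering is mutually exclusive with hw-offload",
                  netdev_get_name(netdev));
        flags = 0;
    }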
> 
>>>  static int
>>>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>> char **errp)
>>
>> Aside from the set_config, we also need a get_config callback update.
>> get_config() should return everything that set_config() set, i.e.
>> we should return "rx-steering": "rss+lacp" if the user configured it.
>> The output will be visible in dpif/show.
> 
> Hmm, I had missed the get_config callback. I have added something for
> get status but I assume it is different.

Right.  'status' is anything the netdev provider wants to report in
a free form.  'config' is something that users can put into a database
to get a configuration equivalent to the current one.

get_config() in netdev-dpdk currently reports weird stuff and needs
fixing.  If you want an example, look at the n_rxq_desc reporting; it
is more or less correctly done.

Having some extra information in the status is fine.
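
To make that concrete, here is a minimal sketch of the kind of get_config()
addition being asked for.  The rx_steer_flags field and DPDK_RX_STEER_LACP
flag are assumptions based on the patch context, not necessarily the final
names:

    static int
    netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
    {
        struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);

        ovs_mutex_lock(&dev->mutex);
        /* ... existing reporting: requested_n_rxq, n_rxq_desc, ... */
        if (dev->rx_steer_flags & DPDK_RX_STEER_LACP) {
            /* Report the option back verbatim so that storing the output
             * of get_config() in the database reproduces the current
             * configuration. */
            smap_add(args, "rx-steering", "rss+lacp");
        }
        ovs_mutex_unlock(&dev->mutex);

        return 0;
    }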

> 
>>> +    memset(reta_conf, 0, sizeof(reta_conf));
>>> +    for (uint16_t i = 0; i < info.reta_size; i++) {
>>> +        uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
>>> +        uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;
>>
>> An empty line here.
>>
>>> +        reta_conf[idx].mask |= 1ULL << shift;
>>> +        reta_conf[idx].reta[shift] = i % rss_n_rxq;
>>> +    }
>>
>> And here.
> 
> Ack.
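
(For context, the quoted loop builds the table that is then applied with
rte_eth_dev_rss_reta_update().  A self-contained sketch of that step, with
error handling trimmed; as in the snippet above, reta_size comes from
rte_eth_dev_info_get() and rss_n_rxq is the number of queues left to RSS:)

    #include <string.h>
    #include <rte_ethdev.h>

    /* Sketch: spread the redirection table round-robin over the first
     * rss_n_rxq queues so that RSS never selects the steering queue. */
    static int
    reta_exclude_steering_queue(uint16_t port_id, uint16_t reta_size,
                                uint16_t rss_n_rxq)
    {
        struct rte_eth_rss_reta_entry64
            reta_conf[RTE_ETH_RSS_RETA_SIZE_512 / RTE_ETH_RETA_GROUP_SIZE];

        memset(reta_conf, 0, sizeof reta_conf);
        for (uint16_t i = 0; i < reta_size; i++) {
            uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
            uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;

            reta_conf[idx].mask |= 1ULL << shift;        /* entry is valid */
            reta_conf[idx].reta[shift] = i % rss_n_rxq;  /* skip last queue */
        }
        return rte_eth_dev_rss_reta_update(port_id, reta_conf, reta_size);
    }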
> 
>>> +    for (int i = 0; i < dev->rx_steer_flows_num; i++) {
>>> +        set_error(&error, RTE_FLOW_ERROR_TYPE_NONE);
>>> +        if (rte_flow_destroy(dev->port_id, dev->rx_steer_flows[i],
>>> +                             &error)) {
>>> +            VLOG_DBG("%s: rx-steering: failed to destroy flow: %s",
>>
>> Maybe a WARN ?
> 
> Will change that.
> 
>> So, this function destroys all the flows, OK.
>> But we also need to restore the redirection table, right?  Or is it handled
>> somehow in the driver when the number of queues changes?  From my experience,
>> drivers tend to not restore this kind of configuration even on device detach
>> and it gets preserved even across OVS restarts.
> 
> From what I saw during testing, the RETA is reset every time the device
> is reset. In the first versions of this patch, I did reset the table but
> I seem to remember some discussions with Kevin and/or David where we
> concluded that it was redundant. I will check again to make sure before
> sending a respin.

OK.  Thanks!
Was just checking that this part wasn't overlooked.  Maybe add a comment
to the function explaining why it doesn't restore the redirection table,
so that readers of the code don't have extra questions?
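
Something as simple as the following would probably do (the wording is
only a suggestion, based on Robin's observation below):

    /* Note: the RSS redirection table is intentionally not restored here.
     * The driver resets the RETA to its default every time the device is
     * reconfigured, so restoring it explicitly would be redundant. */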

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-30 Thread Robin Jarry
Hi Ilya,

Ilya Maximets, Jun 29, 2023 at 23:57:
> > +    if (flags && ovsrcu_get(void *, &netdev->hw_info.offload_data)) {
> 
> We probably shouldn't use the offload_data outside of netdev-offload
> modules.  Simply checking netdev_is_flow_api_enabled() should be
> enough.  Since both features require rte_flow offload, it's unlikely
> that rx-steering can work if flow api is enabled globally.  And
> netdev-offload-dpdk doesn't really check device capabilities.

OK, I'll change that for v13.

> >  static int
> >  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> > char **errp)
>
> Aside from the set_config, we also need a get_config callback update.
> get_config() should return everything that set_config() set, i.e.
> we should return "rx-steering": "rss+lacp" if the user configured it.
> The output will be visible in dpif/show.

Hmm, I had missed the get_config callback. I have added something for
get status but I assume it is different.

> > +    memset(reta_conf, 0, sizeof(reta_conf));
> > +    for (uint16_t i = 0; i < info.reta_size; i++) {
> > +        uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
> > +        uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;
>
> An empty line here.
>
> > +        reta_conf[idx].mask |= 1ULL << shift;
> > +        reta_conf[idx].reta[shift] = i % rss_n_rxq;
> > +    }
>
> And here.

Ack.

> > +    for (int i = 0; i < dev->rx_steer_flows_num; i++) {
> > +        set_error(&error, RTE_FLOW_ERROR_TYPE_NONE);
> > +        if (rte_flow_destroy(dev->port_id, dev->rx_steer_flows[i],
> > +                             &error)) {
> > +            VLOG_DBG("%s: rx-steering: failed to destroy flow: %s",
>
> Maybe a WARN ?

Will change that.

> So, this function destroys all the flows, OK.
> But we also need to restore the redirection table, right?  Or is it handled
> somehow in the driver when the number of queues changes? From my experience,
> drivers tend to not restore this kind of configuration even on device detach
> and it gets preserved even across OVS restarts.

From what I saw during testing, the RETA is reset every time the device
is reset. In the first versions of this patch, I did reset the table but
I seem to remember some discussions with Kevin and/or David where we
concluded that it was redundant. I will check again to make sure before
sending a respin.

Thanks for the review.



Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-29 Thread Ilya Maximets
On 6/21/23 21:24, Robin Jarry wrote:
> Some control protocols are used to maintain link status between
> forwarding engines (e.g. LACP). When the system is not sized properly,
> the PMD threads may not be able to process all incoming traffic from the
> configured Rx queues. When a signaling packet of such protocols is
> dropped, it can cause link flapping, worsening the situation.
> 
> Use the RTE flow API to redirect these protocols into a dedicated Rx
> queue. The assumption is made that the ratio between control protocol
> traffic and user data traffic is very low and thus this dedicated Rx
> queue will never get full. Re-program the RSS redirection table to only
> use the other Rx queues.
> 
> The additional Rx queue will be assigned a PMD core like any other Rx
> queue. Polling that extra queue may introduce increased latency and
> a slight performance penalty at the benefit of preventing link flapping.
> 
> This feature must be enabled per port on specific protocols via the
> rx-steering option. This option takes "rss" followed by a "+" separated
> list of protocol names. It is only supported on ethernet ports. This
> feature is experimental.
> 
> If the user has already configured multiple Rx queues on the port, an
> additional one will be allocated for control packets. If the hardware
> cannot satisfy the number of requested Rx queues, the last Rx queue will
> be assigned for control plane. If only one Rx queue is available, the
> rx-steering feature will be disabled. If the hardware does not support
> the RTE flow matchers/actions, the rx-steering feature will be
> completely disabled on the port.
> 
> It cannot be enabled when other_config:hw-offload=true as it may
> conflict with the offloaded RTE flows. Similarly, if hw-offload is
> enabled, custom rx-steering will be forcibly disabled on all ports.
> 
> Example use:
> 
>   ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
>     set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
>     set interface phy0 options:rx-steering=rss+lacp -- \
>     set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
>     set interface phy1 options:rx-steering=rss+lacp
> 
> As a starting point, only one protocol is supported: LACP. Other
> protocols can be added in the future. NIC compatibility should be
> checked.
> 
> To validate that this works as intended, I used a traffic generator to
> generate random traffic slightly above the machine capacity at line rate
> on a two-port bond interface. OVS is configured to receive traffic on
> two VLANs and pop/push them in a br-int bridge based on tags set on
> patch ports.
> 
>    +--------------------------+
>    |           DUT            |
>    | +----------------------+ |
>    | |        br-int        | | in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
>    | |                      | | in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
>    | |  patch10    patch11  | |
>    | +-----|----------|-----+ |
>    |       |          |       |
>    | +-----|----------|-----+ |
>    | |  patch00    patch01  | |
>    | |  tag:10      tag:20  | |
>    | |                      | |
>    | |        br-phy        | | default flow, action=NORMAL
>    | |                      | |
>    | |        bond0         | | balance-slb, lacp=passive, lacp-time=fast
>    | |    phy0      phy1    | |
>    | +-----|---------|------+ |
>    +-------|---------|--------+
>            |         |
>    +-------|---------|--------+
>    |     port0     port1      | balance L3/L4, lacp=active, lacp-time=fast
>    |           lag            | mode trunk VLANs 10, 20
>    |                          |
>    |          switch          |
>    |                          |
>    |   vlan 10      vlan 20   | mode access
>    |    port2        port3    |
>    +------|------------|------+
>           |            |
>    +------|------------|------+
>    |    tgen0        tgen1    | Random traffic that is properly balanced
>    |                          | across the bond ports in both directions.
>    |    traffic generator     |
>    +--------------------------+
> 
> Without rx-steering, the bond0 links are randomly switching to
> "defaulted" when one of the LACP packets sent by the switch is dropped
> because the RX queues are full and the PMD threads did not process them
> fast enough. When that happens, all traffic must go through a single
> link, which causes traffic above line rate to be dropped.
> 
>  ~# ovs-appctl lacp/show-stats bond0
>   ---- bond0 statistics ----
>  member: phy0:
>TX PDUs: 347246
>RX PDUs: 14865
>RX Bad PDUs: 0
>RX Marker Request PDUs: 0
>Link Expired: 168
>Link Defaulted: 0
>Carrier Status Changed: 0
>  member: phy1:
>TX PDUs: 347245
>RX PDUs: 14919
>RX Bad PDUs: 0
>RX Marker Request PDUs: 0
>Link Expired: 147
>Link Defaulted: 1
>Carrier Status Changed: 0
> 
> When rx-steering is enabled, no LACP packet is dropped and the bond
> links remain enabled at all times, maximizing the throughput. Neither
> the "Link Expired" nor the 

Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-22 Thread Kevin Traynor

On 22/06/2023 14:06, Kevin Traynor wrote:

On 21/06/2023 20:24, Robin Jarry wrote:

Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the RTE flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty at the benefit of preventing link flapping.

This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the RTE flow matchers/actions, the rx-steering feature will be
completely disabled on the port.

It cannot be enabled when other_config:hw-offload=true as it may
conflict with the offloaded RTE flows. Similarly, if hw-offload is
enabled, custom rx-steering will be forcibly disabled on all ports.

Example use:

  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
    set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
    set interface phy0 options:rx-steering=rss+lacp -- \
    set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
    set interface phy1 options:rx-steering=rss+lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.
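
(For illustration, the steering itself boils down to a single rte_flow rule
matching the slow-protocols EtherType, 0x8809, with a QUEUE action.  The
following is a sketch under assumed names, not the patch's exact code;
queue_id would be the extra Rx queue allocated for control traffic:)

    #include <rte_flow.h>

    /* Sketch: redirect LACP (slow protocols, EtherType 0x8809) to the
     * dedicated Rx queue. */
    static struct rte_flow *
    steer_lacp_to_queue(uint16_t port_id, uint16_t queue_id,
                        struct rte_flow_error *error)
    {
        const struct rte_flow_attr attr = { .ingress = 1 };
        const struct rte_flow_item_eth eth_spec = {
            .hdr.ether_type = RTE_BE16(0x8809),
        };
        const struct rte_flow_item_eth eth_mask = {
            .hdr.ether_type = RTE_BE16(0xffff),
        };
        const struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH,
              .spec = &eth_spec, .mask = &eth_mask },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        const struct rte_flow_action_queue queue = { .index = queue_id };
        const struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
    }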

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two-port bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

   +--------------------------+
   |           DUT            |
   | +----------------------+ |
   | |        br-int        | | in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
   | |                      | | in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
   | |  patch10    patch11  | |
   | +-----|----------|-----+ |
   |       |          |       |
   | +-----|----------|-----+ |
   | |  patch00    patch01  | |
   | |  tag:10      tag:20  | |
   | |                      | |
   | |        br-phy        | | default flow, action=NORMAL
   | |                      | |
   | |        bond0         | | balance-slb, lacp=passive, lacp-time=fast
   | |    phy0      phy1    | |
   | +-----|---------|------+ |
   +-------|---------|--------+
           |         |
   +-------|---------|--------+
   |     port0     port1      | balance L3/L4, lacp=active, lacp-time=fast
   |           lag            | mode trunk VLANs 10, 20
   |                          |
   |          switch          |
   |                          |
   |   vlan 10      vlan 20   | mode access
   |    port2        port3    |
   +------|------------|------+
          |            |
   +------|------------|------+
   |    tgen0        tgen1    | Random traffic that is properly balanced
   |                          | across the bond ports in both directions.
   |    traffic generator     |
   +--------------------------+

Without rx-steering, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link, which causes traffic above line rate to be dropped.

   ~# ovs-appctl lacp/show-stats bond0
   ---- bond0 statistics ----
   member: phy0:
 TX PDUs: 347246
 RX PDUs: 14865
 RX Bad PDUs: 0
 RX Marker Request PDUs: 0
 Link Expired: 168
 Link Defaulted: 0
 Carrier Status Changed: 0
   member: phy1:
 TX PDUs: 347245
 RX PDUs: 14919
 RX Bad PDUs: 0
 RX Marker Request PDUs: 0
 Link Expired: 147
 Link Defaulted: 1
 Carrier Status Changed: 0

When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.

This feature may be 

Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-22 Thread Robin Jarry
Hi Kevin,

Kevin Traynor, Jun 22, 2023 at 15:06:
> Hi Robin,
>
> As discussed offline, the issue is fixed in v12. In the case where 
> offload has failed on device A, and device B triggers set_config, the 
> set_config for device B will see that device B has the correct number of 
> queues (user_n_rxq), so there will not be a reconfigure of *that* 
> device. I tested this by faking some failed offloads with gdb.
>
> If a reconfigure for device A does occur for some other reason, then it 
> will attempt to apply the config again, (presumably) the offload will 
> fail again and the normal retry occurs without offload. This seems the 
> correct behaviour to me.

Somehow I had assumed that netdev_dpdk_reconfigure() would be called for
all ports regardless of whether their configuration had changed. Good to
know that the problem is now fixed.

> For v12, I also tested some basic increasing/decreasing of rxqs and 
> enabling hwol, and it's working as expected.
>
> thanks,
> Kevin.
>
> Acked-by: Kevin Traynor 

Thanks a lot!



Re: [ovs-dev] [PATCH v12] netdev-dpdk: Add custom rx-steering configuration.

2023-06-22 Thread Kevin Traynor

On 21/06/2023 20:24, Robin Jarry wrote:

Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the RTE flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty at the benefit of preventing link flapping.

This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the RTE flow matchers/actions, the rx-steering feature will be
completely disabled on the port.

It cannot be enabled when other_config:hw-offload=true as it may
conflict with the offloaded RTE flows. Similarly, if hw-offload is
enabled, custom rx-steering will be forcibly disabled on all ports.

Example use:

  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
    set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
    set interface phy0 options:rx-steering=rss+lacp -- \
    set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
    set interface phy1 options:rx-steering=rss+lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two-port bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

   +--------------------------+
   |           DUT            |
   | +----------------------+ |
   | |        br-int        | | in_port=patch10,actions=mod_dl_src:$patch11,mod_dl_dst:$tgen1,output:patch11
   | |                      | | in_port=patch11,actions=mod_dl_src:$patch10,mod_dl_dst:$tgen0,output:patch10
   | |  patch10    patch11  | |
   | +-----|----------|-----+ |
   |       |          |       |
   | +-----|----------|-----+ |
   | |  patch00    patch01  | |
   | |  tag:10      tag:20  | |
   | |                      | |
   | |        br-phy        | | default flow, action=NORMAL
   | |                      | |
   | |        bond0         | | balance-slb, lacp=passive, lacp-time=fast
   | |    phy0      phy1    | |
   | +-----|---------|------+ |
   +-------|---------|--------+
           |         |
   +-------|---------|--------+
   |     port0     port1      | balance L3/L4, lacp=active, lacp-time=fast
   |           lag            | mode trunk VLANs 10, 20
   |                          |
   |          switch          |
   |                          |
   |   vlan 10      vlan 20   | mode access
   |    port2        port3    |
   +------|------------|------+
          |            |
   +------|------------|------+
   |    tgen0        tgen1    | Random traffic that is properly balanced
   |                          | across the bond ports in both directions.
   |    traffic generator     |
   +--------------------------+

Without rx-steering, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link, which causes traffic above line rate to be dropped.

  ~# ovs-appctl lacp/show-stats bond0
   ---- bond0 statistics ----
  member: phy0:
TX PDUs: 347246
RX PDUs: 14865
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 168
Link Defaulted: 0
Carrier Status Changed: 0
  member: phy1:
TX PDUs: 347245
RX PDUs: 14919
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 147
Link Defaulted: 1
Carrier Status Changed: 0

When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.

This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only