Initial discussion on the OVS-DPDK RFC approach.
We will discuss the following points in the meeting today.


Regards
_Sugesh

From: Finn Christensen [mailto:f...@napatech.com]
Sent: Friday, January 26, 2018 1:41 PM
To: Chandran, Sugesh <sugesh.chand...@intel.com>; Loftus, Ciara 
<ciara.lof...@intel.com>; Doherty, Declan <declan.dohe...@intel.com>
Subject: RE: OVS-DPDK full offload RFC proposal discussion

Thanks Sugesh,

See my comments below.

I'll be on the conf call on Monday.

Regards,
Finn


From: Chandran, Sugesh [mailto:sugesh.chand...@intel.com]
Sent: January 25, 2018 9:33 PM
To: Finn Christensen <f...@napatech.com<mailto:f...@napatech.com>>; Loftus, 
Ciara <ciara.lof...@intel.com<mailto:ciara.lof...@intel.com>>; Doherty, Declan 
<declan.dohe...@intel.com<mailto:declan.dohe...@intel.com>>
Subject: RE: OVS-DPDK full offload RFC proposal discussion

Hi Finn,

Once again thank you for putting these up.
Please find my comments inline below.


Regards
_Sugesh

From: Finn Christensen [mailto:f...@napatech.com]
Sent: Tuesday, January 23, 2018 11:42 AM
To: Chandran, Sugesh 
<sugesh.chand...@intel.com<mailto:sugesh.chand...@intel.com>>; Loftus, Ciara 
<ciara.lof...@intel.com<mailto:ciara.lof...@intel.com>>; Doherty, Declan 
<declan.dohe...@intel.com<mailto:declan.dohe...@intel.com>>
Subject: OVS-DPDK full offload RFC proposal discussion

Hi Sugesh,

My apologies for not sending this earlier.
As discussed in the meeting, here is a semi-detailed description of how we see 
the next step towards OVS-DPDK hw full offload.
Please add all the Intel people who want to participate in this email thread.


Proposal: OVS changes for full offload, as an addition to the partial offload 
currently proposed.

Generally, let the hw-offloaded flow match+action functionality be a slave of 
the megaflow cache. Let a flow be seamlessly offloaded when applicable (when 
all of its actions are within the range of supported, implemented actions). 
Otherwise, fail over to partial offload; if that also fails, fall back to 
normal SW switching.

1)      Handle OUTPUT action:
Map odp_port_no to DPDK port_id, so that an OVS_ACTION_ATTR_OUTPUT may be 
converted into the port_id of a known netdev_dpdk device. If the port is not 
found in dpdk_list, or the specific dpdk device does not handle hw-offloading, 
do partial offload (don't use any actions besides the MARK and RSS added for 
partial offload).
If multiple OUTPUT actions are specified (e.g. in case of flooding), don't 
fully offload.
a.      Register the ODP port number in netdev_dpdk on DPDK_DEV_ETH instances 
(put odp_port_no in the netdev_dpdk structure).
b.      In the netdev_dpdk_add_rte_flow_offload() function, catch 
OVS_ACTION_ATTR_OUTPUT and find the dpdk_dev in dpdk_list whose odp_port_no 
matches. Then set up a RTE_FLOW_ACTION_TYPE_ETHDEV_PORT containing the DPDK 
port_id of the target port.
[Sugesh] Yes, that makes sense. Do you think the representor port can also be 
defined as a normal DPDK port? We are experiencing some difficulties when 
trying to overload the same DPDK port for representor ports/accelerated ports. 
More comments below.
[Finn] Yes. But you are right: if you need special OVSDB settings to configure 
a vport, then you will need a new DPDK type. However, we do not necessarily 
need this initially. I do not see it as a huge issue, and if we need it I 
think we can add it to the patchset as well.
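To illustrate point 1, here is a minimal sketch in C of the odp_port_no to 
DPDK port_id lookup and the full/partial-offload decision described above. All 
names (dpdk_dev_stub, dpdk_list_stub, odp_to_dpdk_port, hw_offload_capable) 
are illustrative stand-ins, not the real netdev_dpdk/dpdk_list code:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for netdev_dpdk entries in dpdk_list, each
 * carrying the ODP port number registered in step (a). */
struct dpdk_dev_stub {
    uint32_t odp_port_no;    /* OVS datapath port number */
    uint16_t dpdk_port_id;   /* DPDK ethdev port id */
    int hw_offload_capable;  /* does this device support full offload? */
};

static const struct dpdk_dev_stub dpdk_list_stub[] = {
    { .odp_port_no = 1, .dpdk_port_id = 0, .hw_offload_capable = 1 },
    { .odp_port_no = 2, .dpdk_port_id = 1, .hw_offload_capable = 0 },
};

/* Resolve an OVS_ACTION_ATTR_OUTPUT target to a DPDK port_id, as step (b)
 * would before building a RTE_FLOW_ACTION_TYPE_ETHDEV_PORT action.
 * Returns 0 on success; -1 means fall back to partial offload. */
static int
odp_to_dpdk_port(uint32_t odp_port_no, uint16_t *port_id)
{
    size_t n = sizeof dpdk_list_stub / sizeof dpdk_list_stub[0];
    for (size_t i = 0; i < n; i++) {
        if (dpdk_list_stub[i].odp_port_no == odp_port_no
            && dpdk_list_stub[i].hw_offload_capable) {
            *port_id = dpdk_list_stub[i].dpdk_port_id;
            return 0;
        }
    }
    return -1;  /* not found, or device cannot full-offload */
}
```

The -1 path is where the failover chain from the proposal kicks in: partial 
offload first, then normal SW switching.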

2)      Handle statistics:
Separate the registration/mapping of partial-offloaded and full-offloaded 
flows, query statistics from full-offloaded flows at a specific interval, and 
update the userspace datapath megaflow cache with these statistics. This is 
done using rte_flow_query and includes packet count (hits), byte count, and 
the tcp_flags seen.
a.      When a full offloaded flow has been successfully added, then add that 
rte_flow to a separate hw-offload map, containing only full-offloaded-flows.
[Sugesh] Yes. We are also following the same method.
b.      Add the RTE_FLOW_ACTION_TYPE_COUNT to the full-offloaded flows, so that 
statistics may be retrieved later, for that rte_flow.
[Sugesh] Make sense.
c.      Add a timed task to the hw-offload thread, so that all full-offloaded 
flows can be stat-queried using the rte_flow_query() function, retrieved at an 
interval of maybe 1 or 2 seconds. Call dp_netdev_flow_used() with the result.
[Sugesh] Ok, so we might need to use the stats in the revalidator to expire 
the flows?
Just a note: some hardware may be able to evict flows by itself after the 
idle timeout. The rte_flow_query logic should account for that as well when 
polling the stats.
[Finn] Yes, good point. Let the flow query also indicate whether a flow has 
been canceled, and remove it from the flow map accordingly.

d.      tcp_flags should be retrieved by rte_flow_query() also. This will need 
an extension to the current rte_flow_query_count structure.
[Sugesh] Ok
e.      Use the flow_get function in the DPDK_FLOW_OFFLOAD_API to implement 
the rte_flow_query call and convert the format to dpif_flow_stats.
[Sugesh] Yes.
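As a rough sketch of points a. through e., the per-flow fold step that the 
timed task would perform could look like the following. The structs and names 
(hw_flow_query, megaflow_stats, hw_stats_fold, the evicted flag) are 
hypothetical stand-ins for the rte_flow_query_count result and the megaflow 
cache stats, not existing OVS/DPDK types:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical cut-down query result: what a stats poll might return,
 * including the proposed tcp_flags extension (point d) and an eviction
 * indication for hardware that ages out flows on its own. */
struct hw_flow_query {
    uint64_t hits;       /* packets counted in hardware since last reset */
    uint64_t bytes;      /* bytes counted in hardware since last reset */
    uint8_t  tcp_flags;  /* OR of TCP flags seen in hardware */
    bool     evicted;    /* hardware removed the flow (idle timeout) */
};

/* Hypothetical stand-in for the megaflow-cache statistics. */
struct megaflow_stats {
    uint64_t packet_count;
    uint64_t byte_count;
    uint8_t  tcp_flags;
};

/* Fold one poll result into the cached stats, as the timed task would
 * before calling dp_netdev_flow_used(). Returns true if the flow should
 * also be removed from the hw-offload map (hardware evicted it). */
static bool
hw_stats_fold(struct megaflow_stats *ms, const struct hw_flow_query *q)
{
    ms->packet_count += q->hits;
    ms->byte_count   += q->bytes;
    ms->tcp_flags    |= q->tcp_flags;  /* flags accumulate, never reset */
    return q->evicted;
}
```

The boolean return models Finn's point above: the query path doubles as the 
place where hardware-side flow eviction is detected and propagated.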

3)      OVS with hw-offloaded virtual ports:
NIC virtual ports (VFs or PMD-local queue(s)) should not be specifically known 
to OVS, other than through a "normal" dpdk port (type=dpdk). This port is then 
a representor port, like any other phy port.
I know Intel has made a representor port proposal to DPDK, but IMO how a PMD 
registers virtual ports should be a DPDK PMD implementation matter, and 
therefore opaque to OVS and not part of this RFC proposal.
[Sugesh] This is a bit tricky, because it is very likely that the type/mode 
of a virtual port depends solely on the hardware; it can be VFs, vhost ports, 
or anything else. Representing these ports as normal DPDK ports in OVS is a 
bit confusing.
We also need to look at how these ports are managed from an orchestrator and 
how they differ from normal ports.
[Finn] Well, the orchestrator should definitely know that these ports are 
virtual hw ports, so that it can orchestrate these resources accordingly, but 
does OVS need to know? If so, that's fine with me. We can discuss this issue 
further on the conf call.

This proposal needs a few extensions to the DPDK RTE FLOW API.
a.      A RTE_FLOW_ACTION_TYPE_ETHDEV_PORT - to specify a target port (as 
proposed earlier by Intel on the DPDK ML).
[Sugesh] There is already work going on for an OUTPUT action.
b.      Extend rte_flow_query to be able to add tcp_flags and potentially 
handle bulk requests.
[Sugesh] We need to work on this. We also need to think about how to handle 
the case where the hardware cannot support the query.
[Finn] If tcp_flags are not in the stats, then the question is whether it is 
still valid to fully offload. Does anybody have a comment on that? Should that 
be known to an orchestrator?
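As a sketch of extension b., the rte_flow_query count result could grow a 
tcp_flags field plus a validity bit, so a PMD that cannot report TCP flags 
simply leaves the bit clear. This is illustrative only: the field names 
(tcp_flags_set, tcp_flags) are this proposal's suggestion, not an existing 
DPDK API, and the layout only loosely mirrors the real rte_flow_query_count:

```c
#include <stdint.h>

/* Hypothetical extended query-count result. The _set bits tell the caller
 * which fields the PMD actually filled in, so tcp_flags support can be
 * detected per device rather than assumed. */
struct rte_flow_query_count_ext {
    uint32_t reset:1;          /* reset counters after query */
    uint32_t hits_set:1;       /* hits field is valid */
    uint32_t bytes_set:1;      /* bytes field is valid */
    uint32_t tcp_flags_set:1;  /* proposed: tcp_flags field is valid */
    uint32_t reserved:28;
    uint64_t hits;
    uint64_t bytes;
    uint8_t  tcp_flags;        /* proposed: OR of observed TCP flags */
};
```

A caller seeing tcp_flags_set == 0 would then face exactly the question 
raised above: keep the full offload without flag visibility, or fall back.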

Furthermore, this proposal is based on a trial-and-error approach without 
specific device capability knowledge, mostly because no device capability 
features are available yet. I think capability discovery can be added later 
as a separate issue, not necessarily hard-bound to the full offload 
functionality.
[Sugesh] Totally agree with that.

I thought this was a good level of detail to start out with.
[Sugesh] I am setting up a call for the coming Monday (the meeting invite 
follows). I hope you can make it, so that we can discuss these points in 
detail.


Thanks,
Finn

Disclaimer: This email and any files transmitted with it may contain 
confidential information intended for the addressee(s) only. The information is 
not to be surrendered or copied to unauthorized persons. If you have received 
this communication in error, please notify the sender immediately and delete 
this e-mail from your system.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
