This is the userspace counterpart of the work being done in the kernel [1]. Sending it as RFC to get some early feedback on the overall solution.
** Problem description ** Currently, OVS supports several observability features, such as per-bridge IPFIX, per-flow IPFIX and sFlow. However, given the complexity of OVN-generated pipelines, a single sample is typically not enough to troubleshoot what is OVN/OVS is doing with the packet. We need highler level metadata alongside the packet sample. This can be achieved by the means of per-flow IPFIX sampling and NXAST_SAMPLE action, through which OVN can add two arbitrary 32-bit values (observation_domain_id and observation_point_id) that are sent alongside the packet information in the IPFIX frames. However, there is a fundamental limitation of the existing sampling infrastructure, specially in the kernel datapath: samples are handled by ovs-vswitchd, forcing the need for an upcall (userspace action). The fact that samples share the same netlink sockets and handler thread cpu time with actual packet misse, can easily cause packet drops making this solution unfit for use in highly loaded production systems. Users are left with little option than guessing what sampling rate will be OK for their traffic pattern and dealing with the lost accuracy. ** Feature Description ** In order to solve this situation and enable this feature to be safely turned on in production, this RFC uses the psample support proposed in [1] to send NXAST_SAMPLE samples to psample adding the observability domain and point information as metadata. ~~ API ~~ The API is simply a new field called "psample_group" in the Flow_Sample_Collector_Set (FSCS) table. Configuring this value is orthogonal to also associating the FSCS entry to an entry in the IPFIX table. Apart from that configuration, the controller needs to add NXAST_SAMPLE actions that refer the entry's id. ~~ HW Offload ~~ psample is already supported by tc using the act_sample action. The metadata is taken from the actions' cookie. This series also adds support for offloading the odp sample action, only when it includes psample information but not nested actions. A bit of special care has to be taken in the tc layer to not store the ufid in the sample's cookie as it's used to carry action-specific data. ~~ Datapath support ~~ This new behavior of the datapth sample action is only supported on the kernel datapath. A more detailed analysis is needed (and planned) to find a way to also optimize the userspace datapath. However, although potentially inefficient, there is not that much risk of dropping packets with the existing IPFIX infrastructure. ~~ Testing ~~ The series includes an utility program called "ovs-psample" that listens to packets multicasted by the psample module and prints them (also printing the obs_domain and obs_point ids). In addition the kernel series includes a tracepoint for easy testing and troubleshooting. [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=847473 Adrian Moreno (11): odp-util: Add support to new psample uAPI. ofproto_dpif: Check for psample support. ofproto: Add ofproto-dpif-psample. vswitchd: Add psample to schema and configure it. ofproto_dpif_xlate: Use psample for OFP samples. utilities: Add ovs-psample. netlink-offload-tc: Rename act_cookie->flow_cookie. netdev-offload-tc: Add sample support. ofproto-dpif-xlate-cache: Add psample to xcache. ofproto-dpif-psample: Add command to show stats. tests: Add test for sample offloading. Documentation/automake.mk | 1 + Documentation/conf.py | 2 + Documentation/ref/index.rst | 1 + Documentation/ref/ovs-psample.8.rst | 47 ++++ include/linux/automake.mk | 6 +- include/linux/openvswitch.h | 49 ++++- include/linux/pkt_cls.h | 2 + include/linux/psample.h | 64 ++++++ include/linux/tc_act/tc_sample.h | 26 +++ lib/netdev-offload-tc.c | 75 ++++++- lib/odp-execute.c | 3 + lib/odp-util.c | 150 +++++++++---- lib/tc.c | 98 ++++++++- lib/tc.h | 9 +- ofproto/automake.mk | 2 + ofproto/ofproto-dpif-ipfix.c | 2 + ofproto/ofproto-dpif-psample.c | 266 ++++++++++++++++++++++ ofproto/ofproto-dpif-psample.h | 39 ++++ ofproto/ofproto-dpif-xlate-cache.c | 11 +- ofproto/ofproto-dpif-xlate-cache.h | 6 + ofproto/ofproto-dpif-xlate.c | 158 +++++++++---- ofproto/ofproto-dpif-xlate.h | 3 +- ofproto/ofproto-dpif.c | 96 +++++++- ofproto/ofproto-dpif.h | 7 +- ofproto/ofproto-provider.h | 9 + ofproto/ofproto.c | 10 + ofproto/ofproto.h | 8 + python/ovs/flow/odp.py | 2 + rhel/openvswitch-fedora.spec.in | 2 + rhel/openvswitch.spec.in | 1 + tests/odp.at | 5 + tests/system-common-macros.at | 4 + tests/system-offloads-traffic.at | 53 +++++ utilities/automake.mk | 9 + utilities/ovs-psample.c | 330 ++++++++++++++++++++++++++++ vswitchd/bridge.c | 54 ++++- vswitchd/vswitch.ovsschema | 7 +- vswitchd/vswitch.xml | 32 ++- 38 files changed, 1533 insertions(+), 116 deletions(-) create mode 100644 Documentation/ref/ovs-psample.8.rst create mode 100644 include/linux/psample.h create mode 100644 include/linux/tc_act/tc_sample.h create mode 100644 ofproto/ofproto-dpif-psample.c create mode 100644 ofproto/ofproto-dpif-psample.h create mode 100644 utilities/ovs-psample.c -- 2.44.0 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev