On 3/7/24 16:18, Adrian Moreno wrote:
> ** Background **
> Currently, OVS supports several packet sampling mechanisms (sFlow,
> per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
> userspace action that needs to be handled by ovs-vswitchd's handler
> threads only to be forwarded to some third party application that
> will somehow process the sample and provide observability on the
> datapath.
> 
> The fact that sampled traffic shares netlink sockets and handler
> thread time with upcalls, besides being a performance bottleneck for
> sample extraction itself, can severely compromise the datapath,
> making this solution unfit for highly loaded production systems.
> 
> Users are left with few options other than guessing what sampling
> rate will be acceptable for their traffic pattern and system load,
> and accepting the lost accuracy.
> 
> ** Proposal **
> In this RFC, I'd like to request feedback on an attempt to fix this
> situation by adding a flag to the userspace action to indicate the
> upcall should be sent to a netlink multicast group instead of unicasted
> to ovs-vswitchd.
> 
> This would allow for other processes to read samples directly, freeing
> the netlink sockets and handler threads to process packet upcalls.
> 
> ** Notes on tc-offloading **
> I am aware of the efforts being made to offload the sample action with
> the help of psample.
> I did consider using psample to multicast the samples. However, I
> found a limitation that I'd like to discuss:
> I would like to support OVN-driven per-flow (IPFIX) sampling because
> it allows OVN to insert two 32-bit values (obs_domain_id and
> obs_point_id) that can be used to enrich the sample with "high level"
> controller metadata (see the debug_drop_domain_id NBDB option in ovn-nb(5)).
> 
> The existing fields in psample_metadata are not enough to carry this
> information. Would it be possible to extend this struct to make room for
> some extra "application-specific" metadata?
> 
> ** Alternatives **
> An alternative approach that I'm considering (apart from using psample
> as explained above) is to use a brand-new action. This leads to a cleaner
> separation of concerns from the existing userspace action (used for slow
> paths and OFP_CONTROLLER actions) and cleaner statistics.
> Also, ovs-vswitchd could more easily make the layout of this
> new userdata part of the public API, allowing third party sample
> collectors to decode it.
> 
> I am currently exploring this alternative but wanted to send the RFC to
> get some early feedback, guidance or ideas.


Hi, Adrian.  Thanks for the patches!

Though I'm not sure broadcasting is the best approach in general.
These messages contain opaque information that is not actually
parsable by any entity other than the process that created the action.
And I don't think the structure of these opaque fields should become
part of the uAPI of either the kernel or userspace OVS.

The userspace() action already has an OVS_USERSPACE_ATTR_PID argument,
but it is not actually used when OVS_DP_F_DISPATCH_UPCALL_PER_CPU is
enabled.  All known users of OVS_DP_F_DISPATCH_UPCALL_PER_CPU set
OVS_USERSPACE_ATTR_PID to UINT32_MAX, which is not a pid that the
kernel could generate.

So, with a minimal and largely backward-compatible change in the
output_userspace() function, we can honor OVS_USERSPACE_ATTR_PID when
it's not U32_MAX.  This way, a userspace process can open a separate
socket and configure sampling to redirect all samples there, while
normal MISS upcalls still arrive on the per-CPU sockets.  This should
address the performance concern.

For the case without per-CPU dispatch, the feature comes for free if
the userspace application wants to use it.  However, every currently
supported version of OVS uses per-CPU dispatch when it is available.

What do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev