On Fri, Mar 06, 2026 at 01:41:53PM +0100, Ilya Maximets wrote:
> On 2/18/26 11:37 AM, Adrian Moreno via dev wrote:
> > ofproto/trace is one of the most useful debugging tools OVS provides.
> > However, its "offline" nature comes with limitations:
> > - Users need to know exactly what their packets look like
> > - Runtime information such as conntrack states has to be guessed
> >
> > This RFC introduces the idea of upcall (live) tracing. In a nutshell,
> > the idea is:
> > - The user activates upcall tracing by specifying an openflow filter
> > - When a packet is upcalled, OVS checks whether it matches the filter
> >   and, if it does, collects traces from the xlate layer (the same
> >   traces that ofproto/trace emits).
> > - If a frozen state is created from this upcall, an ID is stored in
> >   it, so that if a subsequent upcall resumes the frozen state, it
> >   inherits that ID, the "trace_id".
> > - Traces are stored in some kind of ring-buffer or fixed-size list
> >   arranged by these tracing IDs.
> > - Traces are accessed (printed) by the user after the experiment has
> >   ended.
> >
> > The code in this RFC is in an early state, but it can be used to play
> > around. I'm sending it out early to get some feedback.
> > The following topics are not clear to me at the moment:
> >
> > - Naming and layering: I have called the thing "upcall-tracing", and
> >   unixctl commands are "upcall/trace/{create,list,get,show}". I tried to
> >   differentiate from "ofproto/trace" but maybe this is not very
> >   intuitive? A (kind of crazy) idea that I have as a followup is to
> >   persist the trace_id into the udpif_key so we can also trace
> >   revalidations associated with an upcall. In such a scenario,
> >   "upcall/trace" might fall semantically short.
> >
> > - ofproto vs dpif: Connected to the previous topic. The tracing
> >   infrastructure is bound to the "udpif", the upcall engine, which is
> >   part of the datapath (dpif), not the bridge. However, ofproto flows
> >   are easier to write and more familiar to users so from their PoV,
> >   specifying a bridge and a ofp filter is nicer. Writing
> >   "upcall/trace/create br0 in_port=myport,ip" and then visualizing a
> >   trace associated with `system@ovs-system` feels weird.
> >
> > - Trace ID != Packet ID: The RFC generates a trace_id for each new
> >   upcall that matches the filter and persists the trace_id inside the
> >   frozen_state. If the same packet gets recirculated and upcalled, it
> >   will (nicely) inherit the same trace_id and be grouped together. We
> >   see all the recirculation rounds of the same packet (same as with
> >   ofproto/trace but for real!). BUT, frozen_states are shared. If
> >   another packet hits it, it will also inherit that ID and be printed
> >   alongside the first. Is this good? Bad? Acceptable? Do we need to
> >   use the packet's metadata instead of the frozen_state to persist
> >   this trace_id?
> >
> > - Another idea that I considered is binding the tracing to a specific
> >   port, i.e., "upcall/trace/create br0 p1 {ofp_filter}". Although this
> >   would deviate from the ofproto/trace syntax, it would make it easier
> >   to avoid an extra flow match on traffic that is not from the
> >   "port-under-test".
> >
> > Of course any other feedback is hugely appreciated.
> >
> > This RFC contains the core feature but lacks some configurability
> > that might be interesting for the actual patch. Additionally, I would
> > like to measure the performance impact of enabling it in a loaded
> > system.
> >
>
> Hi, Adrian.  Thanks for the set!  It's definitely an interesting idea.
>
> I'm a little on the edge about adding a pile of new appctl APIs for yet
> another tracing mechanism.  It might be better if we can incorporate
> this functionality into what we already have, enhancing things people
> are already using and familiar with.
>
> We have today ofproto/trace and ofproto/detrace that retis is using, for
> example, to get something close to the trace, but limited in what it can
> report.  So, I was thinking if we could just enhance the detrace output
> with all the actual trace details.  We could do that relatively easily
> by creating a new type of the cache entry (XC_TRACE) and make the
> xlate_report() create those with the tracing type/text and maybe some
> way of tracking the nesting level.  Then both the tracing code and the
> detrace could construct the same output from the cache.

Thanks for the feedback, Ilya. Interesting idea indeed!

Currently xcache is populated by revalidators, not handlers.
Should we start doing it in upcalls as well? Alternatively, should
upcall handlers only mark the udpif_key for tracing and let the
revalidators populate the traces? I guess the xlate result of the first
revalidation should be the same as the one that took place during the upcall...

I guess there can still be (edge?) cases where an upcall does not
generate a flow installation, and therefore there is no xcache.

Another complication could be that the lifetime of a trace would be
bound to the lifetime of the datapath flow, so if the user is not fast
enough to run the appropriate "ovs-appctl" command, traces will be lost.

>
> This has a few advantages:
>
> - Not only upcall tracing: revalidators will populate the cache if the
>   user turns on tracing, and clean it up when disabled.
>
> - It's not a new API, just a couple more knobs (just one?) for the
>   existing one, so retis could just get the benefits right away.

Consuming traces in ofproto/detrace felt a bit weird because the traces
and the current ofproto/detrace output are not really complementary: one
contains the other, so dumping both would be somewhat redundant. But
maybe we could dump the traces only when they are available.

>
> - We can still filter, if needed, or just populate traces for everything.
>
> - The code sharing between the trace and detrace sounds nice. :)
>
> Some disadvantages:
>
> - Tracing everything may be expensive, but filter (either the full packet
>   filter or the input port filter) can solve this the same as in the
>   current implementation.  And we don't need to store extra data per
>   packet in the additional buffer, just one trace per datapath flow,
>   updated when there are changes in the pipeline.
>
> - Filters may be tricky in a way that revalidating a post-recirculation
>   flow with a filter on pre-recirculation flow will still require having
>   a flag or something in the frozen state (Does it need to be a unique
>   id?  A boolean flag 'trace_this' may be enough.).  But that's also not
>   much different from the current implementation.

The unique id helps stitch the traces (or dp flows) together into
coherent recirculation rounds. I think this is an interesting feature
that brings clarity to the OVS datapath.

So maybe we could think of a way to do this linking between the
udpif_keys? Maybe store the ufid in the frozen state and somehow link
a udpif_key to the "next" one?

>
> - Not a full trace.  detrace works per datapath flow, so in order to get
>   the full trace one will need to detrace all relevant datapath flows.
>   However, I suspect the primary user may be retis and it already tracks
>   the packet, so it will know which datapath flows to detrace.
>

I was thinking of this as a complement to ofproto/trace that users could
use without retis as well.

In fact, retis won't see the datapath flow on the first packet, which is
precisely the "gap" I was trying to fill here.

Another thought:
I did consider extending the ofproto/trace command, but I found it
difficult because of the asynchronous nature of upcall tracing, i.e., a
potential "ofproto/trace --live {filter}" would have to block until the
interesting upcall actually happens. Since this is an appctl command,
that would require blocking the main thread or implementing asynchronous
sessions in the appctl subsystem.
Another approach would be to make ofproto/{de}trace a standalone
utility, i.e., an "ovs-trace" tool that consumes existing or new appctl
commands.

Do you have a concrete example of what the ofproto/detrace API would
look like?

Thanks.
Adrián

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
