I've recently been using GPUVis to look at trace events. On Intel platforms, GPUVis incorporates ftrace events from the i915 driver, performance metrics from igt-gpu-tools, and userspace ftrace markers that I locally hack up in Mesa.
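For anyone unfamiliar with userspace ftrace markers, here is a minimal sketch of the idea. It is not the actual Mesa change: `marker_open` and `marker_write` are hypothetical helper names, and the sketch assumes tracefs is mounted at /sys/kernel/tracing and writable by the calling process.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open the ftrace marker file for writing.
 * Returns -1 if tracefs is not mounted at this path or access is denied. */
int marker_open(void)
{
    return open("/sys/kernel/tracing/trace_marker", O_WRONLY);
}

/* Hypothetical helper: format one event and emit it with a single write(),
 * so that the whole message reaches the trace buffer as one marker event.
 * Returns the number of bytes written, or -1 on error. */
int marker_write(int fd, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int len;

    if (fd < 0)
        return -1;

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof(buf))
        len = (int)sizeof(buf) - 1; /* message was truncated to fit buf */

    return (int)write(fd, buf, (size_t)len);
}
```

A caller would open the marker file once, keep the descriptor around, and call something like `marker_write(fd, "frame %d", n)` at the points of interest. Because the helper takes a plain file descriptor, it can be exercised against a pipe when tracefs is not available.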
It is very easy to compile the GPUVis UI. Userspace instrumentation requires a single C/C++ header. You don't have to access an external web service to analyze trace data (a big no-no for devs working on preproduction hardware).

Is it possible to build and run the Perfetto UI locally? Can it display arbitrary trace events that are written to /sys/kernel/tracing/trace_marker? Can it be extended to show i915 and i915-perf-recorder events?

John Bates <jba...@chromium.org> writes:

> I recently opened issue 4262
> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> discussion on integrating perfetto into mesa.
>
> *Background*
>
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system. This helps developers
> quickly answer questions like:
>
> - How long are frames taking?
> - What caused a particular frame drop?
> - Is it CPU bound or GPU bound?
> - Did a CPU core frequency drop cause something to go slower than usual?
> - Is something else running that is stealing CPU or GPU time? Could I
>   fix that with better thread/context priorities?
> - Are all CPU cores being used effectively? Do I need sched_setaffinity
>   to keep my thread on a big or little core?
> - What’s the latency between CPU frame submit and GPU start?
>
> *What Does Mesa + Perfetto Provide?*
>
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>
> The key is making it easy for developers to use.
> Ideally, perfetto is eventually available by default in mesa so that if your
> system has perfetto traced running, you just need to run perfetto (perhaps
> along with setting an environment variable) with the mesa categories to see:
>
> - GPU processing timeline events.
> - GPU counters.
> - CPU events for potentially slow functions in mesa like shader compiles.
>
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
>
> *Runtime Characteristics*
>
> - ~500KB additional binary size. Even with using only the basic features
>   of perfetto, it will increase the binary size of mesa by about 500KB.
> - Background thread. Perfetto uses a background thread for communication
>   with the system tracing daemon (traced) to advertise trace data and get
>   notification of trace start/stop.
> - Runtime overhead when disabled is designed to be optimal with one
>   predicted branch, typically a few CPU cycles
>   <https://perfetto.dev/docs/instrumentation/track-events#performance> per
>   event. While enabled, the overhead can be around 1 us per event.
>
> *Integration Challenges*
>
> - The perfetto SDK is C++ and designed around macros, lambdas, inline
>   templates, etc. There are ongoing discussions on providing an official
>   perfetto C API, but it is not yet clear when this will land on the perfetto
>   roadmap.
> - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>   lines of code.
> - Anything that includes perfetto.h takes a long time to compile.
> - The current Perfetto SDK design is incompatible with being a shared
>   library behind a C API.
>
> *Percetto*
>
> The percetto library <https://github.com/olvaffe/percetto> was recently
> implemented to provide an interim C API for perfetto. It provides efficient
> support for scoped trace events, multiple categories, counters, custom
> timestamps, and debug data annotations. Percetto also provides some
Percetto also provides some > features that are important to mesa, but not available yet with perfetto > SDK: > > - Trace events from multiple perfetto instances in separate shared > libraries (like mesa and virglrenderer) show correctly in a single process > and thread view. > - Counter tracks and macro API. > > Percetto is missing API for perfetto's GPU DataSource and counter support, > but that feature could be implemented next if it is important for mesa. > With the existing percetto API mesa could present GPU trace data as named > 'slice' events and int64_t counters with custom timestamps as shown in the > image above (based on this sample > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>). > > *Mesa Integration Alternatives* > > Note: we have some pressing needs for performance analysis in Chrome OS, so > I'm intentionally leaving out the alternative of waiting for an official > perfetto C API. Of course, once that C API is available it would become an > option to migrate to it from any of the alternatives below. > > Ordered by difficulty with easiest first: > > 1. Statically link with percetto as an optional external dependency > (virglrenderer > now has this approach > <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480> > ). > - Pros: API already supports most common tracing needs. Tested and used > by an increasing number of CrOS components. > - Cons: External dependency for optional mesa build option. > 2. Embed Perfetto SDK + a Percetto fork/copy. > - Pros: API already supports most common tracing needs. No added > external dependency for mesa. > - Cons: Percetto code divergence, bug fixes need to land in two trees. > 3. Embed Perfetto SDK + custom C wrapper. > - Pros: Tailored API for mesa's needs. > - Cons: Nontrivial development efforts and maintenance. > 4. Generate C stubs for the Perfetto protobuf and reimplement the > Perfetto SDK in C. > - Pros: Tailored API for mesa's needs. 
>      Possible smaller binary impact from simpler implementation.
>    - Cons: Significant development efforts and maintenance.
>
> Regardless of the integration direction, I expect we would disable perfetto
> in the default build for now to minimize disruption.
>
> I like #1, because there are some nontrivial subtleties to the C wrapper
> that provide both API conveniences and runtime performance that would need
> to be reimplemented or maintained with the other options. I will also
> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
>
> Any other thoughts on how best to integrate perfetto into mesa?
>
> -jb
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev