Hi,

On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> I recently opened issue 4262
> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> discussion on integrating perfetto into mesa.
>
> *Background*
>
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system.

Unlike some other Linux tracing solutions, Perfetto appears to be aimed at
Android / Chrome(OS?), and it is not available in common Linux distro repos.

So, why Perfetto instead of one of the other solutions, e.g. the ones
mentioned here: https://tracingsummit.org/ts/2018/ ?

And, if a tracing API is added to Mesa, shouldn't it also support
tracepoints for other tracing solutions?

I mean, code added to the drivers themselves should preferably not contain
anything perfetto/percetto-specific. Tracing-system-specific code should be
in only one place (even if it's just macros in a common header).


> This helps developers
> quickly answer questions like:
>
> - How long are frames taking?

That doesn't require any changes to Mesa. Just set a uprobe on a suitable
buffer swap function [1], and parse the kernel ftrace events. This way,
starting tracing doesn't even require restarting the traced processes.

[1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
    anv_QueuePresentKHR [2]...

[2] Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
    function and call the backend function like "anv_QueuePresentKHR"
    directly, so it's better to track the latter instead.


> - What caused a particular frame drop?
> - Is it CPU bound or GPU bound?

That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
utilization (which is a lower-level thing).


> - Did a CPU core frequency drop cause something to go slower than
>   usual?

Note that nowadays actual CPU frequencies are often controlled by HW /
firmware, so you don't necessarily get any ftrace event from frequency
changes; you would need to poll MSR registers instead (which is a privileged
operation, and polling can easily miss changes).


> - Is something else running that is stealing CPU or GPU time? Could I
>   fix that with better thread/context priorities?
> - Are all CPU cores being used effectively? Do I need sched_setaffinity
>   to keep my thread on a big or little core?

I don't think these require adding tracepoints to Mesa either...
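
The "macros in a common header" idea could look roughly like the minimal C sketch below. Every name in it (MESA_TRACE_BEGIN/END, struct trace_backend, recording_backend, compile_shader) is hypothetical and is not an existing Mesa, Perfetto or Percetto API; the stub backend just records the last event name so the control flow is observable. __builtin_expect is a GCC/Clang builtin used to hint that tracing is normally off.

```c
/* Hypothetical backend-neutral tracing header sketch. None of these names
 * are real Mesa/Perfetto/Percetto APIs; they are illustrative only. */
#include <assert.h>
#include <string.h>

/* The only place that knows about a concrete tracing system. */
struct trace_backend {
   const char *name;
   void (*begin)(const char *event);
   void (*end)(void);
};

/* Stub backend standing in for perfetto, percetto, LTTng-UST, etc.
 * It only remembers the last event name and the nesting depth. */
static int trace_depth;
static char trace_last_event[64];

static void recording_begin(const char *event)
{
   /* Copies at most 63 bytes; the final byte stays zero (static storage). */
   strncpy(trace_last_event, event, sizeof(trace_last_event) - 1);
   trace_depth++;
}

static void recording_end(void)
{
   assert(trace_depth > 0); /* catches an END without a matching BEGIN */
   trace_depth--;
}

static const struct trace_backend recording_backend = {
   "recording", recording_begin, recording_end
};

/* NULL means tracing is disabled; each macro then costs one branch that is
 * hinted as not taken. */
static const struct trace_backend *active_backend;

#define MESA_TRACE_BEGIN(event_name)                             \
   do {                                                          \
      if (__builtin_expect(active_backend != NULL, 0))           \
         active_backend->begin(event_name);                      \
   } while (0)

#define MESA_TRACE_END()                                         \
   do {                                                          \
      if (__builtin_expect(active_backend != NULL, 0))           \
         active_backend->end();                                  \
   } while (0)

/* Driver code: uses only the neutral macros, never a specific tracer. */
static int compile_shader(int n)
{
   MESA_TRACE_BEGIN("compile_shader");
   int result = n * 2; /* stand-in for real work */
   MESA_TRACE_END();
   return result;
}
```

In this sketch, switching tracing systems would mean changing only how active_backend (or the macro bodies) is selected at build or run time; code like compile_shader() stays untouched. The sketch does not handle the backend changing between BEGIN and END, which a real header would need to consider.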

> - What’s the latency between CPU frame submit and GPU start?

I think this would require tracepoints in kernel GPU code more than in Mesa?


	- Eero

> *What Does Mesa + Perfetto Provide?*
>
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa so that if your system has perfetto
> traced running, you just need to run perfetto (perhaps along with setting
> an environment variable) with the mesa categories to see:
>
> - GPU processing timeline events.
> - GPU counters.
> - CPU events for potentially slow functions in mesa like shader compiles.
>
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
>
> *Runtime Characteristics*
>
> - ~500KB additional binary size. Even with using only the basic features
>   of perfetto, it will increase the binary size of mesa by about 500KB.
> - Background thread. Perfetto uses a background thread for communication
>   with the system tracing daemon (traced) to advertise trace data and get
>   notification of trace start/stop.
> - Runtime overhead when disabled is designed to be optimal with one
>   predicted branch, typically a few CPU cycles
>   <https://perfetto.dev/docs/instrumentation/track-events#performance>
>   per event. While enabled, the overhead can be around 1 us per event.
>
> *Integration Challenges*
>
> - The perfetto SDK is C++ and designed around macros, lambdas, inline
>   templates, etc. There are ongoing discussions on providing an official
>   perfetto C API, but it is not yet clear when this will land on the
>   perfetto roadmap.
> - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>   lines of code.
> - Anything that includes perfetto.h takes a long time to compile.
> - The current Perfetto SDK design is incompatible with being a shared
>   library behind a C API.
>
> *Percetto*
>
> The percetto library <https://github.com/olvaffe/percetto> was recently
> implemented to provide an interim C API for perfetto. It provides efficient
> support for scoped trace events, multiple categories, counters, custom
> timestamps, and debug data annotations. Percetto also provides some
> features that are important to mesa, but not available yet with perfetto
> SDK:
>
> - Trace events from multiple perfetto instances in separate shared
>   libraries (like mesa and virglrenderer) show correctly in a single
>   process and thread view.
> - Counter tracks and macro API.
>
> Percetto is missing API for perfetto's GPU DataSource and counter support,
> but that feature could be implemented next if it is important for mesa.
> With the existing percetto API mesa could present GPU trace data as named
> 'slice' events and int64_t counters with custom timestamps as shown in the
> image above (based on this sample
> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
>
> *Mesa Integration Alternatives*
>
> Note: we have some pressing needs for performance analysis in Chrome OS, so
> I'm intentionally leaving out the alternative of waiting for an official
> perfetto C API. Of course, once that C API is available it would become an
> option to migrate to it from any of the alternatives below.
>
> Ordered by difficulty with easiest first:
>
> 1. Statically link with percetto as an optional external dependency
>    (virglrenderer now has this approach
>    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>    - Pros: API already supports most common tracing needs. Tested and used
>      by an increasing number of CrOS components.
>    - Cons: External dependency for optional mesa build option.
> 2. Embed Perfetto SDK + a Percetto fork/copy.
>    - Pros: API already supports most common tracing needs. No added
>      external dependency for mesa.
>    - Cons: Percetto code divergence, bug fixes need to land in two trees.
> 3. Embed Perfetto SDK + custom C wrapper.
>    - Pros: Tailored API for mesa's needs.
>    - Cons: Nontrivial development efforts and maintenance.
> 4. Generate C stubs for the Perfetto protobuf and reimplement the
>    Perfetto SDK in C.
>    - Pros: Tailored API for mesa's needs. Possible smaller binary impact
>      from simpler implementation.
>    - Cons: Significant development efforts and maintenance.
>
> Regardless of the integration direction, I expect we would disable perfetto
> in the default build for now to minimize disruption.
>
> I like #1, because there are some nontrivial subtleties to the C wrapper
> that provide both API conveniences and runtime performance that would need
> to be reimplemented or maintained with the other options. I will also
> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
>
> Any other thoughts on how best to integrate perfetto into mesa?
>
> -jb
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev