Hi all,

As Steven Price explained, the "GPU top" kbase approach is often more useful and accurate than per-draw timing.

For a 3D game inside a GPU-accelerated desktop, the game's counters *should* include desktop overhead. This external overhead does affect the game's performance, especially if the contexts are competing for resources like memory bandwidth. An isolated sample is easy to achieve by running only the app of interest; in ideal conditions (zero-copy fullscreen), desktop interference is negligible.

For driver developers, system-wide measurements are preferable, painting a complete system picture and avoiding disruptions. There is no risk of confusion, as driver developers understand how the counters are exposed. Further, benchmarks rendering directly to a GBM surface are available (glmark2-es2-drm), eliminating interference even with poor desktop performance.

For app developers, the confusion of multi-context interference is unfortunate. Nevertheless, if enabling counters were to slow down an app, the confusion could be worse. Consider second-order changes in the app's performance characteristics due to the slowdown: if techniques like dynamic resolution scaling are employed, the counters' results can be invalid. Likewise, even if the lower-performance counters are appropriate for purely graphical workloads, complex apps with variable CPU overhead (e.g. from an FPS-dependent physics engine) can further confound the counters. Low-overhead system-wide measurements mitigate these concerns.

As Rob Clark suggested, system-wide counters could be exposed via a semi-standardized interface, perhaps within debugfs/sysfs. The interface could not be completely standard, as the list of counters exposed varies substantially by vendor and model. Nevertheless, the mechanics of discovering, enabling, reading, and disabling counters can be standardized, as can a small set of universally meaningful counters like total GPU utilization. This would permit a vendor-independent GPU top app as suggested, as is, I believe, currently possible with vendor-specific downstream kernels (e.g.
via Gator/Streamline for Mali).

It looks like this discussion is dormant. Could we try to get this sorted? For Panfrost, I'm hitting GPU-side bottlenecks that I'm unable to diagnose without access to the counters, so I'm eager for a mainline solution to be implemented.

Thank you,

Alyssa
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel