Hi all,

As Steven Price explained, the "GPU top" kbase approach is often more
useful and accurate than per-draw timing. 

For a 3D game inside a GPU-accelerated desktop, the game's counters
*should* include desktop overhead. This external overhead does affect
the game's performance, especially if the contexts are competing for
resources like memory bandwidth. An isolated sample is easy to obtain
by running only the app of interest; in ideal conditions (zero-copy
fullscreen), desktop interference is negligible.

For driver developers, the system-wide measurements are preferable,
painting a complete system picture and avoiding disruptions. There is no
risk of confusion, as the driver developers understand how the counters
are exposed. Further, benchmarks rendering directly to a GBM surface are
available (glmark2-es2-drm), eliminating interference even with poor
desktop performance.

For app developers, the confusion of multi-context interference is
unfortunate. Nevertheless, if enabling counters were to slow down an
app, the confusion could be worse. Consider second-order changes in the
app's performance characteristics due to slowdown: if techniques like
dynamic resolution scaling are employed, the counters' results can be
invalid.  Likewise, even if the lower-performance counters are
appropriate for purely graphical workloads, complex apps with variable
CPU overhead (e.g. from an FPS-dependent physics engine) can further
confound counters. Low-overhead system-wide measurements mitigate these
concerns.

As Rob Clark suggested, system-wide counters could be exposed via a
semi-standardized interface, perhaps within debugfs/sysfs. The interface
could not be completely standard, as the list of counters exposed varies
substantially by vendor and model. Nevertheless, the mechanics of
discovering, enabling, reading, and disabling counters can be
standardized, as can a small set of universally meaningful counters like
total GPU utilization. This would permit a vendor-independent GPU top
app as suggested, something that is, I believe, currently possible with
vendor-specific downstream kernels (e.g. via Gator/Streamline for Mali).
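
To make the split between standardized mechanics and vendor-specific
counter lists concrete, here is a rough userspace-side sketch in Python.
It is purely illustrative: CounterBackend, MockBackend, and the counter
names (gpu_active_cycles, gpu_total_cycles, vendor_tiler_active) are
all hypothetical, and nothing here corresponds to an existing kernel or
driver interface. A real backend would talk to whatever debugfs/sysfs
files or ioctls end up being agreed upon.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class CounterBackend(Protocol):
    """Hypothetical vendor-neutral mechanics: discover, enable, read, disable.

    Only these four operations are assumed to be standardized; the set of
    counter names returned by discover() stays vendor- and model-specific.
    """

    def discover(self) -> List[str]: ...
    def enable(self, names: List[str]) -> None: ...
    def read(self) -> Dict[str, int]: ...
    def disable(self) -> None: ...


@dataclass
class MockBackend:
    """Stand-in for a real driver; all counter names and values are made up."""

    counters: Dict[str, int] = field(default_factory=lambda: {
        "gpu_active_cycles": 0,    # assumed "universal" counter
        "gpu_total_cycles": 0,     # assumed "universal" counter
        "vendor_tiler_active": 0,  # example of a vendor-specific counter
    })
    enabled: List[str] = field(default_factory=list)

    def discover(self) -> List[str]:
        return sorted(self.counters)

    def enable(self, names: List[str]) -> None:
        unknown = [n for n in names if n not in self.counters]
        if unknown:
            raise KeyError(f"unknown counters: {unknown}")
        self.enabled = list(names)

    def read(self) -> Dict[str, int]:
        # Return a snapshot (a fresh dict) of the enabled counters only.
        return {n: self.counters[n] for n in self.enabled}

    def disable(self) -> None:
        self.enabled = []

    def advance(self, active: int, total: int) -> None:
        """Simulate the GPU doing work between two samples (mock only)."""
        self.counters["gpu_active_cycles"] += active
        self.counters["gpu_total_cycles"] += total


def utilization(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of cycles the GPU was active between two samples."""
    active = after["gpu_active_cycles"] - before["gpu_active_cycles"]
    total = after["gpu_total_cycles"] - before["gpu_total_cycles"]
    return active / total if total > 0 else 0.0


if __name__ == "__main__":
    gpu = MockBackend()
    gpu.enable(["gpu_active_cycles", "gpu_total_cycles"])
    before = gpu.read()
    gpu.advance(active=750, total=1000)  # pretend some frames were rendered
    after = gpu.read()
    gpu.disable()
    print(f"GPU utilization: {utilization(before, after):.0%}")
```

A GPU top tool written against something like CounterBackend would only
need the small universal set to compute utilization, and could list any
extra vendor counters generically from discover() without knowing what
they mean.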

It looks like this discussion is dormant. Could we try to get this
sorted? For Panfrost, I'm hitting GPU-side bottlenecks that I'm unable
to diagnose without access to the counters, so I'm eager for a mainline
solution to be implemented.

Thank you,

Alyssa