Quoting Tvrtko Ursulin (2018-01-24 18:01:14)
>
> On 22/01/2018 18:52, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-22 18:43:52)
> >> From: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> >>
> >> Per-engine queue depths are an interesting metric for analyzing the
> >> system load and also for users who wish to use it to load balance
> >> their submissions based on it.
> >>
> >> In this version I have split the metrics into three separate counters:
> >>
> >> 1. QUEUED - From execbuf time to request being runnable - runnable
> >>    meaning until dependencies have been resolved and fences signaled.
> >> 2. RUNNABLE - From runnable to running on the GPU.
> >> 3. RUNNING - Running on the GPU.
> >>
> >> When inspected with perf stat the output looks roughly like this:
> >>
> >> #           time     counts unit events
> >>     201.160490145       0.01      i915/rcs0-queued/
> >>     201.160490145      19.13      i915/rcs0-runnable/
> >>     201.160490145       2.39      i915/rcs0-running/
> >>
> >> The reported numbers are average queue depths for the last query period.
> >>
> >> Having split out metrics should be more flexible for all users, and it
> >> is still possible to fetch an atomic snapshot of all using the perf
> >> groups for those wanting to combine them.
> >>
> >> For users wanting instantaneous numbers instead of averaged, we could
> >> potentially expose them using the query API Lionel is working on.
> >> (https://patchwork.freedesktop.org/series/36622/)
> >>
> >> For instance a query packet could look like:
> >>
> >> #define DRM_I915_QUERY_ENGINE_QUEUES	0x04
> >>
> >> struct drm_i915_query_engine_queues {
> >> 	__u8 class;
> >> 	__u8 instance;
> >>
> >> 	__u8 pad[2];
> >>
> >> 	__u32 queued;
> >> 	__u32 runnable;
> >> 	__u32 running;
> >> };
> >>
> >> I also have patches to expose this via intel-gpu-top, using the perf API.
> >
> > Can you stick a ewma loadavg just after the hostname in intel-gpu-overlay,
> > pretty please? :)
>
> Sure, just one period and all three counters aggregated?

Hmm, just runnable + running I think matches loadavg best. (For the cpu
that is the number of tasks in the runqueue.) I think having the 1s,
30s, 15m figures would be useful, but they can be computed in userspace
from the single (combined) sampler.

But the problem with just runnable + running is that we don't see those
inter-engine dependencies so clearly (but it does hide inter-device
waits etc), so I don't know.
-Chris