Re: [Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking
Quoting Tvrtko Ursulin (2017-07-19 10:12:33) > > On 18/07/2017 16:19, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2017-07-18 15:36:15) > >> From: Tvrtko Ursulin > >> > >> Track total time requests have been executing on the hardware. > >> > >> To make this cheap it is hidden behind a static branch with the > >> intention that it is only enabled when there is a consumer > >> listening. This means that in the default off case the total > >> cost of the tracking is just a few no-op instructions on the > >> fast paths. > > > >> +static inline void intel_engine_context_in(struct intel_engine_cs *engine) > >> +{ > >> + if (static_branch_unlikely(&i915_engine_stats_key)) { > >> + unsigned long flags; > >> + > >> + spin_lock_irqsave(&engine->stats.lock, flags); > > > > What's the purpose of this lock? RMW is ordered by virtue of the tasklet > > (only one cpu can be doing the rmw at any time). Did I miss another > > user? > > Hm it should be a plain spin_lock and a _bh variant on the external API. > > But the purpose is to allow atomic sampling of accumulated busy time > plus the current engine status. You can do that with a lockless read, if you treat it as a seqlock and the total as the latch. Something like u64 total, latch, start; total = engine->stats.total; do { latch = total; start = engine->stats.start; smp_rmb(); total = engine->stats.total; } while (total != latch); total += ktime_now() - latch; Assuming x86-64. If we only have 32b atomic reads, it is less fun. > >> + if (engine->stats.ref++ == 0) > >> + engine->stats.start = ktime_get_real_ns(); > > > > Use ktime_get_raw() and leave the conversion to ns to the caller. > > You mean both store in ktime_t and don't fetch the wall time but > monotonic? Do you say this because perf pmu perhaps needs monotonic or > for some other reason? We only want monotonic (otherwise we will observe time going backwards), but whether or not we want the clock source compensation? I was under the impression we could defer that compensation until later. > > What is the cost of a ktime nowadays? > > I don't see it in the profiles, so not much I think. readtsc is there, but not enough to worry. The effect on execution latency is in the noise on initial inspection. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking
On 18/07/2017 16:19, Chris Wilson wrote: Quoting Tvrtko Ursulin (2017-07-18 15:36:15) From: Tvrtko Ursulin Track total time requests have been executing on the hardware. To make this cheap it is hidden behind a static branch with the intention that it is only enabled when there is a consumer listening. This means that in the default off case the total cost of the tracking is just a few no-op instructions on the fast paths. +static inline void intel_engine_context_in(struct intel_engine_cs *engine) +{ + if (static_branch_unlikely(&i915_engine_stats_key)) { + unsigned long flags; + + spin_lock_irqsave(&engine->stats.lock, flags); What's the purpose of this lock? RMW is ordered by virtue of the tasklet (only one cpu can be doing the rmw at any time). Did I miss another user? Hm it should be a plain spin_lock and a _bh variant on the external API. But the purpose is to allow atomic sampling of accumulated busy time plus the current engine status. + if (engine->stats.ref++ == 0) + engine->stats.start = ktime_get_real_ns(); Use ktime_get_raw() and leave the conversion to ns to the caller. You mean both store in ktime_t and don't fetch the wall time but monotonic? Do you say this because perf pmu perhaps needs monotonic or for some other reason? What is the cost of a ktime nowadays? I don't see it in the profiles, so not much I think. Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking
Quoting Tvrtko Ursulin (2017-07-18 15:36:15) > From: Tvrtko Ursulin > > Track total time requests have been executing on the hardware. > > To make this cheap it is hidden behind a static branch with the > intention that it is only enabled when there is a consumer > listening. This means that in the default off case the total > cost of the tracking is just a few no-op instructions on the > fast paths. > +static inline void intel_engine_context_in(struct intel_engine_cs *engine) > +{ > + if (static_branch_unlikely(&i915_engine_stats_key)) { > + unsigned long flags; > + > + spin_lock_irqsave(&engine->stats.lock, flags); What's the purpose of this lock? RMW is ordered by virtue of the tasklet (only one cpu can be doing the rmw at any time). Did I miss another user? > + if (engine->stats.ref++ == 0) > + engine->stats.start = ktime_get_real_ns(); Use ktime_get_raw() and leave the conversion to ns to the caller. What is the cost of a ktime nowadays? How do you handle the currently active case when reporting? Imagine a continuous stream of requests, hoping between contexts without a break. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking
From: Tvrtko Ursulin Track total time requests have been executing on the hardware. To make this cheap it is hidden behind a static branch with the intention that it is only enabled when there is a consumer listening. This means that in the default off case the total cost of the tracking is just a few no-op instructions on the fast paths. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/intel_engine_cs.c | 6 + drivers/gpu/drm/i915/intel_lrc.c| 2 ++ drivers/gpu/drm/i915/intel_ringbuffer.h | 39 + 3 files changed, 47 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 24db316e0fd1..3e5e08c6b5ef 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -22,10 +22,14 @@ * */ +#include + #include "i915_drv.h" #include "intel_ringbuffer.h" #include "intel_lrc.h" +DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key); + /* Haswell does have the CXT_SIZE register however it does not appear to be * valid. Now, docs explain in dwords what is in the context object. The full * size is 70720 bytes, however, the power context and execlist context will @@ -223,6 +227,8 @@ intel_engine_setup(struct drm_i915_private *dev_priv, /* Nothing to do here, execute in order of dependencies */ engine->schedule = NULL; + spin_lock_init(&engine->stats.lock); + ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier); dev_priv->engine[id] = engine; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 89aa29f23a92..b72a2d7cd44c 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -310,12 +310,14 @@ execlists_context_status_change(struct drm_i915_gem_request *rq, static inline void execlists_context_schedule_in(struct drm_i915_gem_request *rq) { + intel_engine_context_in(rq->engine); execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); } static inline void execlists_context_schedule_out(struct drm_i915_gem_request *rq) { + intel_engine_context_out(rq->engine); execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT); } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 0877b151239d..2eb1e970ad06 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -443,6 +443,13 @@ struct intel_engine_cs { * certain bits to encode the command length in the header). */ u32 (*get_cmd_length_mask)(u32 cmd_header); + + struct { + spinlock_t lock; + unsigned int ref; + u64 start; /* Timestamp of the last idle to active transition. */ + u64 total; /* Total time engined was busy. */ + } stats; }; static inline unsigned int @@ -737,4 +744,36 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv); void intel_engines_mark_idle(struct drm_i915_private *i915); void intel_engines_reset_default_submission(struct drm_i915_private *i915); +DECLARE_STATIC_KEY_FALSE(i915_engine_stats_key); + +static inline void intel_engine_context_in(struct intel_engine_cs *engine) +{ + if (static_branch_unlikely(&i915_engine_stats_key)) { + unsigned long flags; + + spin_lock_irqsave(&engine->stats.lock, flags); + if (engine->stats.ref++ == 0) + engine->stats.start = ktime_get_real_ns(); + GEM_BUG_ON(engine->stats.ref == 0); + spin_unlock_irqrestore(&engine->stats.lock, flags); + } +} + +static inline void intel_engine_context_out(struct intel_engine_cs *engine) +{ + if (static_branch_unlikely(&i915_engine_stats_key)) { + unsigned long flags; + + spin_lock_irqsave(&engine->stats.lock, flags); + /* +* After turning on the static key context out might be the +* first event which then needs to be ignored (ref == 0). +*/ + if (engine->stats.ref && --engine->stats.ref == 0) + engine->stats.total += ktime_get_real_ns() - + engine->stats.start; + spin_unlock_irqrestore(&engine->stats.lock, flags); + } +} + #endif /* _INTEL_RINGBUFFER_H_ */ -- 2.9.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx