Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
On 17-07-19 10:34:14, Tvrtko Ursulin wrote:

Hi Ben,

On 18/07/2017 15:36, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin

Enables other i915 components to enable and disable
the facility as needed.

Signed-off-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 53 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5
 2 files changed, 58 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3e5e08c6b5ef..03e7459bad06 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -29,6 +29,8 @@
 #include "intel_lrc.h"

 DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key);
+static DEFINE_MUTEX(i915_engine_stats_mutex);
+static int i915_engine_stats_ref;

 /* Haswell does have the CXT_SIZE register however it does not appear to be
  * valid. Now, docs explain in dwords what is in the context object. The full
@@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
+{
+        if (!i915.enable_execlists)
+                return -ENODEV;
+
+        mutex_lock(&i915_engine_stats_mutex);
+        if (i915_engine_stats_ref++ == 0) {
+                struct intel_engine_cs *engine;
+                enum intel_engine_id id;
+
+                for_each_engine(engine, dev_priv, id) {
+                        memset(&engine->stats, 0, sizeof(engine->stats));
+                        spin_lock_init(&engine->stats.lock);
+                }
+
+                static_branch_enable(&i915_engine_stats_key);
+        }
+        mutex_unlock(&i915_engine_stats_mutex);
+
+        return 0;
+}
+
+void intel_disable_engine_stats(void)
+{
+        mutex_lock(&i915_engine_stats_mutex);
+        if (--i915_engine_stats_ref == 0)
+                static_branch_disable(&i915_engine_stats_key);
+        mutex_unlock(&i915_engine_stats_mutex);
+}
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
+{
+        unsigned long flags;
+        u64 total;
+
+        spin_lock_irqsave(&engine->stats.lock, flags);
+
+        total = engine->stats.total;
+
+        /*
+         * If the engine is executing something at the moment
+         * add it to the total.
+         */
+        if (engine->stats.ref)
+                total += ktime_get_real_ns() - engine->stats.start;
+
+        spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+        return total;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_engine.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2eb1e970ad06..e0f495a6d0d9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -776,4 +776,9 @@ static inline void intel_engine_context_out(struct intel_engine_cs *engine)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *i915);
+void intel_disable_engine_stats(void);
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine);

If we exported these symbols for other modules to use, what kind of API
would they need? Presumably not per-engine but something to give the
aggregated busyness of all engines? Or have I misunderstood you about
there being such a requirement?

Regards,

Tvrtko

No misunderstanding. For our current usage, busyness of all engines would be
easiest. If one of the engines doesn't contribute much to the total TDP
though, it wouldn't need to actually be included, so we could perhaps leave
room for per-engine.

--
Ben Widawsky, Intel Open Source Technology Center
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
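For illustration, the aggregated query Ben describes could be a thin wrapper that sums the per-engine helper from the patch. The intel_engines_get_current_busy_ns() name and the idea of simply iterating every engine are assumptions, not something posted in the series:

/*
 * Hypothetical aggregate helper (not part of the posted patch): report a
 * single busyness number for the whole GPU by summing the per-engine
 * values.  Per-engine reporting could still be layered on top later.
 */
u64 intel_engines_get_current_busy_ns(struct drm_i915_private *dev_priv)
{
        struct intel_engine_cs *engine;
        enum intel_engine_id id;
        u64 total = 0;

        for_each_engine(engine, dev_priv, id)
                total += intel_engine_get_current_busy_ns(engine);

        return total;
}

Leaving room for per-engine, as Ben suggests, would then just mean keeping the existing per-engine getter exported alongside such a wrapper.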
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
On 19/07/2017 12:04, Chris Wilson wrote:

[snip]

>>> Long term though having a global static key is going to be a nasty wart.
>>> Joonas will definitely ask the question how much will it cost us to use
>>> an engine->bool and what we can do to minimise that cost.
>>
>> Why do you think it is nasty? Sounds pretty cool to me.
>
> If we enable sampling on one device (engine even!), it affects another.
> But the device is the more compelling argument against.

Since you mention engines, I can do it at engine granularity with normal
branches. It makes sense for the PMU interface to have it per engine. Then,
as I said before, I can put in a late patch in the series which adds a
static key (a master enable/disable OR-ed over the per-engine enables),
just in case we find it attractive.

Regards,

Tvrtko
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
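A rough sketch of the per-engine variant being discussed here, using a plain per-engine enable count and no static key at all. The stats.enabled field, the function names and the simplified context-in helper are assumptions for illustration (they also assume engine->stats.lock is initialised at engine setup time), not code from the posted series; a master static key OR-ing the per-engine enables could be layered on top later.

/* Sketch only: per-engine enable/disable tested with a normal branch. */
int intel_engine_enable_stats(struct intel_engine_cs *engine)
{
        unsigned long flags;

        if (!i915.enable_execlists)
                return -ENODEV;

        spin_lock_irqsave(&engine->stats.lock, flags);
        if (engine->stats.enabled++ == 0) {
                /* Start a fresh accumulation period for this engine. */
                engine->stats.total = 0;
                engine->stats.start = 0;
        }
        spin_unlock_irqrestore(&engine->stats.lock, flags);

        return 0;
}

void intel_engine_disable_stats(struct intel_engine_cs *engine)
{
        unsigned long flags;

        spin_lock_irqsave(&engine->stats.lock, flags);
        if (engine->stats.enabled)
                engine->stats.enabled--;
        spin_unlock_irqrestore(&engine->stats.lock, flags);
}

/* The hot path then tests a per-engine condition instead of a global key. */
static inline void intel_engine_context_in(struct intel_engine_cs *engine)
{
        unsigned long flags;

        if (!READ_ONCE(engine->stats.enabled))
                return;

        spin_lock_irqsave(&engine->stats.lock, flags);
        if (engine->stats.enabled && engine->stats.ref++ == 0)
                engine->stats.start = ktime_get_real_ns();
        spin_unlock_irqrestore(&engine->stats.lock, flags);
}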
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
Quoting Tvrtko Ursulin (2017-07-19 10:30:13)
>
> On 18/07/2017 16:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
> >> +u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
> >> +{
> >> +        unsigned long flags;
> >> +        u64 total;
> >> +
> >> +        spin_lock_irqsave(&engine->stats.lock, flags);
> >> +
> >> +        total = engine->stats.total;
> >> +
> >> +        /*
> >> +         * If the engine is executing something at the moment
> >> +         * add it to the total.
> >> +         */
> >> +        if (engine->stats.ref)
> >> +                total += ktime_get_real_ns() - engine->stats.start;
> >> +
> >> +        spin_unlock_irqrestore(&engine->stats.lock, flags);
> >
> > Answers to another patch found here. I would say this is the other half
> > of the interface and should be kept together.
>
> Yes, it was an ugly split.
>
> On 18/07/2017 16:43, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
> >> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
> >> +{
> >> +        if (!i915.enable_execlists)
> >> +                return -ENODEV;
> >> +
> >> +        mutex_lock(&i915_engine_stats_mutex);
> >> +        if (i915_engine_stats_ref++ == 0) {
> >> +                struct intel_engine_cs *engine;
> >> +                enum intel_engine_id id;
> >> +
> >> +                for_each_engine(engine, dev_priv, id) {
> >> +                        memset(&engine->stats, 0, sizeof(engine->stats));
> >> +                        spin_lock_init(&engine->stats.lock);
> >> +                }
> >> +
> >> +                static_branch_enable(&i915_engine_stats_key);
> >> +        }
> >> +        mutex_unlock(&i915_engine_stats_mutex);
> >
> > I don't think static_branch_enable() is a might_sleep, so it looks like
> > you can rewrite this avoiding the mutex and thus not requiring the
> > worker and then can use the error code here to decide if you need to
> > use the timer instead.
>
> Perhaps I could get rid of the mutex though by using atomic_inc/dec_return.
>
> But there is a mutex in jump label handling,

Totally missed it. I wonder why it is a mutex; certainly serialising
enable/disable needs something. The comments suggest that once upon a time
(or on a different arch?) it was much more of a stop_machine().

> so I think the workers have
> to stay - and it is also beneficial to have a delayed static branch disable,
> since the perf core seems to like calling start/stop on the events a lot.
> But it is recommended practice for static branches anyway.

Interesting, there is a static_key_slow_dec_deferred. But honestly I think
it is hard to defend a global static_key for a per-device interface.

> So from that angle I could perhaps even move the delayed disable to this
> patch so it is automatically shared by all callers.
>
> >> +static DEFINE_MUTEX(i915_engine_stats_mutex);
> >> +static int i915_engine_stats_ref;
> >>
> >>  /* Haswell does have the CXT_SIZE register however it does not appear to be
> >>   * valid. Now, docs explain in dwords what is in the context object. The full
> >> @@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
> >>          }
> >>  }
> >>
> >> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
> >
> > The pattern I've been trying to use here is
> >
> > intel_engine_* - operate on the named engine
> >
> > intel_engines_* - operate on all engines
>
> Ok.
>
> > Long term though having a global static key is going to be a nasty wart.
> > Joonas will definitely ask the question how much will it cost us to use
> > an engine->bool and what we can do to minimise that cost.
>
> Why do you think it is nasty? Sounds pretty cool to me.

If we enable sampling on one device (engine even!), it affects another.
But the device is the more compelling argument against.

> But I think I can re-organize the series to start with a normal branch and
> then add the static one on top if so is desired.

Ok. I like the idea of dynamically patching in branches, and hate to be a
party pooper!
-Chris
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
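For reference, the deferred flavour Chris mentions belongs to the older static_key API. A minimal sketch of how it might be wired up is below, assuming the jump_label_ratelimit.h interface (struct static_key_deferred, jump_label_rate_limit(), static_key_slow_dec_deferred()) and the old-style static_key_false() test rather than the static_branch_*() helpers used in the patch; the one-second rate limit and the init hook are illustrative, and the per-engine stats initialisation is omitted.

/* Sketch only: a deferred (rate-limited) disable using the older API. */
#include <linux/jump_label_ratelimit.h>

static struct static_key_deferred i915_engine_stats_key;

static void i915_engine_stats_init(void) /* called once at driver load */
{
        /* Batch disables: the key is only patched out after ~1s idle. */
        jump_label_rate_limit(&i915_engine_stats_key, HZ);
}

int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
{
        if (!i915.enable_execlists)
                return -ENODEV;

        static_key_slow_inc(&i915_engine_stats_key.key);
        return 0;
}

void intel_disable_engine_stats(void)
{
        /* Deferred, so rapid start/stop cycles do not repatch the branch. */
        static_key_slow_dec_deferred(&i915_engine_stats_key);
}

/* Hot path test with the older API. */
static inline bool i915_engine_stats_enabled(void)
{
        return static_key_false(&i915_engine_stats_key.key);
}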
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
Hi Ben,

On 18/07/2017 15:36, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin

Enables other i915 components to enable and disable
the facility as needed.

Signed-off-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 53 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5
 2 files changed, 58 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3e5e08c6b5ef..03e7459bad06 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -29,6 +29,8 @@
 #include "intel_lrc.h"

 DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key);
+static DEFINE_MUTEX(i915_engine_stats_mutex);
+static int i915_engine_stats_ref;

 /* Haswell does have the CXT_SIZE register however it does not appear to be
  * valid. Now, docs explain in dwords what is in the context object. The full
@@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
+{
+        if (!i915.enable_execlists)
+                return -ENODEV;
+
+        mutex_lock(&i915_engine_stats_mutex);
+        if (i915_engine_stats_ref++ == 0) {
+                struct intel_engine_cs *engine;
+                enum intel_engine_id id;
+
+                for_each_engine(engine, dev_priv, id) {
+                        memset(&engine->stats, 0, sizeof(engine->stats));
+                        spin_lock_init(&engine->stats.lock);
+                }
+
+                static_branch_enable(&i915_engine_stats_key);
+        }
+        mutex_unlock(&i915_engine_stats_mutex);
+
+        return 0;
+}
+
+void intel_disable_engine_stats(void)
+{
+        mutex_lock(&i915_engine_stats_mutex);
+        if (--i915_engine_stats_ref == 0)
+                static_branch_disable(&i915_engine_stats_key);
+        mutex_unlock(&i915_engine_stats_mutex);
+}
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
+{
+        unsigned long flags;
+        u64 total;
+
+        spin_lock_irqsave(&engine->stats.lock, flags);
+
+        total = engine->stats.total;
+
+        /*
+         * If the engine is executing something at the moment
+         * add it to the total.
+         */
+        if (engine->stats.ref)
+                total += ktime_get_real_ns() - engine->stats.start;
+
+        spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+        return total;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_engine.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2eb1e970ad06..e0f495a6d0d9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -776,4 +776,9 @@ static inline void intel_engine_context_out(struct intel_engine_cs *engine)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *i915);
+void intel_disable_engine_stats(void);
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine);

If we exported these symbols for other modules to use, what kind of API
would they need? Presumably not per-engine but something to give the
aggregated busyness of all engines? Or have I misunderstood you about
there being such a requirement?

Regards,

Tvrtko
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
On 18/07/2017 16:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
>> +u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
>> +{
>> +        unsigned long flags;
>> +        u64 total;
>> +
>> +        spin_lock_irqsave(&engine->stats.lock, flags);
>> +
>> +        total = engine->stats.total;
>> +
>> +        /*
>> +         * If the engine is executing something at the moment
>> +         * add it to the total.
>> +         */
>> +        if (engine->stats.ref)
>> +                total += ktime_get_real_ns() - engine->stats.start;
>> +
>> +        spin_unlock_irqrestore(&engine->stats.lock, flags);
>
> Answers to another patch found here. I would say this is the other half
> of the interface and should be kept together.

Yes, it was an ugly split.

On 18/07/2017 16:43, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
>> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
>> +{
>> +        if (!i915.enable_execlists)
>> +                return -ENODEV;
>> +
>> +        mutex_lock(&i915_engine_stats_mutex);
>> +        if (i915_engine_stats_ref++ == 0) {
>> +                struct intel_engine_cs *engine;
>> +                enum intel_engine_id id;
>> +
>> +                for_each_engine(engine, dev_priv, id) {
>> +                        memset(&engine->stats, 0, sizeof(engine->stats));
>> +                        spin_lock_init(&engine->stats.lock);
>> +                }
>> +
>> +                static_branch_enable(&i915_engine_stats_key);
>> +        }
>> +        mutex_unlock(&i915_engine_stats_mutex);
>
> I don't think static_branch_enable() is a might_sleep, so it looks like
> you can rewrite this avoiding the mutex and thus not requiring the
> worker and then can use the error code here to decide if you need to
> use the timer instead.

Perhaps I could get rid of the mutex though by using atomic_inc/dec_return.

But there is a mutex in jump label handling, so I think the workers have
to stay - and it is also beneficial to have a delayed static branch disable,
since the perf core seems to like calling start/stop on the events a lot.
But it is recommended practice for static branches anyway.

So from that angle I could perhaps even move the delayed disable to this
patch so it is automatically shared by all callers.

>> +static DEFINE_MUTEX(i915_engine_stats_mutex);
>> +static int i915_engine_stats_ref;
>>
>>  /* Haswell does have the CXT_SIZE register however it does not appear to be
>>   * valid. Now, docs explain in dwords what is in the context object. The full
>> @@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
>>          }
>>  }
>>
>> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
>
> The pattern I've been trying to use here is
>
> intel_engine_* - operate on the named engine
>
> intel_engines_* - operate on all engines

Ok.

> Long term though having a global static key is going to be a nasty wart.
> Joonas will definitely ask the question how much will it cost us to use
> an engine->bool and what we can do to minimise that cost.

Why do you think it is nasty? Sounds pretty cool to me.

But I think I can re-organize the series to start with a normal branch and
then add the static one on top if so is desired.

Regards,

Tvrtko
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
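The delayed disable Tvrtko describes could look roughly like the sketch below: an atomic reference count plus a delayed work item that only patches the static branch out after a grace period, so that the perf core's frequent start/stop cycles do not keep rewriting code. The one-second grace period, the work item naming and the omission of the per-engine stats (re)initialisation are illustrative assumptions, not the code that was posted.

/* Sketch only: reference count with a deferred static branch disable. */
static atomic_t i915_engine_stats_ref = ATOMIC_INIT(0);

static void i915_engine_stats_disable_work(struct work_struct *work)
{
        /* Only drop the branch if no new user appeared during the grace period. */
        if (atomic_read(&i915_engine_stats_ref) == 0)
                static_branch_disable(&i915_engine_stats_key);
}

static DECLARE_DELAYED_WORK(i915_engine_stats_disable,
                            i915_engine_stats_disable_work);

int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
{
        if (!i915.enable_execlists)
                return -ENODEV;

        /* A pending deferred disable must not race with this new user. */
        cancel_delayed_work_sync(&i915_engine_stats_disable);

        /* Enabling an already enabled key is a no-op, so this is safe. */
        if (atomic_inc_return(&i915_engine_stats_ref) == 1)
                static_branch_enable(&i915_engine_stats_key);

        return 0;
}

void intel_disable_engine_stats(void)
{
        if (atomic_dec_return(&i915_engine_stats_ref) == 0)
                schedule_delayed_work(&i915_engine_stats_disable, HZ);
}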
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
> From: Tvrtko Ursulin
>
> Enables other i915 components to enable and disable
> the facility as needed.
>
> Signed-off-by: Tvrtko Ursulin
> ---
>  drivers/gpu/drm/i915/intel_engine_cs.c  | 53 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  5
>  2 files changed, 58 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 3e5e08c6b5ef..03e7459bad06 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -29,6 +29,8 @@
>  #include "intel_lrc.h"
>
>  DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key);
> +static DEFINE_MUTEX(i915_engine_stats_mutex);
> +static int i915_engine_stats_ref;
>
>  /* Haswell does have the CXT_SIZE register however it does not appear to be
>   * valid. Now, docs explain in dwords what is in the context object. The full
> @@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
>          }
>  }
>
> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)

The pattern I've been trying to use here is

intel_engine_* - operate on the named engine

intel_engines_* - operate on all engines

Long term though having a global static key is going to be a nasty wart.
Joonas will definitely ask the question how much will it cost us to use
an engine->bool and what we can do to minimise that cost.
-Chris
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
> +int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
> +{
> +        if (!i915.enable_execlists)
> +                return -ENODEV;
> +
> +        mutex_lock(&i915_engine_stats_mutex);
> +        if (i915_engine_stats_ref++ == 0) {
> +                struct intel_engine_cs *engine;
> +                enum intel_engine_id id;
> +
> +                for_each_engine(engine, dev_priv, id) {
> +                        memset(&engine->stats, 0, sizeof(engine->stats));
> +                        spin_lock_init(&engine->stats.lock);
> +                }
> +
> +                static_branch_enable(&i915_engine_stats_key);
> +        }
> +        mutex_unlock(&i915_engine_stats_mutex);

I don't think static_branch_enable() is a might_sleep, so it looks like
you can rewrite this avoiding the mutex and thus not requiring the
worker and then can use the error code here to decide if you need to
use the timer instead.
-Chris
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
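A sketch of the mutex-less shape Chris seems to have in mind, with the driver's own lock replaced by an atomic reference count. Note that, as pointed out in the later replies in this thread, static_branch_enable() still takes the jump-label mutex internally, and the first-user initialisation below is left unserialised, so this illustrates the idea rather than being a drop-in replacement:

/* Sketch only: driver-side mutex replaced by an atomic refcount. */
static atomic_t i915_engine_stats_ref = ATOMIC_INIT(0);

int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
{
        if (!i915.enable_execlists)
                return -ENODEV;

        if (atomic_inc_return(&i915_engine_stats_ref) == 1) {
                struct intel_engine_cs *engine;
                enum intel_engine_id id;

                /*
                 * Unlike the mutex version, a second caller can return
                 * before this initialisation has finished.
                 */
                for_each_engine(engine, dev_priv, id) {
                        memset(&engine->stats, 0, sizeof(engine->stats));
                        spin_lock_init(&engine->stats.lock);
                }

                static_branch_enable(&i915_engine_stats_key);
        }

        return 0;
}

void intel_disable_engine_stats(void)
{
        if (atomic_dec_return(&i915_engine_stats_ref) == 0)
                static_branch_disable(&i915_engine_stats_key);
}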
Re: [Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
Quoting Tvrtko Ursulin (2017-07-18 15:36:16)
> +u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
> +{
> +        unsigned long flags;
> +        u64 total;
> +
> +        spin_lock_irqsave(&engine->stats.lock, flags);
> +
> +        total = engine->stats.total;
> +
> +        /*
> +         * If the engine is executing something at the moment
> +         * add it to the total.
> +         */
> +        if (engine->stats.ref)
> +                total += ktime_get_real_ns() - engine->stats.start;
> +
> +        spin_unlock_irqrestore(&engine->stats.lock, flags);

Answers to another patch found here. I would say this is the other half
of the interface and should be kept together.
-Chris
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [RFC 12/14] drm/i915: Interface for controlling engine stats collection
From: Tvrtko Ursulin

Enables other i915 components to enable and disable
the facility as needed.

Signed-off-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 53 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5
 2 files changed, 58 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3e5e08c6b5ef..03e7459bad06 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -29,6 +29,8 @@
 #include "intel_lrc.h"

 DEFINE_STATIC_KEY_FALSE(i915_engine_stats_key);
+static DEFINE_MUTEX(i915_engine_stats_mutex);
+static int i915_engine_stats_ref;

 /* Haswell does have the CXT_SIZE register however it does not appear to be
  * valid. Now, docs explain in dwords what is in the context object. The full
@@ -1340,6 +1342,57 @@ void intel_engines_mark_idle(struct drm_i915_private *i915)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *dev_priv)
+{
+        if (!i915.enable_execlists)
+                return -ENODEV;
+
+        mutex_lock(&i915_engine_stats_mutex);
+        if (i915_engine_stats_ref++ == 0) {
+                struct intel_engine_cs *engine;
+                enum intel_engine_id id;
+
+                for_each_engine(engine, dev_priv, id) {
+                        memset(&engine->stats, 0, sizeof(engine->stats));
+                        spin_lock_init(&engine->stats.lock);
+                }
+
+                static_branch_enable(&i915_engine_stats_key);
+        }
+        mutex_unlock(&i915_engine_stats_mutex);
+
+        return 0;
+}
+
+void intel_disable_engine_stats(void)
+{
+        mutex_lock(&i915_engine_stats_mutex);
+        if (--i915_engine_stats_ref == 0)
+                static_branch_disable(&i915_engine_stats_key);
+        mutex_unlock(&i915_engine_stats_mutex);
+}
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine)
+{
+        unsigned long flags;
+        u64 total;
+
+        spin_lock_irqsave(&engine->stats.lock, flags);
+
+        total = engine->stats.total;
+
+        /*
+         * If the engine is executing something at the moment
+         * add it to the total.
+         */
+        if (engine->stats.ref)
+                total += ktime_get_real_ns() - engine->stats.start;
+
+        spin_unlock_irqrestore(&engine->stats.lock, flags);
+
+        return total;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_engine.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2eb1e970ad06..e0f495a6d0d9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -776,4 +776,9 @@ static inline void intel_engine_context_out(struct intel_engine_cs *engine)
         }
 }

+int intel_enable_engine_stats(struct drm_i915_private *i915);
+void intel_disable_engine_stats(void);
+
+u64 intel_engine_get_current_busy_ns(struct intel_engine_cs *engine);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
--
2.9.4
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
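To make the intended usage concrete, here is a hypothetical consumer of the three exported entry points. The sampling function, the one-second sleep and the debug message are assumptions for illustration only; the actual consumer in this series is the i915 PMU code added by the other patches.

#include <linux/delay.h>        /* msleep() */

/*
 * Sketch only: turn the facility on, sample one engine's accumulated busy
 * time twice, report the delta, then turn the facility back off.
 */
static int i915_sample_engine_busyness(struct drm_i915_private *i915,
                                       struct intel_engine_cs *engine)
{
        u64 before, after;
        int ret;

        ret = intel_enable_engine_stats(i915);
        if (ret)
                return ret; /* e.g. -ENODEV without execlists */

        before = intel_engine_get_current_busy_ns(engine);
        msleep(1000);
        after = intel_engine_get_current_busy_ns(engine);

        DRM_DEBUG_DRIVER("engine busy for %llu ns over the last second\n",
                         (unsigned long long)(after - before));

        intel_disable_engine_stats();

        return 0;
}

Because enable/disable are reference counted, several such users can overlap without disturbing each other; only the first enable and the last disable actually patch the static branch.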