Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats

2017-07-26 Thread Chris Wilson
Quoting Tvrtko Ursulin (2017-07-26 11:34:49)
> 
> On 20/07/2017 10:03, Tvrtko Ursulin wrote:
> > On 19/07/2017 13:05, Chris Wilson wrote:
> >> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
> >>> From: Tvrtko Ursulin 
> >>>
> >>> Rough sketch of the idea I mentioned a few times to various people -
> >>> merging the engine busyness tracking with Chris' i915 PMU RFC.
> >>>
> >>> First patch is the actual PMU RFC by Chris. It is followed by some
> >>> cleanup patches, then come a few improvements, cheap execlists engine
> >>> busyness tracking, debugfs view for the same, and finally the i915 PMU
> >>> is extended to use this instead of timer based mmio sampling.
> >>>
> >>> This makes it cheaper and also more accurate since engine busyness is
> >>> not derived via sampling.
> >>>
> >>> But I haven't figured out the perf API yet. For example is it possible
> >>> to access our events in a usable fashion via perf top/stat or something?
> >>> Do we want to make the events discoverable as I did (patch 8)?
> >>
> >> In my dreams I have gpu activity in the same perf timechart as cpu
> >> activity. That can mostly be done by the request tracepoints, but still
> >> overlaying cpu/gpu activity is desirable and more importantly we want to
> >> coordinate with nouveau/amdgpu so that such interfaces are as agnostic
> >> as possible. There are definitely a bunch of global features in common
> >> for all (engine enumeration & activity, mempool enumeration, size &
> >> activity, power usage?). But the key question is how do we build for the
> >> future? Split the event id range into common/driver?
> > 
> > I don't know if going for common events would be workable. A few metrics 
> > sound like they could be generic, but I am not sure there would be more 
> > than a couple where that would be future proof. Also is the coordination 
> > effort (no one else seems to implement a perf interface at the moment) 
> > worth it at the current time? I am not sure.
> > 
> >>> I could not find much (any?) kernel API level documentation for perf.
> >>
> >> There isn't much indeed. Given that we now have a second pair of eyes go
> >> over the sampling and improve its interaction with i915, we should start
> >> getting PeterZ involved to check the interaction with perf.
> > 
> > Okay, I guess another cleanup pass and then I can do that.
> > 
> > In the meantime do you have any good understanding of what kind of 
> > events we are exposing here? They look weird if I record them and look 
> > with "perf script", and "perf stat" always reports zeroes for them. But 
> > they still work from the overlay tool. So it is a bit of a mystery to me 
> > what they really are.
> > 
> >>> Btw patch series actually works since intel-gpu-overlay can use these
> >>> events when they are available.
> >>>
> >>> Chris Wilson (1):
> >>>   RFC drm/i915: Expose a PMU interface for perf queries
> >>
> >> One thing I would like is for any future interface (including this
> >> engine/class/event id) to use the engine class/instance mapping.
> > 
> > I was thinking about that myself. I can do it in the next cleanup pass.
> 
> Although to do this I think it will make more sense for me to squash a 
> bunch of improvements into your patch, and to start working on it 
> directly. Your thoughts on this? Do you mind if I start working on the 
> original patch bumping its version number for all the additions?

Don't mind in the slightest. I thought you had broken them out for a
quick series of "oh, yes, completely missed that" to be squashed in when
the patch is ready and you feel comfortable signing off on the whole
thing.
 
> This would mean squashing in probably the first eight patches from this 
> series. Followed by reworking it towards class-instance. And the rc6 
> residency consolidation Sagar suggested.
> 
> How do we keep shared authorship in this case? Can we have two From: 
> lines at the top?

Take it. If you want, leave a "based on a patch by" me, but 90% of the work
is in the last 10% of writing the patch, so you will be doing more work to
bring the interface to reality than I did.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats

2017-07-26 Thread Tvrtko Ursulin


On 20/07/2017 10:03, Tvrtko Ursulin wrote:
> On 19/07/2017 13:05, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
>>> From: Tvrtko Ursulin 
>>>
>>> Rough sketch of the idea I mentioned a few times to various people - merging
>>> the engine busyness tracking with Chris' i915 PMU RFC.
>>>
>>> First patch is the actual PMU RFC by Chris. It is followed by some cleanup
>>> patches, then come a few improvements, cheap execlists engine busyness
>>> tracking, debugfs view for the same, and finally the i915 PMU is extended
>>> to use this instead of timer based mmio sampling.
>>>
>>> This makes it cheaper and also more accurate since engine busyness is not
>>> derived via sampling.
>>>
>>> But I haven't figured out the perf API yet. For example is it possible to
>>> access our events in a usable fashion via perf top/stat or something? Do we
>>> want to make the events discoverable as I did (patch 8)?
>>
>> In my dreams I have gpu activity in the same perf timechart as cpu
>> activity. That can mostly be done by the request tracepoints, but still
>> overlaying cpu/gpu activity is desirable and more importantly we want to
>> coordinate with nouveau/amdgpu so that such interfaces are as agnostic
>> as possible. There are definitely a bunch of global features in common
>> for all (engine enumeration & activity, mempool enumeration, size &
>> activity, power usage?). But the key question is how do we build for the
>> future? Split the event id range into common/driver?
>
> I don't know if going for common events would be workable. A few metrics
> sound like they could be generic, but I am not sure there would be more
> than a couple where that would be future proof. Also is the coordination
> effort (no one else seems to implement a perf interface at the moment)
> worth it at the current time? I am not sure.
>
>>> I could not find much (any?) kernel API level documentation for perf.
>>
>> There isn't much indeed. Given that we now have a second pair of eyes go
>> over the sampling and improve its interaction with i915, we should start
>> getting PeterZ involved to check the interaction with perf.
>
> Okay, I guess another cleanup pass and then I can do that.
>
> In the meantime do you have any good understanding of what kind of
> events we are exposing here? They look weird if I record them and look
> with "perf script", and "perf stat" always reports zeroes for them. But
> they still work from the overlay tool. So it is a bit of a mystery to me
> what they really are.
>
>>> Btw patch series actually works since intel-gpu-overlay can use these
>>> events when they are available.
>>>
>>> Chris Wilson (1):
>>>    RFC drm/i915: Expose a PMU interface for perf queries
>>
>> One thing I would like is for any future interface (including this
>> engine/class/event id) to use the engine class/instance mapping.
>
> I was thinking about that myself. I can do it in the next cleanup pass.

Although to do this I think it will make more sense for me to squash a 
bunch of improvements into your patch, and to start working on it 
directly. Your thoughts on this? Do you mind if I start working on the 
original patch bumping its version number for all the additions?

This would mean squashing in probably the first eight patches from this 
series. Followed by reworking it towards class-instance. And the rc6 
residency consolidation Sagar suggested.

How do we keep shared authorship in this case? Can we have two From: 
lines at the top?


Regards,

Tvrtko


Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats

2017-07-20 Thread Tvrtko Ursulin


On 19/07/2017 13:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
>> From: Tvrtko Ursulin 
>>
>> Rough sketch of the idea I mentioned a few times to various people - merging
>> the engine busyness tracking with Chris' i915 PMU RFC.
>>
>> First patch is the actual PMU RFC by Chris. It is followed by some cleanup
>> patches, then come a few improvements, cheap execlists engine busyness
>> tracking, debugfs view for the same, and finally the i915 PMU is extended
>> to use this instead of timer based mmio sampling.
>>
>> This makes it cheaper and also more accurate since engine busyness is not
>> derived via sampling.
>>
>> But I haven't figured out the perf API yet. For example is it possible to
>> access our events in a usable fashion via perf top/stat or something? Do we
>> want to make the events discoverable as I did (patch 8)?
>
> In my dreams I have gpu activity in the same perf timechart as cpu
> activity. That can mostly be done by the request tracepoints, but still
> overlaying cpu/gpu activity is desirable and more importantly we want to
> coordinate with nouveau/amdgpu so that such interfaces are as agnostic
> as possible. There are definitely a bunch of global features in common
> for all (engine enumeration & activity, mempool enumeration, size &
> activity, power usage?). But the key question is how do we build for the
> future? Split the event id range into common/driver?

I don't know if going for common events would be workable. A few metrics 
sound like they could be generic, but I am not sure there would be more 
than a couple where that would be future proof. Also is the coordination 
effort (no one else seems to implement a perf interface at the moment) 
worth it at the current time? I am not sure.

>> I could not find much (any?) kernel API level documentation for perf.
>
> There isn't much indeed. Given that we now have a second pair of eyes go
> over the sampling and improve its interaction with i915, we should start
> getting PeterZ involved to check the interaction with perf.

Okay, I guess another cleanup pass and then I can do that.

In the meantime do you have any good understanding of what kind of 
events we are exposing here? They look weird if I record them and look 
with "perf script", and "perf stat" always reports zeroes for them. But 
they still work from the overlay tool. So it is a bit of a mystery to me 
what they really are.

>> Btw patch series actually works since intel-gpu-overlay can use these events
>> when they are available.
>>
>> Chris Wilson (1):
>>    RFC drm/i915: Expose a PMU interface for perf queries
>
> One thing I would like is for any future interface (including this
> engine/class/event id) to use the engine class/instance mapping.

I was thinking about that myself. I can do it in the next cleanup pass.

Regards,

Tvrtko


Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats

2017-07-19 Thread Chris Wilson
Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
> From: Tvrtko Ursulin 
> 
> Rough sketch of the idea I mentioned a few times to various people - merging
> the engine busyness tracking with Chris' i915 PMU RFC.
> 
> First patch is the actual PMU RFC by Chris. It is followed by some cleanup
> patches, then come a few improvements, cheap execlists engine busyness
> tracking, debugfs view for the same, and finally the i915 PMU is extended
> to use this instead of timer based mmio sampling.
> 
> This makes it cheaper and also more accurate since engine busyness is not
> derived via sampling.
> 
> But I haven't figured out the perf API yet. For example is it possible to
> access our events in a usable fashion via perf top/stat or something? Do we
> want to make the events discoverable as I did (patch 8)?

In my dreams I have gpu activity in the same perf timechart as cpu
activity. That can mostly be done by the request tracepoints, but still
overlaying cpu/gpu activity is desirable and more importantly we want to
coordinate with nouveau/amdgpu so that such interfaces are as agnostic
as possible. There are definitely a bunch of global features in common
for all (engine enumeration & activity, mempool enumeration, size &
activity, power usage?). But the key question is how do we build for the
future? Split the event id range into common/driver?

> I could not find much (any?) kernel API level documentation for perf.

There isn't much indeed. Given that we now have a second pair of eyes go
over the sampling and improve its interaction with i915, we should start
getting PeterZ involved to check the interaction with perf.
 
> Btw patch series actually works since intel-gpu-overlay can use these events
> when they are available.
> 
> Chris Wilson (1):
>   RFC drm/i915: Expose a PMU interface for perf queries

One thing I would like is for any future interface (including this
engine/class/event id) to use the engine class/instance mapping.
-Chris
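
[Editorial sketch, not part of the original mail.] One way to picture the
engine class/instance mapping requested above is as a packed event id: the
engine class, the instance within that class, and the sampler each get
their own bit field. The field widths and the names SAMPLE_BITS,
INSTANCE_BITS, pack_engine_config and unpack_engine_config below are
assumptions made purely for illustration; the thread does not define the
actual encoding.

```python
# Hypothetical bit layout, chosen only for illustration.
SAMPLE_BITS = 4                      # assumed width of the sampler field
INSTANCE_BITS = 8                    # assumed width of the instance field
CLASS_SHIFT = SAMPLE_BITS + INSTANCE_BITS


def pack_engine_config(engine_class, instance, sample):
    """Pack (class, instance, sample) into a single integer event id."""
    if not 0 <= sample < (1 << SAMPLE_BITS):
        raise ValueError("sample out of range")
    if not 0 <= instance < (1 << INSTANCE_BITS):
        raise ValueError("instance out of range")
    if engine_class < 0:
        raise ValueError("engine_class must be non-negative")
    return (engine_class << CLASS_SHIFT) | (instance << SAMPLE_BITS) | sample


def unpack_engine_config(config):
    """Recover (class, instance, sample) from a packed event id."""
    sample = config & ((1 << SAMPLE_BITS) - 1)
    instance = (config >> SAMPLE_BITS) & ((1 << INSTANCE_BITS) - 1)
    engine_class = config >> CLASS_SHIFT
    return engine_class, instance, sample


# Round trip for a made-up engine: class 1, instance 2, sampler 3.
config = pack_engine_config(1, 2, 3)
print(hex(config), unpack_engine_config(config))
```

With a layout of this kind, userspace would not need to know a driver
specific flat engine id; it only needs the class and instance numbers.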


[Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats

2017-07-18 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Rough sketch of the idea I mentioned a few times to various people - merging
the engine busyness tracking with Chris' i915 PMU RFC.

First patch is the actual PMU RFC by Chris. It is followed by some cleanup
patches, then come a few improvements, cheap execlists engine busyness tracking,
debugfs view for the same, and finally the i915 PMU is extended to use this
instead of timer based mmio sampling.

This makes it cheaper and also more accurate since engine busyness is not
derived via sampling.
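
[Editorial sketch, not part of the original mail.] The accuracy argument
can be illustrated with a toy model: busy time accumulated from exact
start/end timestamps (as if taken at context schedule-in and schedule-out)
versus busy time estimated by polling the engine state on a timer. The
workload, the 1 ms period and the function names are assumptions for
illustration only and are not taken from the patches.

```python
def exact_busy_ns(intervals):
    """Sum busy time from exact (start_ns, end_ns) busy intervals."""
    return sum(end - start for start, end in intervals)


def sampled_busy_ns(intervals, period_ns, window_ns):
    """Estimate busy time by checking the engine state once per period.

    Each sample that finds the engine busy credits a whole period.
    """
    busy = 0
    for t in range(0, window_ns, period_ns):
        if any(start <= t < end for start, end in intervals):
            busy += period_ns
    return busy


# Hypothetical workload: three short bursts inside a 10 ms window.
intervals = [
    (100_000, 900_000),
    (2_300_000, 2_450_000),
    (5_000_000, 5_600_000),
]
window_ns = 10_000_000
period_ns = 1_000_000  # assumed 1 ms sampling timer

exact = exact_busy_ns(intervals)
sampled = sampled_busy_ns(intervals, period_ns, window_ns)
print(f"exact busy time:   {exact} ns")
print(f"sampled busy time: {sampled} ns")
```

In this toy case the sampler misses the two bursts that fall between timer
ticks and over-credits the one it does catch, while the event-driven total
is exact and needs no periodic timer.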

But I haven't figured out the perf API yet. For example is it possible to access
our events in a usable fashion via perf top/stat or something? Do we want to
make the events discoverable as I did (patch 8)?
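
[Editorial sketch, not part of the original mail.] For context on
discoverability: PMUs registered with perf appear under
/sys/bus/event_source/devices/<name>/, where the "type" file holds the
dynamically assigned PMU type id and an optional "events" directory holds
one file per named event. A minimal reader for that layout might look like
the following; the PMU name "i915" and the helper name read_pmu are
assumptions, and the event names actually exposed by this series are not
listed in the thread.

```python
import os


def read_pmu(base_dir):
    """Return (type_id, {event_name: spec}) for one PMU sysfs directory."""
    with open(os.path.join(base_dir, "type")) as f:
        type_id = int(f.read().strip())

    events = {}
    events_dir = os.path.join(base_dir, "events")
    if os.path.isdir(events_dir):
        for name in sorted(os.listdir(events_dir)):
            # Skip companion files such as "<event>.unit" or "<event>.scale".
            if "." in name:
                continue
            with open(os.path.join(events_dir, name)) as f:
                events[name] = f.read().strip()
    return type_id, events


if __name__ == "__main__":
    # Assumed PMU name; adjust to whatever the driver registers.
    pmu_dir = "/sys/bus/event_source/devices/i915"
    if os.path.isdir(pmu_dir):
        type_id, events = read_pmu(pmu_dir)
        print("PMU type:", type_id)
        for name, spec in events.items():
            print(f"  {name}: {spec}")
    else:
        print("PMU directory not found:", pmu_dir)
```

The type id read this way is what a caller would place in the "type" field
of the perf event attributes when opening an event on that PMU.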

I could not find much (any?) kernel API level documentation for perf.

Btw patch series actually works since intel-gpu-overlay can use these events
when they are available.

Chris Wilson (1):
  RFC drm/i915: Expose a PMU interface for perf queries

Tvrtko Ursulin (13):
  drm/i915/pmu: Add VCS2 engine to the PMU uAPI
  drm/i915/pmu: Add queued samplers to the PMU uAPI
  drm/i915/pmu: Decouple uAPI engine ids
  drm/i915/pmu: Helper to extract engine and sampler from PMU config
  drm/i915/pmu: Only sample enabled samplers
  drm/i915/pmu: Add fake regs
  drm/i915/pmu: Expose events in sysfs
  drm/i915/pmu: Suspend sampling when GPU is idle
  drm/i915: Wrap context schedule notification
  drm/i915: Engine busy time tracking
  drm/i915: Interface for controling engine stats collection
  drm/i915: Export engine busy stats in debugfs
  drm/i915/pmu: Wire up engine busy stats to PMU

 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c     |  89
 drivers/gpu/drm/i915/i915_drv.c         |   2 +
 drivers/gpu/drm/i915/i915_drv.h         |  32 ++
 drivers/gpu/drm/i915/i915_gem.c         |   1 +
 drivers/gpu/drm/i915/i915_gem_request.c |   1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 697
 drivers/gpu/drm/i915/intel_engine_cs.c  |  59 +++
 drivers/gpu/drm/i915/intel_lrc.c        |  19 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  46 +++
 include/uapi/drm/i915_drm.h             |  51 +++
 kernel/events/core.c                    |   1 +
 12 files changed, 996 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_pmu.c

-- 
2.9.4
