Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats
Quoting Tvrtko Ursulin (2017-07-26 11:34:49)
>
> On 20/07/2017 10:03, Tvrtko Ursulin wrote:
> > On 19/07/2017 13:05, Chris Wilson wrote:
> >> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
> >>> From: Tvrtko Ursulin
> >>>
> >>> Rough sketch of the idea I mentioned a few times to various people -
> >>> merging the engine busyness tracking with Chris' i915 PMU RFC.
> >>>
> >>> First patch is the actual PMU RFC by Chris. It is followed by some
> >>> cleanup patches, then come a few improvements, cheap execlists engine
> >>> busyness tracking, debugfs view for the same, and finally the i915 PMU
> >>> is extended to use this instead of timer based mmio sampling.
> >>>
> >>> This makes it cheaper and also more accurate since engine busyness is
> >>> not derived via sampling.
> >>>
> >>> But I haven't figured out the perf API yet. For example is it possible
> >>> to access our events in a usable fashion via perf top/stat or
> >>> something? Do we want to make the events discoverable as I did
> >>> (patch 8)?
> >>
> >> In my dreams I have gpu activity in the same perf timechart as cpu
> >> activity. That can mostly be done by the request tracepoints, but
> >> overlaying cpu/gpu activity is still desirable, and more importantly we
> >> want to coordinate with nouveau/amdgpu so that such interfaces are as
> >> agnostic as possible. There are definitely a bunch of global features
> >> in common for all (engine enumeration & activity, mempool enumeration,
> >> size & activity, power usage?). But the key question is how do we build
> >> for the future? Split the event id range into common/driver?
> >
> > I don't know if going for common events would be workable. A few metrics
> > sound like they could be generic, but I am not sure there would be more
> > than a couple where that would be future proof. Also is the coordination
> > effort (no one else seems to implement a perf interface at the moment)
> > worth it at the current time? I am not sure.
> >
> >>> I could not find much (any?) kernel API level documentation for perf.
> >>
> >> There isn't much indeed. Given that we now have a second pair of eyes go
> >> over the sampling and improve its interaction with i915, we should start
> >> getting PeterZ involved to check the interaction with perf.
> >
> > Okay, I guess another cleanup pass and then I can do that.
> >
> > In the meantime do you have any good understanding of what kind of
> > events we are exposing here? They look weird if I record them and look
> > with "perf script", and "perf stat" always reports zeroes for them. But
> > they still work from the overlay tool. So it is a bit of a mystery to me
> > what they really are.
> >
> >>> Btw patch series actually works since intel-gpu-overlay can use these
> >>> events when they are available.
> >>>
> >>> Chris Wilson (1):
> >>>   RFC drm/i915: Expose a PMU interface for perf queries
> >>
> >> One thing I would like is for any future interface (including this
> >> engine/class/event id) to use the engine class/instance mapping.
> >
> > I was thinking about that myself. I can do it in the next cleanup pass.
>
> Although to do this I think it will make more sense for me to squash a
> bunch of improvements into your patch, and to start working on it
> directly. Your thoughts on this? Do you mind if I start working on the
> original patch, bumping its version number for all the additions?

Don't mind in the slightest. I thought you had broken them out for a
quick series of "oh, yes, completely missed that" to be squashed in when
the patch is ready and you feel comfortable signing off on the whole
thing.

> This would mean squashing in probably the first eight patches from this
> series. Followed by reworking it towards class-instance. And the rc6
> residency consolidation Sagar suggested.
>
> How do we keep shared authorship in this case? Can we have two From:
> lines at the top?

Take it. If you want, leave a "based on a patch by" note for me, but 90%
of the work is in the last 10% of writing the patch, so you will be doing
more work to bring the interface to a reality than I did.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats
On 20/07/2017 10:03, Tvrtko Ursulin wrote:
> On 19/07/2017 13:05, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
>>> From: Tvrtko Ursulin
>>>
>>> Rough sketch of the idea I mentioned a few times to various people -
>>> merging the engine busyness tracking with Chris' i915 PMU RFC.
>>>
>>> First patch is the actual PMU RFC by Chris. It is followed by some
>>> cleanup patches, then come a few improvements, cheap execlists engine
>>> busyness tracking, debugfs view for the same, and finally the i915 PMU
>>> is extended to use this instead of timer based mmio sampling.
>>>
>>> This makes it cheaper and also more accurate since engine busyness is
>>> not derived via sampling.
>>>
>>> But I haven't figured out the perf API yet. For example is it possible
>>> to access our events in a usable fashion via perf top/stat or
>>> something? Do we want to make the events discoverable as I did
>>> (patch 8)?
>>
>> In my dreams I have gpu activity in the same perf timechart as cpu
>> activity. That can mostly be done by the request tracepoints, but
>> overlaying cpu/gpu activity is still desirable, and more importantly we
>> want to coordinate with nouveau/amdgpu so that such interfaces are as
>> agnostic as possible. There are definitely a bunch of global features
>> in common for all (engine enumeration & activity, mempool enumeration,
>> size & activity, power usage?). But the key question is how do we build
>> for the future? Split the event id range into common/driver?
>
> I don't know if going for common events would be workable. A few metrics
> sound like they could be generic, but I am not sure there would be more
> than a couple where that would be future proof. Also is the coordination
> effort (no one else seems to implement a perf interface at the moment)
> worth it at the current time? I am not sure.
>
>>> I could not find much (any?) kernel API level documentation for perf.
>>
>> There isn't much indeed. Given that we now have a second pair of eyes go
>> over the sampling and improve its interaction with i915, we should start
>> getting PeterZ involved to check the interaction with perf.
>
> Okay, I guess another cleanup pass and then I can do that.
>
> In the meantime do you have any good understanding of what kind of
> events we are exposing here? They look weird if I record them and look
> with "perf script", and "perf stat" always reports zeroes for them. But
> they still work from the overlay tool. So it is a bit of a mystery to me
> what they really are.
>
>>> Btw patch series actually works since intel-gpu-overlay can use these
>>> events when they are available.
>>>
>>> Chris Wilson (1):
>>>   RFC drm/i915: Expose a PMU interface for perf queries
>>
>> One thing I would like is for any future interface (including this
>> engine/class/event id) to use the engine class/instance mapping.
>
> I was thinking about that myself. I can do it in the next cleanup pass.

Although to do this I think it will make more sense for me to squash a
bunch of improvements into your patch, and to start working on it
directly. Your thoughts on this? Do you mind if I start working on the
original patch, bumping its version number for all the additions?

This would mean squashing in probably the first eight patches from this
series. Followed by reworking it towards class-instance. And the rc6
residency consolidation Sagar suggested.

How do we keep shared authorship in this case? Can we have two From:
lines at the top?

Regards,

Tvrtko
Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats
On 19/07/2017 13:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
>> From: Tvrtko Ursulin
>>
>> Rough sketch of the idea I mentioned a few times to various people -
>> merging the engine busyness tracking with Chris' i915 PMU RFC.
>>
>> First patch is the actual PMU RFC by Chris. It is followed by some
>> cleanup patches, then come a few improvements, cheap execlists engine
>> busyness tracking, debugfs view for the same, and finally the i915 PMU
>> is extended to use this instead of timer based mmio sampling.
>>
>> This makes it cheaper and also more accurate since engine busyness is
>> not derived via sampling.
>>
>> But I haven't figured out the perf API yet. For example is it possible
>> to access our events in a usable fashion via perf top/stat or
>> something? Do we want to make the events discoverable as I did
>> (patch 8)?
>
> In my dreams I have gpu activity in the same perf timechart as cpu
> activity. That can mostly be done by the request tracepoints, but
> overlaying cpu/gpu activity is still desirable, and more importantly we
> want to coordinate with nouveau/amdgpu so that such interfaces are as
> agnostic as possible. There are definitely a bunch of global features
> in common for all (engine enumeration & activity, mempool enumeration,
> size & activity, power usage?). But the key question is how do we build
> for the future? Split the event id range into common/driver?

I don't know if going for common events would be workable. A few metrics
sound like they could be generic, but I am not sure there would be more
than a couple where that would be future proof. Also is the coordination
effort (no one else seems to implement a perf interface at the moment)
worth it at the current time? I am not sure.

>> I could not find much (any?) kernel API level documentation for perf.
>
> There isn't much indeed. Given that we now have a second pair of eyes go
> over the sampling and improve its interaction with i915, we should start
> getting PeterZ involved to check the interaction with perf.

Okay, I guess another cleanup pass and then I can do that.

In the meantime do you have any good understanding of what kind of
events we are exposing here? They look weird if I record them and look
with "perf script", and "perf stat" always reports zeroes for them. But
they still work from the overlay tool. So it is a bit of a mystery to me
what they really are.

>> Btw patch series actually works since intel-gpu-overlay can use these
>> events when they are available.
>>
>> Chris Wilson (1):
>>   RFC drm/i915: Expose a PMU interface for perf queries
>
> One thing I would like is for any future interface (including this
> engine/class/event id) to use the engine class/instance mapping.

I was thinking about that myself. I can do it in the next cleanup pass.

Regards,

Tvrtko
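For readers wondering how a tool such as the overlay could read these counters without going through "perf stat", here is a minimal userspace sketch. It assumes the PMU registers as a dynamic perf PMU whose type id is published at /sys/bus/event_source/devices/<name>/type, which is the usual perf convention; the PMU name passed in and the config value are assumptions and are not taken from the series. The helper names are hypothetical.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read the dynamic PMU type id that perf publishes in sysfs; -1 on failure.
 * The PMU name (e.g. "i915") is an assumption about how the driver registers. */
int pmu_type_from_sysfs(const char *pmu_name)
{
	char path[256];
	int type = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/type", pmu_name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Open one counting event on the PMU; returns a file descriptor or -1.
 * The meaning of 'config' is driver specific and not defined here. */
int pmu_open_counter(int type, uint64_t config)
{
	struct perf_event_attr attr;

	assert(type >= 0);
	memset(&attr, 0, sizeof(attr));
	attr.type = (uint32_t)type;
	attr.size = sizeof(attr);
	attr.config = config;
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;

	/* pid = -1, cpu = 0: not tied to a task, bound to CPU 0, no group, no flags. */
	return (int)syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
}

/* Read the counter value and the time it has been enabled; 0 on success. */
int pmu_read_counter(int fd, uint64_t *value, uint64_t *enabled_ns)
{
	uint64_t buf[2];

	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		return -1;
	*value = buf[0];
	*enabled_ns = buf[1];
	return 0;
}
```

A tool would read the counter twice and divide the value delta by the time delta. Whether "perf stat" can consume the same events depends on how the PMU implements the perf callbacks, which is exactly the open question above.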
Re: [Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats
Quoting Tvrtko Ursulin (2017-07-18 15:36:04)
> From: Tvrtko Ursulin
>
> Rough sketch of the idea I mentioned a few times to various people -
> merging the engine busyness tracking with Chris' i915 PMU RFC.
>
> First patch is the actual PMU RFC by Chris. It is followed by some
> cleanup patches, then come a few improvements, cheap execlists engine
> busyness tracking, debugfs view for the same, and finally the i915 PMU
> is extended to use this instead of timer based mmio sampling.
>
> This makes it cheaper and also more accurate since engine busyness is
> not derived via sampling.
>
> But I haven't figured out the perf API yet. For example is it possible
> to access our events in a usable fashion via perf top/stat or
> something? Do we want to make the events discoverable as I did
> (patch 8)?

In my dreams I have gpu activity in the same perf timechart as cpu
activity. That can mostly be done by the request tracepoints, but
overlaying cpu/gpu activity is still desirable, and more importantly we
want to coordinate with nouveau/amdgpu so that such interfaces are as
agnostic as possible. There are definitely a bunch of global features
in common for all (engine enumeration & activity, mempool enumeration,
size & activity, power usage?). But the key question is how do we build
for the future? Split the event id range into common/driver?

> I could not find much (any?) kernel API level documentation for perf.

There isn't much indeed. Given that we now have a second pair of eyes go
over the sampling and improve its interaction with i915, we should start
getting PeterZ involved to check the interaction with perf.

> Btw patch series actually works since intel-gpu-overlay can use these
> events when they are available.
>
> Chris Wilson (1):
>   RFC drm/i915: Expose a PMU interface for perf queries

One thing I would like is for any future interface (including this
engine/class/event id) to use the engine class/instance mapping.
-Chris
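To make the class/instance request concrete, here is a minimal sketch of how an event config could carry an engine class, an instance within that class, and a sample type in separate bit fields. The field widths, macro names and helper names are all hypothetical and chosen only for illustration; this is not the encoding used by the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, for illustration only. */
#define EX_SAMPLE_BITS    4u
#define EX_INSTANCE_BITS  8u
#define EX_CLASS_BITS     8u

#define EX_INSTANCE_SHIFT EX_SAMPLE_BITS
#define EX_CLASS_SHIFT    (EX_SAMPLE_BITS + EX_INSTANCE_BITS)

/* Pack (class, instance, sample) into one event config value. */
uint64_t ex_engine_config(unsigned int engine_class,
			  unsigned int instance,
			  unsigned int sample)
{
	assert(engine_class < (1u << EX_CLASS_BITS));
	assert(instance < (1u << EX_INSTANCE_BITS));
	assert(sample < (1u << EX_SAMPLE_BITS));

	return ((uint64_t)engine_class << EX_CLASS_SHIFT) |
	       ((uint64_t)instance << EX_INSTANCE_SHIFT) |
	       (uint64_t)sample;
}

/* Extract the individual fields again. */
unsigned int ex_config_class(uint64_t config)
{
	return (unsigned int)(config >> EX_CLASS_SHIFT) &
	       ((1u << EX_CLASS_BITS) - 1);
}

unsigned int ex_config_instance(uint64_t config)
{
	return (unsigned int)(config >> EX_INSTANCE_SHIFT) &
	       ((1u << EX_INSTANCE_BITS) - 1);
}

unsigned int ex_config_sample(uint64_t config)
{
	return (unsigned int)config & ((1u << EX_SAMPLE_BITS) - 1);
}
```

With a layout like this, userspace names an engine by (class, instance) rather than by a driver-internal engine id, so adding another instance of an existing class needs no new identifier.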
[Intel-gfx] [RFC 00/14] i915 PMU and engine busy stats
From: Tvrtko Ursulin

Rough sketch of the idea I mentioned a few times to various people -
merging the engine busyness tracking with Chris' i915 PMU RFC.

First patch is the actual PMU RFC by Chris. It is followed by some
cleanup patches, then come a few improvements, cheap execlists engine
busyness tracking, debugfs view for the same, and finally the i915 PMU
is extended to use this instead of timer based mmio sampling.

This makes it cheaper and also more accurate since engine busyness is
not derived via sampling.

But I haven't figured out the perf API yet. For example is it possible
to access our events in a usable fashion via perf top/stat or
something? Do we want to make the events discoverable as I did
(patch 8)?

I could not find much (any?) kernel API level documentation for perf.

Btw patch series actually works since intel-gpu-overlay can use these
events when they are available.

Chris Wilson (1):
  RFC drm/i915: Expose a PMU interface for perf queries

Tvrtko Ursulin (13):
  drm/i915/pmu: Add VCS2 engine to the PMU uAPI
  drm/i915/pmu: Add queued samplers to the PMU uAPI
  drm/i915/pmu: Decouple uAPI engine ids
  drm/i915/pmu: Helper to extract engine and sampler from PMU config
  drm/i915/pmu: Only sample enabled samplers
  drm/i915/pmu: Add fake regs
  drm/i915/pmu: Expose events in sysfs
  drm/i915/pmu: Suspend sampling when GPU is idle
  drm/i915: Wrap context schedule notification
  drm/i915: Engine busy time tracking
  drm/i915: Interface for controling engine stats collection
  drm/i915: Export engine busy stats in debugfs
  drm/i915/pmu: Wire up engine busy stats to PMU

 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c     |  89
 drivers/gpu/drm/i915/i915_drv.c         |   2 +
 drivers/gpu/drm/i915/i915_drv.h         |  32 ++
 drivers/gpu/drm/i915/i915_gem.c         |   1 +
 drivers/gpu/drm/i915/i915_gem_request.c |   1 +
 drivers/gpu/drm/i915/i915_pmu.c         | 697
 drivers/gpu/drm/i915/intel_engine_cs.c  |  59 +++
 drivers/gpu/drm/i915/intel_lrc.c        |  19 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  46 +++
 include/uapi/drm/i915_drm.h             |  51 +++
 kernel/events/core.c                    |   1 +
 12 files changed, 996 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_pmu.c

--
2.9.4
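To illustrate why event-driven accounting can be both cheaper and more accurate than timer based sampling, here is a minimal standalone sketch. It assumes the driver gets a notification whenever a context is scheduled in or out of an engine; the structure and function names are hypothetical and this is not the code from the series. Busy time is accumulated from exact timestamps at those transitions, so no periodic timer or mmio read is needed and nothing is estimated from samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-engine accounting state, for illustration only. */
struct engine_stats {
	unsigned int active; /* contexts currently executing on the engine */
	uint64_t start_ns;   /* timestamp of the last idle -> busy transition */
	uint64_t total_ns;   /* busy time of all completed busy periods */
};

/* Called when a context is scheduled in; starts a busy period if idle. */
void engine_context_in(struct engine_stats *s, uint64_t now_ns)
{
	if (s->active++ == 0)
		s->start_ns = now_ns;
}

/* Called when a context is scheduled out; closes the busy period when
 * the last active context leaves. */
void engine_context_out(struct engine_stats *s, uint64_t now_ns)
{
	assert(s->active > 0);
	if (--s->active == 0)
		s->total_ns += now_ns - s->start_ns;
}

/* Total busy time so far, including a busy period still in progress. */
uint64_t engine_busy_ns(const struct engine_stats *s, uint64_t now_ns)
{
	uint64_t total = s->total_ns;

	if (s->active)
		total += now_ns - s->start_ns;
	return total;
}
```

A consumer (debugfs or the PMU) would then report busyness over an interval as the difference between two engine_busy_ns() readings divided by the interval length. Real driver code would also need locking around these updates, which the sketch leaves out.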