[Intel-gfx] [PATCH 03/15] drm/i915: Framework for capturing command stream based OA reports

2016-11-04 Thread sourab . gupta
From: Sourab Gupta 

This patch introduces a framework to enable OA counter reports associated
with the Render command stream. We can then associate the reports captured
through this mechanism with their corresponding context IDs. This can be
further extended to associate any other metadata information with the
corresponding samples (since the association with the Render command stream
gives us the ability to capture this information while inserting the
corresponding capture commands into the command stream).

The OA reports generated in this way are associated with a corresponding
workload, and thus can be used to delimit the workload (i.e. sample the
counters at the workload boundaries) within an ongoing stream of periodic
counter snapshots.

There may be use cases wherein we need more than the periodic OA capture
mode which is currently supported. This new mode is primarily intended for
two use cases:
- Ability to capture system-wide metrics, along with the ability to map
  the reports back to individual contexts (particularly for HSW).
- Ability to inject tags for work into the reports. This provides
  visibility into the multiple stages of work within a single context.

Userspace will be able to distinguish between the periodic and CS based
OA reports by virtue of the source_info sample field.
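As a rough illustration of how a consumer might branch on that field, here is a userspace C sketch. The enum values, the `parsed_sample` struct, and the assumption that `source_info` is a plain 32-bit value are all hypothetical; the real uapi definitions come from the patch's `i915_drm.h` changes, which are not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical values for the source_info sample field; the real
 * uapi names and values are defined by the patch, not here.
 */
enum sample_source {
	SAMPLE_SOURCE_PERIODIC = 0,	/* snapshot from the periodic OA timer */
	SAMPLE_SOURCE_RCS = 1,		/* snapshot from MI_REPORT_PERF_COUNT on RCS */
};

/* Hypothetical parsed sample: the source tag plus a pointer to the raw report. */
struct parsed_sample {
	uint32_t source_info;
	const uint8_t *oa_report;
};

/* Returns 1 for command stream based samples, 0 for periodic, -1 if unknown. */
static int sample_is_cs_based(const struct parsed_sample *s)
{
	assert(s != NULL);
	switch (s->source_info) {
	case SAMPLE_SOURCE_PERIODIC:
		return 0;
	case SAMPLE_SOURCE_RCS:
		return 1;
	default:
		return -1;
	}
}
```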

The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
counters, and is inserted at batch buffer (BB) boundaries.
The data thus captured is stored in a separate buffer, distinct from the
buffer used for the periodic OA capture mode.
The metadata pertaining to each snapshot is maintained in a list, which
also holds the offset into the GEM buffer object for each captured
snapshot. To track whether the GPU has completed processing a node, each
node also records the corresponding GEM request, which is tracked for
completion of the command.
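A minimal userspace model of that bookkeeping is sketched below. It assumes a fixed 256-byte report size and a hypothetical 64 KiB buffer, and it substitutes a request sequence number for the `drm_i915_gem_request` pointer. The names `cs_node`, `cs_buffer`, `cs_reserve()` and `cs_retire()` are illustrative only, not the driver's functions. Nodes are retired strictly in order, so a report is never consumed before the batches ahead of it have completed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CS_BUF_SIZE (64u * 1024u)	/* hypothetical size of the CS sample buffer */
#define REPORT_SIZE 256u		/* assumption: one OA report is 256 bytes */

/*
 * Simplified model of struct i915_perf_cs_data_node: 'seqno' stands in
 * for the drm_i915_gem_request whose completion makes the report valid.
 */
struct cs_node {
	struct cs_node *next;
	uint32_t seqno;
	uint32_t offset;	/* byte offset of the report in the CS buffer */
	uint32_t ctx_id;	/* context that submitted the batch */
};

struct cs_buffer {
	struct cs_node *head;	/* oldest outstanding snapshot */
	struct cs_node *tail;	/* newest outstanding snapshot */
	uint32_t next_offset;	/* where the next report will be written */
};

/* Reserve a report slot for a new capture command and queue its metadata. */
static struct cs_node *cs_reserve(struct cs_buffer *buf, uint32_t seqno,
				  uint32_t ctx_id)
{
	struct cs_node *node;

	assert(buf != NULL);
	node = malloc(sizeof(*node));
	if (!node)
		return NULL;

	/* Wrap to the start when the next report would not fit. */
	if (buf->next_offset + REPORT_SIZE > CS_BUF_SIZE)
		buf->next_offset = 0;

	node->next = NULL;
	node->seqno = seqno;
	node->offset = buf->next_offset;
	node->ctx_id = ctx_id;
	buf->next_offset += REPORT_SIZE;

	if (buf->tail)
		buf->tail->next = node;
	else
		buf->head = node;
	buf->tail = node;
	return node;
}

/*
 * Drop nodes whose request has completed (seqno <= hw_seqno), oldest first,
 * stopping at the first incomplete one. A real reader would first copy the
 * report at node->offset out to userspace. Returns how many were retired.
 */
static int cs_retire(struct cs_buffer *buf, uint32_t hw_seqno)
{
	int retired = 0;

	while (buf->head && buf->head->seqno <= hw_seqno) {
		struct cs_node *done = buf->head;

		buf->head = done->next;
		if (!buf->head)
			buf->tail = NULL;
		free(done);
		retired++;
	}
	return retired;
}
```

The sketch ignores seqno wraparound and does not guard against overwriting reports that have not been read yet; a real implementation would need to handle both.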

Both periodic and RCS based reports are associated with a single stream
(corresponding to the render engine), and the samples are expected to be
delivered in sequential order according to their timestamps. Since these
reports are collected in separate buffers, they are merge sorted by
timestamp when they are forwarded to userspace during the read call.
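The merge step is the classic two-way merge of already sorted sequences. The sketch below is illustrative only: `struct sample` is a stand-in that keeps just a 32-bit timestamp and a source flag, timestamp wraparound is ignored, and ties are broken in favour of the periodic sample, which is an arbitrary choice rather than something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a sample: a timestamp plus which buffer it came from. */
struct sample {
	uint32_t ts;	/* report timestamp; wraparound is ignored here */
	int from_cs;	/* 1 = command stream buffer, 0 = periodic OA buffer */
};

/*
 * Two-way merge of timestamp-ordered arrays into 'out' (capacity out_len).
 * On equal timestamps the periodic sample goes first (arbitrary tie-break).
 * Stops early when 'out' is full; returns the number of samples written.
 */
static size_t merge_samples(const struct sample *periodic, size_t np,
			    const struct sample *cs, size_t nc,
			    struct sample *out, size_t out_len)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL || out_len == 0);
	while (k < out_len && (i < np || j < nc)) {
		if (j >= nc || (i < np && periodic[i].ts <= cs[j].ts))
			out[k++] = periodic[i++];
		else
			out[k++] = cs[j++];
	}
	return k;
}
```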

v2: Aligning with the non-perf interface (custom DRM ioctl based). Also,
a few related patches are squashed together for better readability.

Signed-off-by: Sourab Gupta 
Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.h            |  44 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +
 drivers/gpu/drm/i915/i915_perf.c           | 895 -
 include/uapi/drm/i915_drm.h                |  15 +
 4 files changed, 805 insertions(+), 153 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6ac1c3..0561315 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1809,6 +1809,18 @@ struct i915_perf_stream_ops {
 	 * The stream will always be disabled before this is called.
 	 */
 	void (*destroy)(struct i915_perf_stream *stream);
+
+	/*
+	 * Routine to emit the commands in the command streamer associated
+	 * with the corresponding gpu engine.
+	 */
+	void (*command_stream_hook)(struct drm_i915_gem_request *req);
+};
+
+enum i915_perf_stream_state {
+	I915_PERF_STREAM_DISABLED,
+	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
+	I915_PERF_STREAM_ENABLED,
 };
 
 struct i915_perf_stream {
@@ -1816,11 +1828,16 @@ struct i915_perf_stream {
 
 	struct list_head link;
 
+	enum intel_engine_id engine;
 	u32 sample_flags;
 	int sample_size;
 
 	struct i915_gem_context *ctx;
 	bool enabled;
+	enum i915_perf_stream_state state;
+
+	/* Whether command stream based data collection is enabled */
+	bool cs_mode;
 
 	const struct i915_perf_stream_ops *ops;
 };
@@ -1838,10 +1855,22 @@ struct i915_oa_ops {
 	int (*read)(struct i915_perf_stream *stream,
 		    char __user *buf,
 		    size_t count,
-		    size_t *offset);
+		    size_t *offset,
+		    u32 ts);
 	bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };
 
+/*
+ * List element to hold info about the perf sample data associated
+ * with a particular GPU command stream.
+ */
+struct i915_perf_cs_data_node {
+	struct list_head link;
+	struct drm_i915_gem_request *request;
+	u32 offset;
+	u32 ctx_id;
+};
+
 struct drm_i915_private {
 	struct drm_device drm;
 
@@ -2149,6 +2178,8 @@ struct drm_i915_private {
struct ctl_table_header *sysctl_header;
 
struct mutex lock;
+
+   struct mutex streams_lock;
struct list_head streams;
 
spinlock_t hook_lock;
@@ -2195,6 +2226,16 @@ struct drm_i915_private {
const struct i915_oa_format
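The diff above adds a `command_stream_hook` op, per-stream `engine`, `state` and `cs_mode` fields, and a `streams_lock` mutex. One plausible way these fit together at request submission is sketched below as a self-contained userspace model: walk the stream list under the lock and invoke the hook only for enabled, CS-mode streams on the request's engine. The types, the gating condition, and `example_hook()` (a stand-in that counts calls instead of emitting MI_REPORT_PERF_COUNT) are assumptions for illustration; the patch's actual execbuffer hook is not shown in this excerpt.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum stream_state { STREAM_DISABLED, STREAM_ENABLE_IN_PROGRESS, STREAM_ENABLED };

/* Hypothetical stand-in for drm_i915_gem_request. */
struct request {
	int engine;
	int emitted;	/* number of capture commands "emitted" for this request */
};

/* Simplified model of the stream fields added by the patch. */
struct stream {
	struct stream *next;
	int engine;
	enum stream_state state;
	bool cs_mode;
	void (*command_stream_hook)(struct request *req);
};

struct perf {
	pthread_mutex_t streams_lock;	/* protects the 'streams' list */
	struct stream *streams;
};

/* Stand-in hook: counts calls instead of emitting MI_REPORT_PERF_COUNT. */
static void example_hook(struct request *req)
{
	req->emitted++;
}

/* Called once per submitted request (assumed call site: execbuffer). */
static void perf_command_stream_hook(struct perf *perf, struct request *req)
{
	assert(perf != NULL && req != NULL);
	pthread_mutex_lock(&perf->streams_lock);
	for (struct stream *s = perf->streams; s != NULL; s = s->next) {
		if (s->cs_mode && s->state == STREAM_ENABLED &&
		    s->engine == req->engine)
			s->command_stream_hook(req);
	}
	pthread_mutex_unlock(&perf->streams_lock);
}
```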

[Intel-gfx] [PATCH 03/15] drm/i915: Framework for capturing command stream based OA reports

2016-06-01 Thread sourab . gupta
From: Sourab Gupta 

This patch introduces a framework to enable OA counter reports associated
with the Render command stream. We can then associate the reports captured
through this mechanism with their corresponding context IDs. This can be
further extended to associate any other metadata information with the
corresponding samples (since the association with the Render command stream
gives us the ability to capture this information while inserting the
corresponding capture commands into the command stream).

The OA reports generated in this way are associated with a corresponding
workload, and thus can be used to delimit the workload (i.e. sample the
counters at the workload boundaries) within an ongoing stream of periodic
counter snapshots.

There may be use cases wherein we need more than the periodic OA capture
mode which is currently supported. This new mode is primarily intended for
two use cases:
- Ability to capture system-wide metrics, along with the ability to map
  the reports back to individual contexts (particularly for HSW).
- Ability to inject tags for work into the reports. This provides
  visibility into the multiple stages of work within a single context.

Userspace will be able to distinguish between the periodic and CS based
OA reports by virtue of the source_info sample field.

The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
counters, and is inserted at batch buffer (BB) boundaries.
The data thus captured is stored in a separate buffer, distinct from the
buffer used for the periodic OA capture mode.
The metadata pertaining to each snapshot is maintained in a list, which
also holds the offset into the GEM buffer object for each captured
snapshot. To track whether the GPU has completed processing a node, each
node also records the corresponding GEM request, which is tracked for
completion of the command.

Both periodic and RCS based reports are associated with a single stream
(corresponding to the render engine), and the samples are expected to be
delivered in sequential order according to their timestamps. Since these
reports are collected in separate buffers, they are merge sorted by
timestamp when they are forwarded to userspace during the read call.

v2: Aligning with the non-perf interface (custom DRM ioctl based). Also,
a few related patches are squashed together for better readability.

Signed-off-by: Sourab Gupta 
Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.h            |  44 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +
 drivers/gpu/drm/i915/i915_perf.c           | 871 -
 drivers/gpu/drm/i915/intel_lrc.c           |   4 +
 include/uapi/drm/i915_drm.h                |  15 +
 5 files changed, 804 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 82622c4..f95b02b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1791,6 +1791,18 @@ struct i915_perf_stream_ops {
 	 * The stream will always be disabled before this is called.
 	 */
 	void (*destroy)(struct i915_perf_stream *stream);
+
+	/*
+	 * Routine to emit the commands in the command streamer associated
+	 * with the corresponding gpu engine.
+	 */
+	void (*command_stream_hook)(struct drm_i915_gem_request *req);
+};
+
+enum i915_perf_stream_state {
+	I915_PERF_STREAM_DISABLED,
+	I915_PERF_STREAM_ENABLE_IN_PROGRESS,
+	I915_PERF_STREAM_ENABLED,
 };
 
 struct i915_perf_stream {
@@ -1798,11 +1810,15 @@ struct i915_perf_stream {
 
 	struct list_head link;
 
+	enum intel_engine_id engine;
 	u32 sample_flags;
 	int sample_size;
 
 	struct intel_context *ctx;
-	bool enabled;
+	enum i915_perf_stream_state state;
+
+	/* Whether command stream based data collection is enabled */
+	bool cs_mode;
 
 	const struct i915_perf_stream_ops *ops;
 };
@@ -1818,10 +1834,21 @@ struct i915_oa_ops {
u32 ctx_id);
void (*legacy_ctx_switch_unlocked)(struct drm_i915_gem_request *req);
int (*read)(struct i915_perf_stream *stream,
-   struct i915_perf_read_state *read_state);
+   struct i915_perf_read_state *read_state, u32 ts);
bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
 };
 
+/*
+ * List element to hold info about the perf sample data associated
+ * with a particular GPU command stream.
+ */
+struct i915_perf_cs_data_node {
+   struct list_head link;
+   struct drm_i915_gem_request *request;
+   u32 offset;
+   u32 ctx_id;
+};
+
 struct drm_i915_private {
struct drm_device *dev;
struct kmem_cache *objects;
@@ -2107,6 +2134,8 @@ struct drm_i915_private {
struct ctl_table_header *sysctl_header;
 
struct mutex lock;
+
+   struct mutex streams_lock;
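In this revision the boolean `enabled` field is replaced by the three-valued `i915_perf_stream_state`. The excerpt does not show how `I915_PERF_STREAM_ENABLE_IN_PROGRESS` is used; one plausible reading, sketched below with hypothetical helper names, is an intermediate state during which the stream is still being set up and capture commands should not yet be emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three values of enum i915_perf_stream_state in the patch. */
enum stream_state {
	STREAM_DISABLED,
	STREAM_ENABLE_IN_PROGRESS,
	STREAM_ENABLED,
};

/* Hypothetical: start enabling; only valid from the disabled state. */
static bool stream_begin_enable(enum stream_state *st)
{
	assert(st != NULL);
	if (*st != STREAM_DISABLED)
		return false;
	*st = STREAM_ENABLE_IN_PROGRESS;
	return true;
}

/* Hypothetical: finish enabling (assumed: once the OA unit is configured). */
static bool stream_finish_enable(enum stream_state *st)
{
	assert(st != NULL);
	if (*st != STREAM_ENABLE_IN_PROGRESS)
		return false;
	*st = STREAM_ENABLED;
	return true;
}

static void stream_disable(enum stream_state *st)
{
	assert(st != NULL);
	*st = STREAM_DISABLED;
}

/* Assumed rule: capture commands are only emitted for a fully enabled stream. */
static bool stream_accepts_cs_samples(enum stream_state st)
{
	return st == STREAM_ENABLED;
}
```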
 

Re: [Intel-gfx] [PATCH 03/15] drm/i915: Framework for capturing command stream based OA reports

2016-06-01 Thread Martin Peres

On 02/06/16 08:18, sourab.gu...@intel.com wrote:

> From: Sourab Gupta
>
> This patch introduces a framework to enable OA counter reports associated
> with the Render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context IDs. This can be
> further extended to associate any other metadata information with the
> corresponding samples (since the association with the Render command stream
> gives us the ability to capture this information while inserting the
> corresponding capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used to delimit the workload (i.e. sample the
> counters at the workload boundaries) within an ongoing stream of periodic
> counter snapshots.
>
> There may be use cases wherein we need more than the periodic OA capture
> mode which is currently supported. This new mode is primarily intended for
> two use cases:
> - Ability to capture system-wide metrics, along with the ability to map
>   the reports back to individual contexts (particularly for HSW).
> - Ability to inject tags for work into the reports. This provides
>   visibility into the multiple stages of work within a single context.
>
> Userspace will be able to distinguish between the periodic and CS based
> OA reports by virtue of the source_info sample field.
>
> The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
> counters, and is inserted at batch buffer (BB) boundaries.

So, it is possible to trigger a read of a set of counters (all?) from
the pushbuffer?

If so, I like this, because I was wondering how this work would break
when we move to the GuC-submission model.

Thanks for working on this!
Martin
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 03/15] drm/i915: Framework for capturing command stream based OA reports

2016-06-01 Thread sourab gupta
On Thu, 2016-06-02 at 11:30 +0530, Martin Peres wrote:
> On 02/06/16 08:18, sourab.gu...@intel.com wrote:
> > From: Sourab Gupta 
> >
> > This patch introduces a framework to enable OA counter reports associated
> > with the Render command stream. We can then associate the reports captured
> > through this mechanism with their corresponding context IDs. This can be
> > further extended to associate any other metadata information with the
> > corresponding samples (since the association with the Render command stream
> > gives us the ability to capture this information while inserting the
> > corresponding capture commands into the command stream).
> >
> > The OA reports generated in this way are associated with a corresponding
> > workload, and thus can be used to delimit the workload (i.e. sample the
> > counters at the workload boundaries) within an ongoing stream of periodic
> > counter snapshots.
> >
> > There may be use cases wherein we need more than the periodic OA capture
> > mode which is currently supported. This new mode is primarily intended for
> > two use cases:
> > - Ability to capture system-wide metrics, along with the ability to map
> >   the reports back to individual contexts (particularly for HSW).
> > - Ability to inject tags for work into the reports. This provides
> >   visibility into the multiple stages of work within a single context.
> >
> > Userspace will be able to distinguish between the periodic and CS based
> > OA reports by virtue of the source_info sample field.
> >
> > The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
> > counters, and is inserted at batch buffer (BB) boundaries.
> 
> So, it is possible to trigger a read of a set of counters (all?) from
> the pushbuffer?
> 
> If so, I like this, because I was wondering how this work would break
> when we move to the GuC-submission model.
> 
> Thanks for working on this!
> Martin

Yeah, we can trigger the capture of a counter snapshot using the
MI_REPORT_PERF_COUNT command inserted in the ringbuffer. The capture is
initiated when this command is encountered, and the report is written to
an address specified in the command arguments. The exact details of the
counters captured (report format etc.) depend on the OA unit configuration
done earlier.

-Sourab

