Re: [Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-10 Thread Lionel Landwerlin

On 10/03/2020 22:08, Umesh Nerlige Ramappa wrote:

On Tue, Mar 03, 2020 at 02:19:04PM -0800, Umesh Nerlige Ramappa wrote:

From: Lionel Landwerlin 

This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjuction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

v2: Version the new parameter (Joonas)
v3: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin 
Signed-off-by: Umesh Nerlige Ramappa 
---
drivers/gpu/drm/i915/i915_perf.c | 58 +++-
include/uapi/drm/i915_drm.h  | 10 ++
2 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c 
b/drivers/gpu/drm/i915/i915_perf.c

index 502961da840d..ab41cba85b40 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -252,7 +252,7 @@
 * oa_buffer_check().
 *
 * Most of the implementation details for this workaround are in
- * oa_buffer_check_unlocked() and _append_oa_reports()
+ * oa_buffer_check() and _append_oa_reports()
 *
 * Note for posterity: previously the driver used to define an 
effective tail
 * pointer that lagged the real pointer by a 'tail margin' measured 
in bytes
@@ -447,8 +447,9 @@ static u32 gen7_oa_hw_tail_read(struct 
i915_perf_stream *stream)

}

/**
- * oa_buffer_check_unlocked - check for data and update tail ptr state
+ * oa_buffer_check - check for data and update tail ptr state
 * @stream: i915 stream instance
+ * @lock: whether to take the oa_buffer spin lock
 *
 * This is either called via fops (for blocking reads in user ctx) or 
the poll
 * check hrtimer (atomic ctx) to check the OA buffer tail pointer and 
check
@@ -470,8 +471,9 @@ static u32 gen7_oa_hw_tail_read(struct 
i915_perf_stream *stream)

 *
 * Returns: %true if the OA buffer contains data, else %false
 */
-static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
+static bool oa_buffer_check(struct i915_perf_stream *stream, bool lock)


Hi Lionel,

All callers seem to set the lock to true when calling 
oa_buffer_check().  Do you recall why the parameter was introduced?  
If not, we probably want to remove this change.


Thanks,
Umesh



Err... Sorry, I don't remember.

It's probably a leftover the initial iteration where I was trying to get 
the OA head/tail register from the interrupt.


I guess you can drop that param and leave the function with the 
_unlocked prefix.



Thanks,


-Lionel


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-10 Thread Umesh Nerlige Ramappa

On Tue, Mar 03, 2020 at 02:19:04PM -0800, Umesh Nerlige Ramappa wrote:

From: Lionel Landwerlin 

This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjuction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

v2: Version the new parameter (Joonas)
v3: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin 
Signed-off-by: Umesh Nerlige Ramappa 
---
drivers/gpu/drm/i915/i915_perf.c | 58 +++-
include/uapi/drm/i915_drm.h  | 10 ++
2 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 502961da840d..ab41cba85b40 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -252,7 +252,7 @@
 * oa_buffer_check().
 *
 * Most of the implementation details for this workaround are in
- * oa_buffer_check_unlocked() and _append_oa_reports()
+ * oa_buffer_check() and _append_oa_reports()
 *
 * Note for posterity: previously the driver used to define an effective tail
 * pointer that lagged the real pointer by a 'tail margin' measured in bytes
@@ -447,8 +447,9 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
*stream)
}

/**
- * oa_buffer_check_unlocked - check for data and update tail ptr state
+ * oa_buffer_check - check for data and update tail ptr state
 * @stream: i915 stream instance
+ * @lock: whether to take the oa_buffer spin lock
 *
 * This is either called via fops (for blocking reads in user ctx) or the poll
 * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check
@@ -470,8 +471,9 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
*stream)
 *
 * Returns: %true if the OA buffer contains data, else %false
 */
-static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
+static bool oa_buffer_check(struct i915_perf_stream *stream, bool lock)


Hi Lionel,

All callers seem to set the lock to true when calling oa_buffer_check().  
Do you recall why the parameter was introduced?  If not, we probably 
want to remove this change.


Thanks,
Umesh
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-04 Thread Lionel Landwerlin

On 04/03/2020 07:47, Dixit, Ashutosh wrote:

On Tue, 03 Mar 2020 14:19:04 -0800, Umesh Nerlige Ramappa wrote:

From: Lionel Landwerlin 

This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjuction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

v2: Version the new parameter (Joonas)
v3: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin 
Signed-off-by: Umesh Nerlige Ramappa 

[snip]


+   /**
+* Specifying this property sets up the interrupt mechanism for the OA
+* buffer in i915. This option in conjuction with a long polling delay
+* for avaibility of OA data can reduce CPU load significantly if you
+* do not care about OA data being read as soon as it's available.
+*
+* This property is available in perf revision 5.
+*/
+   DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,

What if we do not expose this parameter in the uapi at all and internally
decide in i915 whether to leave the interrupt either always enabled or
always disabled (and in that case always use the hrtimer)? This way we
retain flexibility in i915 if hardware evolves in the future e.g. to use
watermarks for the interrupt, without yielding control to userspace.

Overall I feel we should avoid exposing unnecessary details of the internal
implemenation to userspace, they would be neither interested in knowing
internal details nor know how to properly use these parameters. Shouldn't
the driver be able to make these kinds of decisions internally?

At this point the only parameter which implicitly exposed to userspace is
the hrtimer poll period, so perhaps all we need to do is to expose that in
the uapi? Thoughts?



I guess I agree with you. I can't remember why I exposed it to userspace.

There might be one test that checks the stream reports LOST_BUFFER with 
no poll() wakeup, but I guess we could update it.



-Lionel

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-03 Thread Dixit, Ashutosh
On Tue, 03 Mar 2020 14:19:04 -0800, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin 
>
> This let's the application choose to be driven by the interrupt
> mechanism of the HW. In conjuction with long periods for checks for
> the availability of data on the CPU, this can reduce the CPU load when
> doing capture of OA data.
>
> v2: Version the new parameter (Joonas)
> v3: Rebase (Umesh)
>
> Signed-off-by: Lionel Landwerlin 
> Signed-off-by: Umesh Nerlige Ramappa 

[snip]

> + /**
> +  * Specifying this property sets up the interrupt mechanism for the OA
> +  * buffer in i915. This option in conjuction with a long polling delay
> +  * for avaibility of OA data can reduce CPU load significantly if you
> +  * do not care about OA data being read as soon as it's available.
> +  *
> +  * This property is available in perf revision 5.
> +  */
> + DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,

What if we do not expose this parameter in the uapi at all and internally
decide in i915 whether to leave the interrupt either always enabled or
always disabled (and in that case always use the hrtimer)? This way we
retain flexibility in i915 if hardware evolves in the future e.g. to use
watermarks for the interrupt, without yielding control to userspace.

Overall I feel we should avoid exposing unnecessary details of the internal
implemenation to userspace, they would be neither interested in knowing
internal details nor know how to properly use these parameters. Shouldn't
the driver be able to make these kinds of decisions internally?

At this point the only parameter which implicitly exposed to userspace is
the hrtimer poll period, so perhaps all we need to do is to expose that in
the uapi? Thoughts?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2020-03-03 Thread Umesh Nerlige Ramappa
From: Lionel Landwerlin 

This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjuction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

v2: Version the new parameter (Joonas)
v3: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin 
Signed-off-by: Umesh Nerlige Ramappa 
---
 drivers/gpu/drm/i915/i915_perf.c | 58 +++-
 include/uapi/drm/i915_drm.h  | 10 ++
 2 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 502961da840d..ab41cba85b40 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -252,7 +252,7 @@
  * oa_buffer_check().
  *
  * Most of the implementation details for this workaround are in
- * oa_buffer_check_unlocked() and _append_oa_reports()
+ * oa_buffer_check() and _append_oa_reports()
  *
  * Note for posterity: previously the driver used to define an effective tail
  * pointer that lagged the real pointer by a 'tail margin' measured in bytes
@@ -447,8 +447,9 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
*stream)
 }
 
 /**
- * oa_buffer_check_unlocked - check for data and update tail ptr state
+ * oa_buffer_check - check for data and update tail ptr state
  * @stream: i915 stream instance
+ * @lock: whether to take the oa_buffer spin lock
  *
  * This is either called via fops (for blocking reads in user ctx) or the poll
  * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check
@@ -470,8 +471,9 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream 
*stream)
  *
  * Returns: %true if the OA buffer contains data, else %false
  */
-static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
+static bool oa_buffer_check(struct i915_perf_stream *stream, bool lock)
 {
+   u64 half_full_count = atomic64_read(&stream->half_full_count);
u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
int report_size = stream->oa_buffer.format_size;
unsigned long flags;
@@ -482,7 +484,8 @@ static bool oa_buffer_check_unlocked(struct 
i915_perf_stream *stream)
 * could result in an OA buffer reset which might reset the head,
 * tails[] and aged_tail state.
 */
-   spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+   if (lock)
+   spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
@@ -558,7 +561,10 @@ static bool oa_buffer_check_unlocked(struct 
i915_perf_stream *stream)
stream->oa_buffer.aging_timestamp = now;
}
 
-   spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+   stream->half_full_count_last = half_full_count;
+
+   if (lock)
+   spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
stream->oa_buffer.head - gtt_offset) >= report_size;
@@ -1169,9 +1175,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
  * i915_oa_wait_unlocked - handles blocking IO until OA data available
  * @stream: An i915-perf stream opened for OA metrics
  *
- * Called when userspace tries to read() from a blocking stream FD opened
- * for OA metrics. It waits until the hrtimer callback finds a non-empty
- * OA buffer and wakes us.
+ * Called when userspace tries to read() from a blocking stream FD opened for
+ * OA metrics. It waits until either the hrtimer callback finds a non-empty OA
+ * buffer or the OA interrupt kicks in and wakes us.
  *
  * Note: it's acceptable to have this return with some false positives
  * since any subsequent read handling will return -EAGAIN if there isn't
@@ -1186,7 +1192,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream 
*stream)
return -EIO;
 
return wait_event_interruptible(stream->poll_wq,
-   oa_buffer_check_unlocked(stream));
+   oa_buffer_check(stream, true));
 }
 
 /**
@@ -2733,6 +2739,10 @@ static void i915_oa_stream_disable(struct 
i915_perf_stream *stream)
 {
stream->perf->ops.oa_disable(stream);
 
+   stream->half_full_count_last = 0;
+   atomic64_set(&stream->half_full_count,
+stream->half_full_count_last);
+
if (stream->periodic)
hrtimer_cancel(&stream->poll_check_timer);
 }
@@ -3075,7 +3085,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct 
hrtimer *hrtimer)
struct i915_perf_stream *stream =
container_of(hrtimer, typeof(*stream), poll_check_timer);
 
-   if (oa_buffer_check_unlocked(stream)) {
+   if (oa_buffer_check(stream, true)) {
stream->pollin = true;
wake_up(&stream->poll_wq);
}
@@ -3109,6 +3119,16 @@ static __poll_

[Intel-gfx] [PATCH 6/7] drm/i915/perf: add interrupt enabling parameter

2019-01-16 Thread Lionel Landwerlin
This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjuction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

Signed-off-by: Lionel Landwerlin 
---
 drivers/gpu/drm/i915/i915_perf.c | 54 +++-
 include/uapi/drm/i915_drm.h  |  8 +
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 82c4e0c08859..da721fce2543 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -243,7 +243,7 @@
  * oa_buffer_check().
  *
  * Most of the implementation details for this workaround are in
- * oa_buffer_check_unlocked() and _append_oa_reports()
+ * oa_buffer_check() and _append_oa_reports()
  *
  * Note for posterity: previously the driver used to define an effective tail
  * pointer that lagged the real pointer by a 'tail margin' measured in bytes
@@ -418,9 +418,11 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private 
*dev_priv)
return oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 }
 
+
 /**
- * oa_buffer_check_unlocked - check for data and update tail ptr state
+ * oa_buffer_check - check for data and update tail ptr state
  * @dev_priv: i915 device instance
+ * @lock: whether to take the oa_buffer spin lock
  *
  * This is either called via fops (for blocking reads in user ctx) or the poll
  * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check
@@ -442,8 +444,9 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private 
*dev_priv)
  *
  * Returns: %true if the OA buffer contains data, else %false
  */
-static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
+static bool oa_buffer_check(struct drm_i915_private *dev_priv, bool lock)
 {
+   u64 half_full_count = atomic64_read(&dev_priv->perf.oa.half_full_count);
u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma);
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
unsigned long flags;
@@ -454,7 +457,8 @@ static bool oa_buffer_check_unlocked(struct 
drm_i915_private *dev_priv)
 * could result in an OA buffer reset which might reset the head,
 * tails[] and aged_tail state.
 */
-   spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
+   if (lock)
+   spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv);
 
@@ -530,7 +534,10 @@ static bool oa_buffer_check_unlocked(struct 
drm_i915_private *dev_priv)
dev_priv->perf.oa.oa_buffer.aging_timestamp = now;
}
 
-   spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
+   dev_priv->perf.oa.half_full_count_last = half_full_count;
+
+   if (lock)
+   spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, 
flags);
 
return OA_TAKEN(dev_priv->perf.oa.oa_buffer.tail - gtt_offset,
dev_priv->perf.oa.oa_buffer.head - gtt_offset) >= 
report_size;
@@ -1124,9 +1131,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
  * i915_oa_wait_unlocked - handles blocking IO until OA data available
  * @stream: An i915-perf stream opened for OA metrics
  *
- * Called when userspace tries to read() from a blocking stream FD opened
- * for OA metrics. It waits until the hrtimer callback finds a non-empty
- * OA buffer and wakes us.
+ * Called when userspace tries to read() from a blocking stream FD opened for
+ * OA metrics. It waits until either the hrtimer callback finds a non-empty OA
+ * buffer or the OA interrupt kicks in and wakes us.
  *
  * Note: it's acceptable to have this return with some false positives
  * since any subsequent read handling will return -EAGAIN if there isn't
@@ -1143,7 +1150,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream 
*stream)
return -EIO;
 
return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
-   oa_buffer_check_unlocked(dev_priv));
+   oa_buffer_check(dev_priv, true));
 }
 
 /**
@@ -1964,6 +1971,10 @@ static void i915_oa_stream_disable(struct 
i915_perf_stream *stream)
 
dev_priv->perf.oa.ops.oa_disable(stream);
 
+   dev_priv->perf.oa.half_full_count_last = 0;
+   atomic64_set(&dev_priv->perf.oa.half_full_count,
+dev_priv->perf.oa.half_full_count_last);
+
if (dev_priv->perf.oa.periodic)
hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
 }
@@ -2290,7 +2301,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct 
hrtimer *hrtimer)
 perf.oa.poll_check_timer);
struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;
 
-   if (oa_buffer_check_unlocked(dev_priv)) {
+   if (oa_buffer_c