[Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-18 Thread Jackie Li
From: Zhipeng Gong 

SKL platforms requires a higher ring multiplier when there's massive
GPU load. Current driver doesn't provide a way to override the ring
multiplier.

This patch adds a new module parameter to allow the overriding of
ring multiplier for Gen9 platforms.

Cc: Ben Widawsky 
Cc: Chris Wilson 
Cc: Oscar Mateo 
Cc: Sagar Arun Kamble 
Cc: Dmitry Rogozhkin 
Cc: Zhipeng Gong 
Signed-off-by: Jackie Li 
---
 drivers/gpu/drm/i915/i915_params.c |  4 +++
 drivers/gpu/drm/i915/i915_params.h |  1 +
 drivers/gpu/drm/i915/intel_pm.c| 57 --
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index 8dfea03..f3dd179 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -101,6 +101,10 @@ i915_param_named_unsafe(disable_power_well, int, 0400,
"Disable display power wells when possible "
"(-1=auto [default], 0=power wells always on, 1=power wells disabled 
when possible)");
 
+i915_param_named_unsafe(ring_multiplier, uint, 0400,
+   "Override Ring/GT frequency multiplier."
+   "(2=2x ring multiplier [defaut], 3=3x ring multiplier)");
+
 i915_param_named_unsafe(enable_ips, int, 0600, "Enable IPS (default: true)");
 
 i915_param_named(fastboot, bool, 0600,
diff --git a/drivers/gpu/drm/i915/i915_params.h 
b/drivers/gpu/drm/i915/i915_params.h
index 792ce26..6073d1c 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -43,6 +43,7 @@
param(int, enable_ppgtt, -1) \
param(int, enable_psr, -1) \
param(int, disable_power_well, -1) \
+   param(unsigned int, ring_multiplier, 2) \
param(int, enable_ips, 1) \
param(int, invert_brightness, 0) \
param(int, enable_guc, 0) \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a349c4f..080051b 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6818,6 +6818,34 @@ static void gen6_enable_rps(struct drm_i915_private 
*dev_priv)
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
+static inline void sanitize_ring_multiplier(struct drm_i915_private *i915)
+{
+   unsigned int m = i915_modparams.ring_multiplier;
+
+   /* Currently, only support 2x or 3x multipliers */
+   if (m != 2 && m != 3)
+   i915_modparams.ring_multiplier = 2;
+}
+
+static inline void get_ring_multiplier(struct drm_i915_private *i915,
+ unsigned int *numer,
+ unsigned int *denom)
+{
+   sanitize_ring_multiplier(i915);
+
+   if (IS_GEN9(i915)) {
+   *numer = i915_modparams.ring_multiplier;
+   *denom = 1;
+   } else if IS_HASWELL(i915) {
+   *numer = 5;
+   *denom = 2;
+   } else {
+   /* Use a 2x ring multiplier by default */
+   *numer = 2;
+   *denom = 1;
+   }
+}
+
 static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 {
struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -6825,6 +6853,8 @@ static void gen6_update_ring_freq(struct drm_i915_private 
*dev_priv)
unsigned int gpu_freq;
unsigned int max_ia_freq, min_ring_freq;
unsigned int max_gpu_freq, min_gpu_freq;
+   unsigned int ring_mul_numer, ring_mul_denom;
+   unsigned int ring_request_freq;
int scaling_factor = 180;
struct cpufreq_policy *policy;
 
@@ -6858,6 +6888,9 @@ static void gen6_update_ring_freq(struct drm_i915_private 
*dev_priv)
max_gpu_freq = rps->max_freq;
}
 
+
+   get_ring_multiplier(dev_priv, &ring_mul_numer, &ring_mul_denom);
+
/*
 * For each potential GPU frequency, load a ring frequency we'd like
 * to use for memory access.  We do this by specifying the IA frequency
@@ -6867,18 +6900,24 @@ static void gen6_update_ring_freq(struct 
drm_i915_private *dev_priv)
int diff = max_gpu_freq - gpu_freq;
unsigned int ia_freq = 0, ring_freq = 0;
 
+   /*
+* ring_request_freq = ring_multiplier * GPU frequency.
+* Ring freq is in 100MHz units while GPU frequency is in 50MHz
+* units.
+*/
+   ring_request_freq = mult_frac(gpu_freq, ring_mul_numer,
+ 2 * ring_mul_denom);
+
if (IS_GEN9_BC(dev_priv) || IS_CANNONLAKE(dev_priv)) {
/*
-* ring_freq = 2 * GT. ring_freq is in 100MHz units
-* No floor required for ring frequency on SKL.
+* ring_freq = ring request freq. No floor required for
+* ring frequency on SKL.
 */
-   ring_freq = gpu_freq;
-  

Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-18 Thread Chris Wilson
Quoting Jackie Li (2017-12-18 21:22:08)
> From: Zhipeng Gong 
> 
> SKL platforms requires a higher ring multiplier when there's massive
> GPU load. Current driver doesn't provide a way to override the ring
> multiplier.
> 
> This patch adds a new module parameter to allow the overriding of
> ring multiplier for Gen9 platforms.

So the default ring-scaling is not good enough, the first thing we do is
to try and ensure the defaults work for nearly all use cases. My
impression is that you want a nonlinear scalefactor, low power workloads
don't try and ramp up the ring frequencies as aggressively, high power
workloads try hard for higher frequencies, and then get throttled back
harder as well. How well can we autotune it? What events tells us if the
ratio is too high or too low?

Adding an untested unsafe modparam is a big no. If you want us to
support such a parameter, we at least need it not to taint the kernel
and come with some tests that prove it functions as intended. Better
still might be a I915_SETPARAM (or hooking up the write func of the
modparam) to support changing the value on the fly.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-18 Thread Yaodong Li

On 12/18/2017 01:47 PM, Chris Wilson wrote:

Quoting Jackie Li (2017-12-18 21:22:08)

From: Zhipeng Gong 

SKL platforms requires a higher ring multiplier when there's massive
GPU load. Current driver doesn't provide a way to override the ring
multiplier.

This patch adds a new module parameter to allow the overriding of
ring multiplier for Gen9 platforms.

So the default ring-scaling is not good enough, the first thing we do is
to try and ensure the defaults work for nearly all use cases. My
impression is that you want a nonlinear scalefactor, low power workloads
don't try and ramp up the ring frequencies as aggressively, high power
workloads try hard for higher frequencies, and then get throttled back
harder as well. How well can we autotune it? What events tells us if the
ratio is too high or too low?

Thanks Chris! the background was that we found that the performance
would drop on some SKL platforms with the default ring scaling factor.
and use of a 3x ring scaler can solve this problem.

One of the trigger of this issue is when there's massive GPU workloads
while having little/negligible CPU load - currently, we have no way to 
monitor

GPU/CPU workload in our driver. This is the reason why we use a modparam.
Do you have any suggestion about this?

Zhipeng and Dmitry may add more insights on this issue and related tuning.

Adding an untested unsafe modparam is a big no. If you want us to
support such a parameter, we at least need it not to taint the kernel
and come with some tests that prove it functions as intended. Better
still might be a I915_SETPARAM (or hooking up the write func of the
modparam) to support changing the value on the fly.

the intention of this patch was to add a way to override the
default 2x ring scaling factor so that we would be able choose
a different ring scaler for SKL platforms which were known that would
have massive GPU workload. And related changes (using a 3x scaler)
had been tested.

It would be great if we can do this on the fly.  However, we're going
to need a way to monitor the trigger events in order to change the
value dynamically.

By saying use I915_SETPARAM (or different write func), are you suggesting
to request the new ring freq directly while setting the new ring scaling 
factor?


Regards,
-Jackie

-Chris


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-26 Thread Chris Wilson
Quoting Chris Wilson (2017-12-18 21:47:25)
> Quoting Jackie Li (2017-12-18 21:22:08)
> > From: Zhipeng Gong 
> > 
> > SKL platforms requires a higher ring multiplier when there's massive
> > GPU load. Current driver doesn't provide a way to override the ring
> > multiplier.
> > 
> > This patch adds a new module parameter to allow the overriding of
> > ring multiplier for Gen9 platforms.
> 
> So the default ring-scaling is not good enough, the first thing we do is
> to try and ensure the defaults work for nearly all use cases. My
> impression is that you want a nonlinear scalefactor, low power workloads
> don't try and ramp up the ring frequencies as aggressively, high power
> workloads try hard for higher frequencies, and then get throttled back
> harder as well. How well can we autotune it? What events tells us if the
> ratio is too high or too low?

One thing that came to mind is that we don't know the min/max ring
frequencies and just program them blindly. Is it the case that at max
gpu freq, there is still headroom on the ring freq, or do you require a
steeper ramp so that you hit the max ringfreq earlier for your workload
(which then presumably can run at less than max gpufreq, so pushing
power elsewhere).
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-26 Thread Rogozhkin, Dmitry V
Clarification on the issue. Consider that you have a massive load on GT and 
just tiny one on IA. If GT will program the RING frequency to be lower than IA 
frequency, then you will fall into the situation when RING frequency constantly 
transits from GT to IA level and back. Each transition of a RING frequency is a 
full system stall. If you will have "good" transition rate with few transitions 
per few milliseconds you will lose ~10% of performance. That's the case for 
media workloads when you easily can step into this since 1) media utilizes few 
GPU engines and with few parallel workloads you can make sure that at least 1 
engine is _always_ doing something, 2) media BB are relatively small, so you 
have regular wakeups of the IA to manage requests. This will affect Gen9 
platforms due to HW design change (we've spot this in SKL). This will not 
happen in Gen8 (old HW design). This will be fixed in Gen10+ (CNL+).

On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. 
Multipliers were x2 for GT, x1 for IA.

So, effectively, what we need to do is to make sure that RING frequency request 
from GT is _not_ below the request from IA. If IA requests 2GHz, we can't 
request 1.4GHz, we need request at least 2GHz. Multiplier patch was intended to 
do exactly that, but manually. Can  we somehow automate that managing IA 
frequency requests to the RING?

Dmitry.

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Tuesday, December 26, 2017 6:36 AM
To: Li, Yaodong ; intel-gfx@lists.freedesktop.org
Cc: Gong, Zhipeng ; Widawsky, Benjamin 
; Mateo Lozano, Oscar ; 
Kamble, Sagar A ; Rogozhkin, Dmitry V 
; Li, Yaodong 
Subject: Re: [RFC] drm/i915: Add a new modparam for customized ring multiplier

Quoting Chris Wilson (2017-12-18 21:47:25)
> Quoting Jackie Li (2017-12-18 21:22:08)
> > From: Zhipeng Gong 
> > 
> > SKL platforms requires a higher ring multiplier when there's massive 
> > GPU load. Current driver doesn't provide a way to override the ring 
> > multiplier.
> > 
> > This patch adds a new module parameter to allow the overriding of 
> > ring multiplier for Gen9 platforms.
> 
> So the default ring-scaling is not good enough, the first thing we do 
> is to try and ensure the defaults work for nearly all use cases. My 
> impression is that you want a nonlinear scalefactor, low power 
> workloads don't try and ramp up the ring frequencies as aggressively, 
> high power workloads try hard for higher frequencies, and then get 
> throttled back harder as well. How well can we autotune it? What 
> events tells us if the ratio is too high or too low?

One thing that came to mind is that we don't know the min/max ring frequencies 
and just program them blindly. Is it the case that at max gpu freq, there is 
still headroom on the ring freq, or do you require a steeper ramp so that you 
hit the max ringfreq earlier for your workload (which then presumably can run 
at less than max gpufreq, so pushing power elsewhere).
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-26 Thread Chris Wilson
Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider that you have a massive load on GT and 
> just tiny one on IA. If GT will program the RING frequency to be lower than 
> IA frequency, then you will fall into the situation when RING frequency 
> constantly transits from GT to IA level and back. Each transition of a RING 
> frequency is a full system stall. If you will have "good" transition rate 
> with few transitions per few milliseconds you will lose ~10% of performance. 
> That's the case for media workloads when you easily can step into this since 
> 1) media utilizes few GPU engines and with few parallel workloads you can 
> make sure that at least 1 engine is _always_ doing something, 2) media BB are 
> relatively small, so you have regular wakeups of the IA to manage requests. 
> This will affect Gen9 platforms due to HW design change (we've spot this in 
> SKL). This will not happen in Gen8 (old HW design). This will be fixed in 
> Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than
stick to the highest? Iirc, the expectation was that we were setting a
requested minimum frequency for the ring/ia based off the gpu freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. 
> Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is
insufficient to keep the fixed functions occupied, and you need to
increase the ring frequency. Is there ever a case where we don't need
max ring frequency? (Perhaps we still need to set low frequency for GT
idle?) I guess media is more susceptible to this as that workload should
be sustainable at reduced clocks, GL et al are much more likely to keep
the clocks ramped all the way up.

Do you know anything about the opposite position. I heard a suggestion
that simply increasing the ringfreq universally caused thermal
throttling in some other workloads. Do you have any knowledge of those?
 
> So, effectively, what we need to do is to make sure that RING frequency 
> request from GT is _not_ below the request from IA. If IA requests 2GHz, we 
> can't request 1.4GHz, we need request at least 2GHz. Multiplier patch was 
> intended to do exactly that, but manually. Can  we somehow automate that 
> managing IA frequency requests to the RING?

You are thinking of plugging into intel_pstate to make it smarter for ia
freq transitions? That seems possible, certainly. I'm not sure if the
ring frequency is actually poked from anywhere else in the kernel, would
be interesting to find out.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-26 Thread Rogozhkin, Dmitry V
>> To clarify, the HW will flip between the two GT/IA requests rather than 
>> stick to the highest?

Yes, it will flip on Gen9. On Gen8 there was some mechanism (HW) which 
flattened that. But it was removed/substituted in Gen9. In Gen10 it was tuned  
to close the mentioned issue.

>> Do you know anything about the opposite position. I heard a suggestion that 
>> simply increasing the ringfreq universally caused thermal throttling in some 
>> other workloads. Do you have any knowledge of those?

Initially we tried to just increase GT multiplier to x3 and stepped into the 
throttling. Thus, we introduced parameter to be able to mitigate all that 
depending on the SKU and user needs. I definitely asked what will be if GT 
request will be bigger than IA request. But that was a year ago and I don't 
remember the answer. Let me ask again. I will mail back in few days.

>> You are thinking of plugging into intel_pstate to make it smarter for ia 
>> freq transitions?

Yep. This seems a correct step to give some automatic support instead of 
parameter/hardcoded multiplier.

Dmitry.

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Tuesday, December 26, 2017 8:59 AM
To: Rogozhkin, Dmitry V ; Li, Yaodong 
; intel-gfx@lists.freedesktop.org
Cc: Gong, Zhipeng ; Widawsky, Benjamin 
; Mateo Lozano, Oscar ; 
Kamble, Sagar A ; Li, Yaodong 
Subject: RE: [RFC] drm/i915: Add a new modparam for customized ring multiplier

Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider that you have a massive load on GT and 
> just tiny one on IA. If GT will program the RING frequency to be lower than 
> IA frequency, then you will fall into the situation when RING frequency 
> constantly transits from GT to IA level and back. Each transition of a RING 
> frequency is a full system stall. If you will have "good" transition rate 
> with few transitions per few milliseconds you will lose ~10% of performance. 
> That's the case for media workloads when you easily can step into this since 
> 1) media utilizes few GPU engines and with few parallel workloads you can 
> make sure that at least 1 engine is _always_ doing something, 2) media BB are 
> relatively small, so you have regular wakeups of the IA to manage requests. 
> This will affect Gen9 platforms due to HW design change (we've spot this in 
> SKL). This will not happen in Gen8 (old HW design). This will be fixed in 
> Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than stick 
to the highest? Iirc, the expectation was that we were setting a requested 
minimum frequency for the ring/ia based off the gpu freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. 
> Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is 
insufficient to keep the fixed functions occupied, and you need to increase the 
ring frequency. Is there ever a case where we don't need max ring frequency? 
(Perhaps we still need to set low frequency for GT
idle?) I guess media is more susceptible to this as that workload should be 
sustainable at reduced clocks, GL et al are much more likely to keep the clocks 
ramped all the way up.

Do you know anything about the opposite position. I heard a suggestion that 
simply increasing the ringfreq universally caused thermal throttling in some 
other workloads. Do you have any knowledge of those?
 
> So, effectively, what we need to do is to make sure that RING frequency 
> request from GT is _not_ below the request from IA. If IA requests 2GHz, we 
> can't request 1.4GHz, we need request at least 2GHz. Multiplier patch was 
> intended to do exactly that, but manually. Can  we somehow automate that 
> managing IA frequency requests to the RING?

You are thinking of plugging into intel_pstate to make it smarter for ia freq 
transitions? That seems possible, certainly. I'm not sure if the ring frequency 
is actually poked from anywhere else in the kernel, would be interesting to 
find out.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2017-12-27 Thread Rogozhkin, Dmitry V
>> I definitely asked what will be if GT request will be bigger than IA 
>> request. But that was a year ago and I don't remember the answer. Let me ask 
>> again. I will mail back in few days.

Hi Chris, here is the response.

Question was: "Whether we can meet with the RING transition penalty (at least 
theoretically) if we will have GT request higher than IA request with the 
dominant IA load and tiny GT load, i.e. reverted situation of what we have 
actually faced? For example, if we will try to pin IA frequency to 800MHz (x1 
multiplier) and GT frequency to 700MHz (x2 multiplier): in that case we will 
have requests for ring 800 vs. 1400."

Answer is: "In this case, if the GT will toggle between RC0 and RC6, it will 
force ring frequency to toggle between 800 and 1400, which in the toggling time 
will stall IA execution. This will lead to performance loss." However, this is 
a case if we have really few toggle events within few milliseconds. It is quite 
probable that GT driver will not allow such behavior to happen if it simply 
doesn't often toggle between RC0 and RC6. Considering that GT driver probably 
handles much less interrupts than IA, this can be the case. So, I think Chris, 
that's now question to you: how often you toggle between RC0 and RC6 to see the 
reverted issue to happen? If you don't toggle much, then RING will simply 
remain on 1400 almost all the time and you will see no issue.

Again, I remind that's the talk about Gen9 only.

Dmitry.

-Original Message-
From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Rogozhkin, Dmitry V
Sent: Tuesday, December 26, 2017 9:39 AM
To: Chris Wilson ; Li, Yaodong 
; intel-gfx@lists.freedesktop.org
Cc: Widawsky, Benjamin 
Subject: Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring 
multiplier

>> To clarify, the HW will flip between the two GT/IA requests rather than 
>> stick to the highest?

Yes, it will flip on Gen9. On Gen8 there was some mechanism (HW) which 
flattened that. But it was removed/substituted in Gen9. In Gen10 it was tuned  
to close the mentioned issue.

>> Do you know anything about the opposite position. I heard a suggestion that 
>> simply increasing the ringfreq universally caused thermal throttling in some 
>> other workloads. Do you have any knowledge of those?

Initially we tried to just increase GT multiplier to x3 and stepped into the 
throttling. Thus, we introduced parameter to be able to mitigate all that 
depending on the SKU and user needs. I definitely asked what will be if GT 
request will be bigger than IA request. But that was a year ago and I don't 
remember the answer. Let me ask again. I will mail back in few days.

>> You are thinking of plugging into intel_pstate to make it smarter for ia 
>> freq transitions?

Yep. This seems a correct step to give some automatic support instead of 
parameter/hardcoded multiplier.

Dmitry.

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Tuesday, December 26, 2017 8:59 AM
To: Rogozhkin, Dmitry V ; Li, Yaodong 
; intel-gfx@lists.freedesktop.org
Cc: Gong, Zhipeng ; Widawsky, Benjamin 
; Mateo Lozano, Oscar ; 
Kamble, Sagar A ; Li, Yaodong 
Subject: RE: [RFC] drm/i915: Add a new modparam for customized ring multiplier

Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider that you have a massive load on GT and 
> just tiny one on IA. If GT will program the RING frequency to be lower than 
> IA frequency, then you will fall into the situation when RING frequency 
> constantly transits from GT to IA level and back. Each transition of a RING 
> frequency is a full system stall. If you will have "good" transition rate 
> with few transitions per few milliseconds you will lose ~10% of performance. 
> That's the case for media workloads when you easily can step into this since 
> 1) media utilizes few GPU engines and with few parallel workloads you can 
> make sure that at least 1 engine is _always_ doing something, 2) media BB are 
> relatively small, so you have regular wakeups of the IA to manage requests. 
> This will affect Gen9 platforms due to HW design change (we've spot this in 
> SKL). This will not happen in Gen8 (old HW design). This will be fixed in 
> Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than stick 
to the highest? Iirc, the expectation was that we were setting a requested 
minimum frequency for the ring/ia based off the gpu freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. 
> Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is 
insufficient to keep the fixed functions occupied, and you need to increase the 
ring frequency. Is 

Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-03 Thread Yaodong Li



You are thinking of plugging into intel_pstate to make it smarter for ia freq 
transitions?

Yep. This seems a correct step to give some automatic support instead of 
parameter/hardcoded multiplier.

Does this mean we should use cpufreq/intel_pstate based approach instead 
of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value from 
cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we won't 
change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means we 
will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are going to 
decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-03 Thread Sagar Arun Kamble
Since ring frequency programming needs consideration of both IA and GT 
frequency requests I think keeping the logic
to program the ring frequency table in driver that monitors both IA/GT 
busyness and power budgets like intel_ips
will be more appropriate. intel_ips is relying on global load derived 
from all CPUs.
I understand that power awareness and busyness based policy might be 
trickier but having that as tunable will give better flexibility.


On 1/3/2018 11:51 PM, Yaodong Li wrote:


You are thinking of plugging into intel_pstate to make it smarter 
for ia freq transitions?
Yep. This seems a correct step to give some automatic support instead 
of parameter/hardcoded multiplier.


Does this mean we should use cpufreq/intel_pstate based approach 
instead of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value from 
cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we won't 
change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means we 
will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are going 
to decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-04 Thread Yaodong Li


On 01/03/2018 10:10 PM, Sagar Arun Kamble wrote:
Since ring frequency programming needs consideration of both IA and GT 
frequency requests I think keeping the logic
to program the ring frequency table in driver that monitors both IA/GT 
busyness and power budgets like intel_ips
will be more appropriate. intel_ips is relying on global load derived 
from all CPUs.
I understand that power awareness and busyness based policy might be 
trickier but having that as tunable will give better flexibility.


By just looking into the current code, the way intel_ips checks gpu 
busyness cannot reflect the actual workload of GT
(e.g. gpu busy is true even if there's only one pending request), in 
this case, we shall not increase the ring freq if we
want to use a "workload monitoring" based solution. so we need a more 
accurate way to monitor the current GT workload

(e.g.  when the pending request count reaches a center tunable threshold??).


On 1/3/2018 11:51 PM, Yaodong Li wrote:


You are thinking of plugging into intel_pstate to make it smarter 
for ia freq transitions?
Yep. This seems a correct step to give some automatic support 
instead of parameter/hardcoded multiplier.


Does this mean we should use cpufreq/intel_pstate based approach 
instead of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value 
from cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we won't 
change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means we 
will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are going 
to decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie





___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-05 Thread Sagar Arun Kamble



On 1/5/2018 3:22 AM, Yaodong Li wrote:


On 01/03/2018 10:10 PM, Sagar Arun Kamble wrote:
Since ring frequency programming needs consideration of both IA and 
GT frequency requests I think keeping the logic
to program the ring frequency table in driver that monitors both 
IA/GT busyness and power budgets like intel_ips
will be more appropriate. intel_ips is relying on global load derived 
from all CPUs.
I understand that power awareness and busyness based policy might be 
trickier but having that as tunable will give better flexibility.


By just looking into the current code, the way intel_ips checks gpu 
busyness cannot reflect the actual workload of GT
(e.g. gpu busy is true even if there's only one pending request), in 
this case, we shall not increase the ring freq if we
want to use a "workload monitoring" based solution. so we need a more 
accurate way to monitor the current GT workload
(e.g.  when the pending request count reaches a center tunable 
threshold??).

Yes. May be we can share the PMU data about engine busyness with intel_ips.



On 1/3/2018 11:51 PM, Yaodong Li wrote:


You are thinking of plugging into intel_pstate to make it smarter 
for ia freq transitions?
Yep. This seems a correct step to give some automatic support 
instead of parameter/hardcoded multiplier.


Does this mean we should use cpufreq/intel_pstate based approach 
instead of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value 
from cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we won't 
change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means we 
will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are going 
to decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie







___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-05 Thread Yaodong Li

On 01/05/2018 02:15 AM, Sagar Arun Kamble wrote:



On 1/5/2018 3:22 AM, Yaodong Li wrote:


On 01/03/2018 10:10 PM, Sagar Arun Kamble wrote:
Since ring frequency programming needs consideration of both IA and 
GT frequency requests I think keeping the logic
to program the ring frequency table in driver that monitors both 
IA/GT busyness and power budgets like intel_ips
will be more appropriate. intel_ips is relying on global load 
derived from all CPUs.
I understand that power awareness and busyness based policy might be 
trickier but having that as tunable will give better flexibility.


By just looking into the current code, the way intel_ips checks gpu 
busyness cannot reflect the actual workload of GT
(e.g. gpu busy is true even if there's only one pending request), in 
this case, we shall not increase the ring freq if we
want to use a "workload monitoring" based solution. so we need a more 
accurate way to monitor the current GT workload
(e.g.  when the pending request count reaches a center tunable 
threshold??).
Yes. May be we can share the PMU data about engine busyness with 
intel_ips.
Thank you Sagar! Can you tell more about how we can get the gpu busyness 
from PMU data?


I think the solution would be to set the ring freq table to use a ring 
freq >= current ia freq (for all possible gpu freq) once we found
gpu workload is high (need to tune the threshold), and we will decrease 
the ring freq (use a 2x multiplier?) once we found the GT
workload is low. The benefit to use the intel_ips is it can tell both 
the cpu & gpu busyness. However, we do need an accurate way

to check at least the busyness of gpu for this issue.


On 1/3/2018 11:51 PM, Yaodong Li wrote:


You are thinking of plugging into intel_pstate to make it 
smarter for ia freq transitions?
Yep. This seems a correct step to give some automatic support 
instead of parameter/hardcoded multiplier.


Does this mean we should use cpufreq/intel_pstate based approach 
instead of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value 
from cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we 
won't change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means 
we will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are 
going to decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie









___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

2018-01-08 Thread Sagar Arun Kamble



On 1/6/2018 5:53 AM, Yaodong Li wrote:

On 01/05/2018 02:15 AM, Sagar Arun Kamble wrote:



On 1/5/2018 3:22 AM, Yaodong Li wrote:


On 01/03/2018 10:10 PM, Sagar Arun Kamble wrote:
Since ring frequency programming needs consideration of both IA and 
GT frequency requests I think keeping the logic
to program the ring frequency table in driver that monitors both 
IA/GT busyness and power budgets like intel_ips
will be more appropriate. intel_ips is relying on global load 
derived from all CPUs.
I understand that power awareness and busyness based policy might 
be trickier but having that as tunable will give better flexibility.


By just looking into the current code, the way intel_ips checks gpu 
busyness cannot reflect the actual workload of GT
(e.g. gpu busy is true even if there's only one pending request), in 
this case, we shall not increase the ring freq if we
want to use a "workload monitoring" based solution. so we need a 
more accurate way to monitor the current GT workload
(e.g.  when the pending request count reaches a center tunable 
threshold??).
Yes. May be we can share the PMU data about engine busyness with 
intel_ips.
Thank you Sagar! Can you tell more about how we can get the gpu 
busyness from PMU data?


It seems to be not possible as of now since pmu is reporting accumulated 
busy time per engine without info about time they became active/idle,
unless I am missing something. Also having pmu ON in release environment 
needs to be checked. Chris, could you please clarify on pmu usage here.


Also thinking about below two options to get the GPU busyness:
1. Sharing i915 rps power zone (LOW_POWER, BETWEEN, HIGH_POWER) to 
intel_ips (kind of indicators of GPU load)
2. Sampling GT C0 counter to derive full GPU busyness. This though may 
not have been tested much so far across platforms.


I think the solution would be to set the ring freq table to use a ring 
freq >= current ia freq (for all possible gpu freq) once we found
gpu workload is high (need to tune the threshold), and we will 
decrease the ring freq (use a 2x multiplier?) once we found the GT
workload is low. The benefit to use the intel_ips is it can tell both 
the cpu & gpu busyness. However, we do need an accurate way

to check at least the busyness of gpu for this issue.

Agree.

Thanks
Sagar



On 1/3/2018 11:51 PM, Yaodong Li wrote:


You are thinking of plugging into intel_pstate to make it 
smarter for ia freq transitions?
Yep. This seems a correct step to give some automatic support 
instead of parameter/hardcoded multiplier.


Does this mean we should use cpufreq/intel_pstate based approach 
instead of the current modparam solution for Gen9?


Some concerns and questions about intel_pstate approach:
a) Currently, we cannot get the accurate pstate/target freq value 
from cpufreq in intel_pstate active mode since
 these values won't be exported to cpufreq layer, so if we 
won't change intel_pstate code then we only can get

 the max cpu freq of a new policy.
b) intel_pstate policy is attached to each logic cpu, which means 
we will receive policy/freq transition notification
    for each logic cpu freq change. One question is how we are 
going to decide the freq of the ring? just use the max

    cpu freq reported?
c) With the intel_pstate approach we may still run into thermal 
throttling, in this case, can a certain cooling device

    be triggered to lower the cpu freq?

Thanks and Regards,
-Jackie











___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx