Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread Aurabindo Pillai



On 2022-06-08 13:47, Leo wrote:



On 2022-06-08 13:25, Pillai, Aurabindo wrote:

[AMD Official Use Only - General]


Isnt it cleaner to revert the original patch which removed the temporary 
variable and then instead allocate the clock_limits array on heap, and later 
freed at the end of the function?

--


I don't quite recall the details, but in FPU context, we should not malloc 
since it can fault/sleep. More info here: 
https://yarchive.net/comp/linux/kernel_fp.html


It looks like we dont have access to bw_ctx and hence the VBA struct 
inside the callback to update bounding box. If we can add that, then we 
could replicate what we did within DML functions like 
dml32_ModeSupportAndSystemConfigurationFull() to reduce the stack usage.


See the struct define:


struct dml32_ModeSupportAndSystemConfigurationFull {
unsigned int dummy_integer_array[8][DC__NUM_DPP__MAX];
double dummy_double_array[2][DC__NUM_DPP__MAX];
DmlPipe SurfParameters[DC__NUM_DPP__MAX];
double dummy_single[5];
double dummy_single2[5];
SOCParametersList mSOCParameters;
unsigned int MaximumSwathWidthSupportLuma;
unsigned int MaximumSwathWidthSupportChroma;
};

struct dummy_vars {
struct 
DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation


DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation;
struct dml32_ModeSupportAndSystemConfigurationFull 
dml32_ModeSupportAndSystemConfigurationFull;

};

We define the dummy vars needed in the format of 
function_name.variable_name. dummy_vars is added to vba struct and this 
can be access from bw_ctx.



Or we could also add a similar dummy struct to the soc resources 
(dcn2_1_soc etc) and add the temporary variable there. This way, we have 
a consistent solution for reducing stack usage.



- Leo



Regards,
Jay
--
*From:* Li, Sun peng (Leo) 
*Sent:* Wednesday, June 8, 2022 12:48 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Wentland, Harry ; Pillai, Aurabindo 
; Siqueira, Rodrigo ; Li, Sun peng 
(Leo) 
*Subject:* [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding 
box update
  
From: Leo Li 


[Why]

There is a theoretical problem in prior patches for reducing the stack
size of *update_bw_bounding_box() functions.

By modifying the soc.clock_limits[n] struct directly, this can cause
unintended behavior as the for loop attempts to swap rows in
clock_limits[n]. A temporary struct is still required to make sure we
stay functinoally equivalent.

[How]

Add a temporary clock_limits table to the SOC struct, and use it when
swapping rows.

Signed-off-by: Leo Li 
---
  .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
  .../amd/display/dc/dml/dcn301/dcn301_fpu.c    | 36 ++-
  .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
  .../amd/display/dc/dml/display_mode_structs.h |  5 ++
  4 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c2fec0d85da4..e247b2270b1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
  
  ASSERT(clk_table->num_entries);

  /* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over 
null states later */
-   for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
-   dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
-   }
+   memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
+  sizeof(dcn2_1_soc.clock_limits));
  
  for (i = 0; i < clk_table->num_entries; i++) {

  /* loop backwards*/
@@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
    

Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread Harry Wentland



On 2022-06-08 12:48, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> [Why]
> 
> There is a theoretical problem in prior patches for reducing the stack
> size of *update_bw_bounding_box() functions.
> 
> By modifying the soc.clock_limits[n] struct directly, this can cause
> unintended behavior as the for loop attempts to swap rows in
> clock_limits[n]. A temporary struct is still required to make sure we
> stay functinoally equivalent.
> 
> [How]
> 
> Add a temporary clock_limits table to the SOC struct, and use it when
> swapping rows.
> 
> Signed-off-by: Leo Li 
> ---
>  .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
>  .../amd/display/dc/dml/dcn301/dcn301_fpu.c| 36 ++-
>  .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
>  .../amd/display/dc/dml/display_mode_structs.h |  5 ++
>  4 files changed, 82 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> index c2fec0d85da4..e247b2270b1d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> @@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
> clk_bw_params *bw_params
>  
>   ASSERT(clk_table->num_entries);
>   /* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over 
> null states later */
> - for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
> - dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
> - }

Hmm, this for loop didn't make sense. I gave my RB for the previous patch too 
quickly.

> + memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
> +sizeof(dcn2_1_soc.clock_limits));
>  
>   for (i = 0; i < clk_table->num_entries; i++) {
>   /* loop backwards*/
> @@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, 
> struct clk_bw_params *bw_params
>   if (i == 1)
>   k++;
>  
> - dcn2_1_soc.clock_limits[k].state = k;
> - dcn2_1_soc.clock_limits[k].dcfclk_mhz = 
> clk_table->entries[i].dcfclk_mhz;
> - dcn2_1_soc.clock_limits[k].fabricclk_mhz = 
> clk_table->entries[i].fclk_mhz;
> - dcn2_1_soc.clock_limits[k].socclk_mhz = 
> clk_table->entries[i].socclk_mhz;
> - dcn2_1_soc.clock_limits[k].dram_speed_mts = 
> clk_table->entries[i].memclk_mhz * 2;
> + dcn2_1_soc._clock_tmp[k].state = k;
> + dcn2_1_soc._clock_tmp[k].dcfclk_mhz = 
> clk_table->entries[i].dcfclk_mhz;
> + dcn2_1_soc._clock_tmp[k].fabricclk_mhz = 
> clk_table->entries[i].fclk_mhz;
> + dcn2_1_soc._clock_tmp[k].socclk_mhz = 
> clk_table->entries[i].socclk_mhz;
> + dcn2_1_soc._clock_tmp[k].dram_speed_mts = 
> clk_table->entries[i].memclk_mhz * 2;
>  
> - dcn2_1_soc.clock_limits[k].dispclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
> - dcn2_1_soc.clock_limits[k].dppclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
> - dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
> - dcn2_1_soc.clock_limits[k].dscclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
> - dcn2_1_soc.clock_limits[k].dtbclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
> - dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
> - dcn2_1_soc.clock_limits[k].phyclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
> + dcn2_1_soc._clock_tmp[k].dispclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
> + dcn2_1_soc._clock_tmp[k].dppclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
> + dcn2_1_soc._clock_tmp[k].dram_bw_per_chan_gbps = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
> + dcn2_1_soc._clock_tmp[k].dscclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
> + dcn2_1_soc._clock_tmp[k].dtbclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
> + dcn2_1_soc._clock_tmp[k].phyclk_d18_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
> + dcn2_1_soc._clock_tmp[k].phyclk_mhz = 
> dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
>  

I see why we need a tmp array and agree that we shouldn't allocate it inside DML
functions.

Reviewed-by: Harry Wentland 

Harry

>   k++;
>   }
> +
> + memcpy(_1_soc.clock_limits, _1_soc._clock_tmp,
> +sizeof(dcn2_1_soc.clock_limits));
> +
>   if (clk_table->num_entries) {
>   dcn2_1_soc.num_states = clk_table->num_entries + 1;
>   /* fill in min DF PState */
> diff --git 

Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread Rodrigo Siqueira Jordao



On 2022-06-08 13:47, Leo wrote:



On 2022-06-08 13:25, Pillai, Aurabindo wrote:

[AMD Official Use Only - General]


Isnt it cleaner to revert the original patch which removed the temporary 
variable and then instead allocate the clock_limits array on heap, and later 
freed at the end of the function?

--


I don't quite recall the details, but in FPU context, we should not malloc 
since it can fault/sleep. More info here: 
https://yarchive.net/comp/linux/kernel_fp.html


Yeah... we cannot use malloc inside DML. I also think that we can just 
apply this patch on top of amd-staging-drm-next without reverting the 
old patches because we are unaware of any regression caused for those 
patches, afaik.


Thanks
Siqueira



- Leo



Regards,
Jay
--
*From:* Li, Sun peng (Leo) 
*Sent:* Wednesday, June 8, 2022 12:48 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Wentland, Harry ; Pillai, Aurabindo 
; Siqueira, Rodrigo ; Li, Sun peng 
(Leo) 
*Subject:* [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding 
box update
  
From: Leo Li 


[Why]

There is a theoretical problem in prior patches for reducing the stack
size of *update_bw_bounding_box() functions.

By modifying the soc.clock_limits[n] struct directly, this can cause
unintended behavior as the for loop attempts to swap rows in
clock_limits[n]. A temporary struct is still required to make sure we
stay functinoally equivalent.

[How]

Add a temporary clock_limits table to the SOC struct, and use it when
swapping rows.

Signed-off-by: Leo Li 
---
  .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
  .../amd/display/dc/dml/dcn301/dcn301_fpu.c    | 36 ++-
  .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
  .../amd/display/dc/dml/display_mode_structs.h |  5 ++
  4 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c2fec0d85da4..e247b2270b1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
  
  ASSERT(clk_table->num_entries);

  /* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over 
null states later */
-   for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
-   dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
-   }
+   memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
+  sizeof(dcn2_1_soc.clock_limits));
  
  for (i = 0; i < clk_table->num_entries; i++) {

  /* loop backwards*/
@@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
  if (i == 1)
  k++;
  
-   dcn2_1_soc.clock_limits[k].state = k;

-   dcn2_1_soc.clock_limits[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
-   dcn2_1_soc.clock_limits[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
-   dcn2_1_soc.clock_limits[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
-   dcn2_1_soc.clock_limits[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;
+   dcn2_1_soc._clock_tmp[k].state = k;
+   dcn2_1_soc._clock_tmp[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
+   dcn2_1_soc._clock_tmp[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
+   dcn2_1_soc._clock_tmp[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;
  
-   dcn2_1_soc.clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;

-   dcn2_1_soc.clock_limits[k].dppclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
-

Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread Leo


On 2022-06-08 13:25, Pillai, Aurabindo wrote:
> [AMD Official Use Only - General]
> 
> 
> Isnt it cleaner to revert the original patch which removed the temporary 
> variable and then instead allocate the clock_limits array on heap, and later 
> freed at the end of the function?
> 
> --

I don't quite recall the details, but in FPU context, we should not malloc 
since it can fault/sleep. More info here: 
https://yarchive.net/comp/linux/kernel_fp.html
- Leo

> 
> Regards,
> Jay
> --
> *From:* Li, Sun peng (Leo) 
> *Sent:* Wednesday, June 8, 2022 12:48 PM
> *To:* amd-gfx@lists.freedesktop.org 
> *Cc:* Wentland, Harry ; Pillai, Aurabindo 
> ; Siqueira, Rodrigo ; Li, 
> Sun peng (Leo) 
> *Subject:* [PATCH] drm/amd/display: Use pre-allocated temp struct for 
> bounding box update
>  
> From: Leo Li 
> 
> [Why]
> 
> There is a theoretical problem in prior patches for reducing the stack
> size of *update_bw_bounding_box() functions.
> 
> By modifying the soc.clock_limits[n] struct directly, this can cause
> unintended behavior as the for loop attempts to swap rows in
> clock_limits[n]. A temporary struct is still required to make sure we
> stay functinoally equivalent.
> 
> [How]
> 
> Add a temporary clock_limits table to the SOC struct, and use it when
> swapping rows.
> 
> Signed-off-by: Leo Li 
> ---
>  .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
>  .../amd/display/dc/dml/dcn301/dcn301_fpu.c    | 36 ++-
>  .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
>  .../amd/display/dc/dml/display_mode_structs.h |  5 ++
>  4 files changed, 82 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> index c2fec0d85da4..e247b2270b1d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> @@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
> clk_bw_params *bw_params
>  
>  ASSERT(clk_table->num_entries);
>  /* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying 
> over null states later */
> -   for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
> -   dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
> -   }
> +   memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
> +  sizeof(dcn2_1_soc.clock_limits));
>  
>  for (i = 0; i < clk_table->num_entries; i++) {
>  /* loop backwards*/
> @@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, 
> struct clk_bw_params *bw_params
>  if (i == 1)
>  k++;
>  
> -   dcn2_1_soc.clock_limits[k].state = k;
> -   dcn2_1_soc.clock_limits[k].dcfclk_mhz = 
> clk_table->entries[i].dcfclk_mhz;
> -   dcn2_1_soc.clock_limits[k].fabricclk_mhz = 
> clk_table->entries[i].fclk_mhz;
> -   dcn2_1_soc.clock_limits[k].socclk_mhz = 
> clk_table->entries[i].socclk_mhz;
> -   dcn2_1_soc.clock_limits[k].dram_speed_mts = 
> clk_table->entries[i].memclk_mhz * 2;
> +   dcn2_1_soc._clock_tmp[k].state = k;
> +   dcn2_1_soc._clock_tmp[k].dcfclk_mhz = 
> clk_table->entries[i].dcfclk_mhz;
> +   dcn2_1_soc._clock_tmp[k].fabricclk_mhz = 
> clk_table->entries[i].fclk_mhz;
> +   dcn2_1_soc._clock_tmp[k].socclk_mhz = 
> clk_table->entries[i].socclk_mhz;
> +   dcn2_1_soc._clock_tmp[k].dram_speed_mts = 
> clk_table->entries[i].memclk_mhz * 2;
>  
> -   dcn2_1_soc.clock_limits[k].dispclk_mhz =

Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread Pillai, Aurabindo
[AMD Official Use Only - General]

Isnt it cleaner to revert the original patch which removed the temporary 
variable and then instead allocate the clock_limits array on heap, and later 
freed at the end of the function?

--

Regards,
Jay

From: Li, Sun peng (Leo) 
Sent: Wednesday, June 8, 2022 12:48 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Wentland, Harry ; Pillai, Aurabindo 
; Siqueira, Rodrigo ; Li, 
Sun peng (Leo) 
Subject: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding 
box update

From: Leo Li 

[Why]

There is a theoretical problem in prior patches for reducing the stack
size of *update_bw_bounding_box() functions.

By modifying the soc.clock_limits[n] struct directly, this can cause
unintended behavior as the for loop attempts to swap rows in
clock_limits[n]. A temporary struct is still required to make sure we
stay functinoally equivalent.

[How]

Add a temporary clock_limits table to the SOC struct, and use it when
swapping rows.

Signed-off-by: Leo Li 
---
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
 .../amd/display/dc/dml/dcn301/dcn301_fpu.c| 36 ++-
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
 .../amd/display/dc/dml/display_mode_structs.h |  5 ++
 4 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c2fec0d85da4..e247b2270b1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params

 ASSERT(clk_table->num_entries);
 /* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over 
null states later */
-   for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
-   dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
-   }
+   memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
+  sizeof(dcn2_1_soc.clock_limits));

 for (i = 0; i < clk_table->num_entries; i++) {
 /* loop backwards*/
@@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
 if (i == 1)
 k++;

-   dcn2_1_soc.clock_limits[k].state = k;
-   dcn2_1_soc.clock_limits[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
-   dcn2_1_soc.clock_limits[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
-   dcn2_1_soc.clock_limits[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
-   dcn2_1_soc.clock_limits[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;
+   dcn2_1_soc._clock_tmp[k].state = k;
+   dcn2_1_soc._clock_tmp[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
+   dcn2_1_soc._clock_tmp[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
+   dcn2_1_soc._clock_tmp[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;

-   dcn2_1_soc.clock_limits[k].dispclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
-   dcn2_1_soc.clock_limits[k].dppclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
-   dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
-   dcn2_1_soc.clock_limits[k].dscclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
-   dcn2_1_soc.clock_limits[k].dtbclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
-   dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
-   dcn2_1_soc.clock_limits[k].phyclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dispclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dppclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dram_bw_per_chan_gbps = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
+   dcn2_1_soc._clock_tmp[k].dscclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dtbclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
+   dcn2_1_soc._clock_tmp[k].phyclk_d18_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
+   dcn2_1_soc._clock_tmp[k].phyclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;

 k++;
 }
+
+   memcpy(_1_soc.clock_limits, _1_soc._clock_tmp,
+  sizeof(dcn2_1_soc.clock_limits));
+

[PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

2022-06-08 Thread sunpeng.li
From: Leo Li 

[Why]

There is a theoretical problem in prior patches for reducing the stack
size of *update_bw_bounding_box() functions.

By modifying the soc.clock_limits[n] struct directly, this can cause
unintended behavior as the for loop attempts to swap rows in
clock_limits[n]. A temporary struct is still required to make sure we
stay functinoally equivalent.

[How]

Add a temporary clock_limits table to the SOC struct, and use it when
swapping rows.

Signed-off-by: Leo Li 
---
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
 .../amd/display/dc/dml/dcn301/dcn301_fpu.c| 36 ++-
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
 .../amd/display/dc/dml/display_mode_structs.h |  5 ++
 4 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c2fec0d85da4..e247b2270b1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
 
ASSERT(clk_table->num_entries);
/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over 
null states later */
-   for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
-   dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
-   }
+   memcpy(_1_soc._clock_tmp, _1_soc.clock_limits,
+  sizeof(dcn2_1_soc.clock_limits));
 
for (i = 0; i < clk_table->num_entries; i++) {
/* loop backwards*/
@@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_params
if (i == 1)
k++;
 
-   dcn2_1_soc.clock_limits[k].state = k;
-   dcn2_1_soc.clock_limits[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
-   dcn2_1_soc.clock_limits[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
-   dcn2_1_soc.clock_limits[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
-   dcn2_1_soc.clock_limits[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;
+   dcn2_1_soc._clock_tmp[k].state = k;
+   dcn2_1_soc._clock_tmp[k].dcfclk_mhz = 
clk_table->entries[i].dcfclk_mhz;
+   dcn2_1_soc._clock_tmp[k].fabricclk_mhz = 
clk_table->entries[i].fclk_mhz;
+   dcn2_1_soc._clock_tmp[k].socclk_mhz = 
clk_table->entries[i].socclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dram_speed_mts = 
clk_table->entries[i].memclk_mhz * 2;
 
-   dcn2_1_soc.clock_limits[k].dispclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
-   dcn2_1_soc.clock_limits[k].dppclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
-   dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
-   dcn2_1_soc.clock_limits[k].dscclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
-   dcn2_1_soc.clock_limits[k].dtbclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
-   dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
-   dcn2_1_soc.clock_limits[k].phyclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dispclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dppclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dram_bw_per_chan_gbps = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
+   dcn2_1_soc._clock_tmp[k].dscclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
+   dcn2_1_soc._clock_tmp[k].dtbclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
+   dcn2_1_soc._clock_tmp[k].phyclk_d18_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
+   dcn2_1_soc._clock_tmp[k].phyclk_mhz = 
dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
 
k++;
}
+
+   memcpy(_1_soc.clock_limits, _1_soc._clock_tmp,
+  sizeof(dcn2_1_soc.clock_limits));
+
if (clk_table->num_entries) {
dcn2_1_soc.num_states = clk_table->num_entries + 1;
/* fill in min DF PState */
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
index 62cf283d9f41..e4863f0bf0f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
@@ -254,6 +254,9 @@ void dcn301_update_bw_bounding_box(struct dc *dc, struct 
clk_bw_params *bw_param
 
dc_assert_fp_enabled();
 
+