Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
On 2022-06-08 13:47, Leo wrote:
> On 2022-06-08 13:25, Pillai, Aurabindo wrote:
>> [AMD Official Use Only - General]
>>
>> Isn't it cleaner to revert the original patch which removed the temporary
>> variable, and instead allocate the clock_limits array on the heap, freeing
>> it at the end of the function?
>
> I don't quite recall the details, but in FPU context we should not malloc,
> since it can fault/sleep. More info here:
> https://yarchive.net/comp/linux/kernel_fp.html

It looks like we don't have access to bw_ctx, and hence the VBA struct, inside
the callback that updates the bounding box. If we can add that, then we could
replicate what we did within DML functions like
dml32_ModeSupportAndSystemConfigurationFull() to reduce stack usage. See the
struct definition:

struct dml32_ModeSupportAndSystemConfigurationFull {
	unsigned int dummy_integer_array[8][DC__NUM_DPP__MAX];
	double dummy_double_array[2][DC__NUM_DPP__MAX];
	DmlPipe SurfParameters[DC__NUM_DPP__MAX];
	double dummy_single[5];
	double dummy_single2[5];
	SOCParametersList mSOCParameters;
	unsigned int MaximumSwathWidthSupportLuma;
	unsigned int MaximumSwathWidthSupportChroma;
};

struct dummy_vars {
	struct DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation
		DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation;
	struct dml32_ModeSupportAndSystemConfigurationFull
		dml32_ModeSupportAndSystemConfigurationFull;
};

We define the dummy vars needed in the format of function_name.variable_name.
dummy_vars is added to the VBA struct, and this can be accessed from bw_ctx.

Alternatively, we could add a similar dummy struct to the SOC resources
(dcn2_1_soc etc.) and add the temporary variable there. That way we have a
consistent solution for reducing stack usage.
Regards,
Jay

------------------------------------------------------------------------
*From:* Li, Sun peng (Leo)
*Sent:* Wednesday, June 8, 2022 12:48 PM
*To:* amd-gfx@lists.freedesktop.org
*Cc:* Wentland, Harry; Pillai, Aurabindo; Siqueira, Rodrigo; Li, Sun peng (Leo)
*Subject:* [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

[snipped quoted patch]
Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
On 2022-06-08 12:48, sunpeng...@amd.com wrote:
> From: Leo Li
>
> [Why]
>
> There is a theoretical problem in prior patches for reducing the stack
> size of *update_bw_bounding_box() functions.
>
> By modifying the soc.clock_limits[n] struct directly, this can cause
> unintended behavior as the for loop attempts to swap rows in
> clock_limits[n]. A temporary struct is still required to make sure we
> stay functionally equivalent.
>
> [How]
>
> Add a temporary clock_limits table to the SOC struct, and use it when
> swapping rows.
>
> Signed-off-by: Leo Li
> ---
>  .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
>  .../amd/display/dc/dml/dcn301/dcn301_fpu.c    | 36 ++-
>  .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
>  .../amd/display/dc/dml/display_mode_structs.h |  5 ++
>  4 files changed, 82 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> index c2fec0d85da4..e247b2270b1d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
> @@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params
>
>  	ASSERT(clk_table->num_entries);
>  	/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over null states later */
> -	for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
> -		dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
> -	}

Hmm, this for loop didn't make sense. I gave my RB for the previous patch
too quickly.
> +	memcpy(&dcn2_1_soc._clock_tmp, &dcn2_1_soc.clock_limits,
> +	       sizeof(dcn2_1_soc.clock_limits));
>
>  	for (i = 0; i < clk_table->num_entries; i++) {
>  		/* loop backwards*/
> @@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params
>  		if (i == 1)
>  			k++;
>
> -		dcn2_1_soc.clock_limits[k].state = k;
> -		dcn2_1_soc.clock_limits[k].dcfclk_mhz = clk_table->entries[i].dcfclk_mhz;
> -		dcn2_1_soc.clock_limits[k].fabricclk_mhz = clk_table->entries[i].fclk_mhz;
> -		dcn2_1_soc.clock_limits[k].socclk_mhz = clk_table->entries[i].socclk_mhz;
> -		dcn2_1_soc.clock_limits[k].dram_speed_mts = clk_table->entries[i].memclk_mhz * 2;
> +		dcn2_1_soc._clock_tmp[k].state = k;
> +		dcn2_1_soc._clock_tmp[k].dcfclk_mhz = clk_table->entries[i].dcfclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].fabricclk_mhz = clk_table->entries[i].fclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].socclk_mhz = clk_table->entries[i].socclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].dram_speed_mts = clk_table->entries[i].memclk_mhz * 2;
>
> -		dcn2_1_soc.clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
> -		dcn2_1_soc.clock_limits[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
> -		dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
> -		dcn2_1_soc.clock_limits[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
> -		dcn2_1_soc.clock_limits[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
> -		dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
> -		dcn2_1_soc.clock_limits[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
> +		dcn2_1_soc._clock_tmp[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
> +		dcn2_1_soc._clock_tmp[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
> +		dcn2_1_soc._clock_tmp[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;

I see why we need a tmp array and agree that we shouldn't allocate it
inside DML functions.

Reviewed-by: Harry Wentland

Harry

>  		k++;
>  	}
> +
> +	memcpy(&dcn2_1_soc.clock_limits, &dcn2_1_soc._clock_tmp,
> +	       sizeof(dcn2_1_soc.clock_limits));
> +
>  	if (clk_table->num_entries) {
>  		dcn2_1_soc.num_states = clk_table->num_entries + 1;
>  		/* fill in min DF PState */
> diff --git
Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
On 2022-06-08 13:47, Leo wrote:
> On 2022-06-08 13:25, Pillai, Aurabindo wrote:
>> [AMD Official Use Only - General]
>>
>> Isn't it cleaner to revert the original patch which removed the temporary
>> variable, and instead allocate the clock_limits array on the heap, freeing
>> it at the end of the function?
>
> I don't quite recall the details, but in FPU context we should not malloc,
> since it can fault/sleep. More info here:
> https://yarchive.net/comp/linux/kernel_fp.html

Yeah... we cannot use malloc inside DML.

I also think we can just apply this patch on top of amd-staging-drm-next
without reverting the old patches, because we are not aware of any
regression caused by those patches, afaik.

Thanks
Siqueira

[snipped quoted patch]
Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
On 2022-06-08 13:25, Pillai, Aurabindo wrote:
> [AMD Official Use Only - General]
>
> Isn't it cleaner to revert the original patch which removed the temporary
> variable, and instead allocate the clock_limits array on the heap, freeing
> it at the end of the function?

I don't quite recall the details, but in FPU context we should not malloc,
since it can fault/sleep. More info here:
https://yarchive.net/comp/linux/kernel_fp.html

- Leo

> Regards,
> Jay
> ------------------------------------------------------------------------
> *From:* Li, Sun peng (Leo)
> *Sent:* Wednesday, June 8, 2022 12:48 PM
> *To:* amd-gfx@lists.freedesktop.org
> *Cc:* Wentland, Harry; Pillai, Aurabindo; Siqueira, Rodrigo; Li, Sun peng (Leo)
> *Subject:* [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
>
> [snipped quoted patch]
Re: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
[AMD Official Use Only - General]

Isn't it cleaner to revert the original patch which removed the temporary
variable, and instead allocate the clock_limits array on the heap, freeing
it at the end of the function?

Regards,
Jay

------------------------------------------------------------------------
From: Li, Sun peng (Leo)
Sent: Wednesday, June 8, 2022 12:48 PM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry; Pillai, Aurabindo; Siqueira, Rodrigo; Li, Sun peng (Leo)
Subject: [PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update

[snipped quoted patch]
[PATCH] drm/amd/display: Use pre-allocated temp struct for bounding box update
From: Leo Li

[Why]

There is a theoretical problem in prior patches for reducing the stack
size of *update_bw_bounding_box() functions.

By modifying the soc.clock_limits[n] struct directly, this can cause
unintended behavior as the for loop attempts to swap rows in
clock_limits[n]. A temporary struct is still required to make sure we
stay functionally equivalent.

[How]

Add a temporary clock_limits table to the SOC struct, and use it when
swapping rows.

Signed-off-by: Leo Li
---
 .../drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  | 33 +-
 .../amd/display/dc/dml/dcn301/dcn301_fpu.c    | 36 ++-
 .../drm/amd/display/dc/dml/dcn31/dcn31_fpu.c  | 64 +++
 .../amd/display/dc/dml/display_mode_structs.h |  5 ++
 4 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index c2fec0d85da4..e247b2270b1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2015,9 +2015,8 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params

 	ASSERT(clk_table->num_entries);
 	/* Copy dcn2_1_soc.clock_limits to clock_limits to avoid copying over null states later */
-	for (i = 0; i < dcn2_1_soc.num_states + 1; i++) {
-		dcn2_1_soc.clock_limits[i] = dcn2_1_soc.clock_limits[i];
-	}
+	memcpy(&dcn2_1_soc._clock_tmp, &dcn2_1_soc.clock_limits,
+	       sizeof(dcn2_1_soc.clock_limits));

 	for (i = 0; i < clk_table->num_entries; i++) {
 		/* loop backwards*/
@@ -2032,22 +2031,26 @@ void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params
 		if (i == 1)
 			k++;

-		dcn2_1_soc.clock_limits[k].state = k;
-		dcn2_1_soc.clock_limits[k].dcfclk_mhz = clk_table->entries[i].dcfclk_mhz;
-		dcn2_1_soc.clock_limits[k].fabricclk_mhz = clk_table->entries[i].fclk_mhz;
-		dcn2_1_soc.clock_limits[k].socclk_mhz = clk_table->entries[i].socclk_mhz;
-		dcn2_1_soc.clock_limits[k].dram_speed_mts = clk_table->entries[i].memclk_mhz * 2;
+		dcn2_1_soc._clock_tmp[k].state = k;
+		dcn2_1_soc._clock_tmp[k].dcfclk_mhz = clk_table->entries[i].dcfclk_mhz;
+		dcn2_1_soc._clock_tmp[k].fabricclk_mhz = clk_table->entries[i].fclk_mhz;
+		dcn2_1_soc._clock_tmp[k].socclk_mhz = clk_table->entries[i].socclk_mhz;
+		dcn2_1_soc._clock_tmp[k].dram_speed_mts = clk_table->entries[i].memclk_mhz * 2;

-		dcn2_1_soc.clock_limits[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
-		dcn2_1_soc.clock_limits[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
-		dcn2_1_soc.clock_limits[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
-		dcn2_1_soc.clock_limits[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
-		dcn2_1_soc.clock_limits[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
-		dcn2_1_soc.clock_limits[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
-		dcn2_1_soc.clock_limits[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;
+		dcn2_1_soc._clock_tmp[k].dispclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dispclk_mhz;
+		dcn2_1_soc._clock_tmp[k].dppclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dppclk_mhz;
+		dcn2_1_soc._clock_tmp[k].dram_bw_per_chan_gbps = dcn2_1_soc.clock_limits[closest_clk_lvl].dram_bw_per_chan_gbps;
+		dcn2_1_soc._clock_tmp[k].dscclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dscclk_mhz;
+		dcn2_1_soc._clock_tmp[k].dtbclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].dtbclk_mhz;
+		dcn2_1_soc._clock_tmp[k].phyclk_d18_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_d18_mhz;
+		dcn2_1_soc._clock_tmp[k].phyclk_mhz = dcn2_1_soc.clock_limits[closest_clk_lvl].phyclk_mhz;

 		k++;
 	}
+
+	memcpy(&dcn2_1_soc.clock_limits, &dcn2_1_soc._clock_tmp,
+	       sizeof(dcn2_1_soc.clock_limits));
+
 	if (clk_table->num_entries) {
 		dcn2_1_soc.num_states = clk_table->num_entries + 1;
 		/* fill in min DF PState */
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
index 62cf283d9f41..e4863f0bf0f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
@@ -254,6 +254,9 @@ void dcn301_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_param

 	dc_assert_fp_enabled();

+