RE: [PATCH] drm/amdgpu: modify mcbp implement for gfx9(v3)

2022-08-12 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Christian,

The details are as follows:

> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.

On gfx9 there is no single packet that completes the MCBP request in one
frame the way gfx10 does.
To trigger preemption on gfx9, KMD needs to (a rough sketch follows the list):
1. Emit a trailing fence on the gfx ring, without updating the wptr to CP.
2. Emit a write_reg that resets mmCP_VMID_PREEMPT after the trailing fence.
3. Send unmap_queue to the KIQ ring with the field rb_wptr set to the offset
of the trailing fence on the gfx ring.
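
A rough sketch of that sequence in C (the helper wiring is illustrative and
error handling is elided; this is not the exact v3 patch code):

static int gfx_v9_0_ring_preempt_ib_sketch(struct amdgpu_ring *ring)
{
	struct amdgpu_device *adev = ring->adev;
	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
	struct amdgpu_ring *kiq_ring = &kiq->ring;

	/* 1. Trailing fence on the gfx ring; the new wptr is deliberately
	 *    not committed to CP - fw takes it from rb_wptr instead. */
	amdgpu_ring_alloc(ring, 13);
	amdgpu_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
			       ++ring->trail_seq, AMDGPU_FENCE_FLAG_EXEC);

	/* 2. write_reg after the fence: clears mmCP_VMID_PREEMPT once the
	 *    trailing fence signals, unblocking the pending unmap_queue. */
	amdgpu_ring_emit_wreg(ring,
			      SOC15_REG_OFFSET(GC, 0, mmCP_VMID_PREEMPT), 0x0);

	/* 3. unmap_queue on the KIQ ring; the patched gfx9 kiq_unmap_queues
	 *    fills rb_wptr from ring->wptr & ring->buf_mask, i.e. the
	 *    offset of the trailing fence (see the hunk below). */
	amdgpu_ring_alloc(kiq_ring, 32);
	kiq->pmf->kiq_unmap_queues(kiq_ring, ring, PREEMPT_QUEUES_NO_UNMAP, 0, 0);
	amdgpu_ring_commit(kiq_ring);

	return 0;
}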

When the CP firmware receives the unmap_queue in MEC, it will:
1. Store mmCP_RB0_WPTR from rb_wptr to kick the GFX RB off.
2. Write mmCP_VMID_PREEMPT as 0xFFFFFFFF to request preemption on all vmids,
then wait for mmCP_VMID_PREEMPT to become 0x0, indicating the preemption is
complete.
3. The rest of the pipeline performs the preemption according to
mmCP_VMID_PREEMPT until it hits the trailing fence.
4. After the trailing fence is signaled, the write_reg that resets
mmCP_VMID_PREEMPT unblocks the unmap_queue packet so it can proceed.

The rb_wptr field of unmap_queue on gfx9 is taken from the cp_packages_rn doc:
UNMAP_QUEUES
DW | Bits | Field   | Description
4  | 19:0 | rb_wptr | If ((engine_sel == 4) and (action == 3)) then the
                      preempted GFX queue's new RB pointer.

> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.

For preemption-enabled IBs, KMD adds a preamble IB (CE/DE meta) to initialize
the CSA data before sending the main IB. The CSA is used to save/restore IB
execution info when preemption/resubmission happens.
KMD is responsible for extracting the content from the CSA during
re-submission of a previously preempted DMA frame.
This patch writes the CSA data for resubmitted IBs using the previously
preempted IB's CSA.

Thanks,
Jiadong


-----Original Message-----
From: Christian König 
Sent: Friday, August 12, 2022 7:39 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH] drm/amdgpu: modify mcbp implement for gfx9(v3)

[CAUTION: External Email]

On 11.08.22 at 05:19, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.
> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.
>
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 161 ---
>   drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
>   3 files changed, 143 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 82c178a9033a..ca626f0ad7b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
>   #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
>   #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
>   #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)

> +#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)

Ok, that needs much more explanation: why do you need it, and how is all
this supposed to work?

Regards,
Christian.

>
>   #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring,
> sched)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 5332899642dc..887021fd56aa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
> *adev);
>   static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
>   struct amdgpu_cu_info *cu_info);
>   static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
> -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume);
>   static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
>   static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
> void *ras_error_status);
> @@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct
> amdgpu_ring *kiq_ring,
>
> PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>
>   if (action == PREEMPT_QUEUES_NO_UNMAP) {
> - amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, seq);
> + amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
> ring->buf_mask));
> + amdgpu_ring_write(kiq_ring, 0);
> + amdgpu_ring_write(kiq_ring, 0);
> +
>   } else {
>   amdgpu_ring_write(kiq_ring, 0);
>   amdgpu_ring_write(kiq_ring, 0);
> @@ -5446,11 +5447,16 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct 

[PATCH 12/14] drm/amd/display: Don't set DSC for phantom pipes

2022-08-12 Thread brichang
From: Alvin Lee 

[Description]
Don't set the DSC bit for phantom pipes; it's not
required since phantom pipes don't have
any actual output.

Reviewed-by: Jun Lei 
Acked-by: Brian Chang 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index 3316c4a64901..8118cfc5b405 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -493,6 +493,7 @@ void dcn32_set_phantom_stream_timing(struct dc *dc,

phantom_stream->timing.v_front_porch +

phantom_stream->timing.v_sync_width +
phantom_bp;
+   phantom_stream->timing.flags.DSC = 0; // Don't need DSC for phantom 
timing
 }
 
 /**
-- 
2.25.1



[PATCH 11/14] drm/amd/display: Update clock table policy for DCN314

2022-08-12 Thread brichang
From: Nicholas Kazlauskas 

[Why & How]
Depending on how the clock table is constructed from PMFW, we can run
into issues where we don't think we have enough bandwidth available
due to FCLK being too low - e.g. when the FCLK table contains invalid
entries or only a single entry.

We should always pick the maximum clocks for each state as a final
state in this case, to prevent validation from failing if the table is
malformed.

We should also provide sensible defaults in the case where values
are invalid.

Redefine the clock table structures by adding a 314 prefix, to make
debugging these issues easier by avoiding symbol name clashes.

Overall this policy aligns more closely with how we did things for 315,
but because of how the voltage rail is set up we should favor keeping
DCFCLK low rather than DISPCLK or DPPCLK - so use the max for those in
every entry. A simplified sketch of the resulting policy follows.
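
A simplified sketch of that policy in C (field names follow the
DpmClocks314_t structures added below; this is an illustration, not the
exact dcn314_clk_mgr_helper_populate_bw_params() code):

static void populate_bw_params_sketch(struct clk_bw_params *bw_params,
				      const DpmClocks314_t *table)
{
	uint32_t max_fclk = 0, max_dispclk, max_dppclk;
	int i, n = bw_params->clk_table.num_entries;

	/* Highest *valid* FClk wins, so malformed or single-entry tables
	 * still yield a usable max state. */
	for (i = 0; i < table->NumDfPstatesEnabled; i++)
		if (is_valid_clock_value(table->DfPstateTable[i].FClk) &&
		    table->DfPstateTable[i].FClk > max_fclk)
			max_fclk = table->DfPstateTable[i].FClk;

	/* Favor low DCFCLK: DISPCLK/DPPCLK get their max in every entry,
	 * so only DCFCLK/FCLK vary per state. */
	max_dispclk = find_max_clk_value(table->DispClocks,
					 table->NumDispClkLevelsEnabled);
	max_dppclk = find_max_clk_value(table->DppClocks,
					table->NumDispClkLevelsEnabled);

	for (i = 0; i < n; i++) {
		bw_params->clk_table.entries[i].dispclk_mhz = max_dispclk;
		bw_params->clk_table.entries[i].dppclk_mhz = max_dppclk;
	}

	/* The final state always carries the max clocks so validation
	 * cannot fail just because the table is malformed. */
	bw_params->clk_table.entries[n - 1].fclk_mhz = max_fclk;
}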

Reviewed-by: Daniel Miess 
Acked-by: Brian Chang 
Signed-off-by: Nicholas Kazlauskas 
---
 .../dc/clk_mgr/dcn314/dcn314_clk_mgr.c| 186 --
 .../display/dc/clk_mgr/dcn314/dcn314_smu.h|  33 +++-
 2 files changed, 154 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
index c74f2d5bbbc5..beb025cd3dc2 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
@@ -415,7 +415,7 @@ static struct wm_table lpddr5_wm_table = {
}
 };
 
-static DpmClocks_t dummy_clocks;
+static DpmClocks314_t dummy_clocks;
 
 static struct dcn314_watermarks dummy_wms = { 0 };
 
@@ -500,7 +500,7 @@ static void dcn314_notify_wm_ranges(struct clk_mgr 
*clk_mgr_base)
 static void dcn314_get_dpm_table_from_smu(struct clk_mgr_internal *clk_mgr,
struct dcn314_smu_dpm_clks *smu_dpm_clks)
 {
-   DpmClocks_t *table = smu_dpm_clks->dpm_clks;
+   DpmClocks314_t *table = smu_dpm_clks->dpm_clks;
 
if (!clk_mgr->smu_ver)
return;
@@ -517,6 +517,26 @@ static void dcn314_get_dpm_table_from_smu(struct 
clk_mgr_internal *clk_mgr,
dcn314_smu_transfer_dpm_table_smu_2_dram(clk_mgr);
 }
 
+static inline bool is_valid_clock_value(uint32_t clock_value)
+{
+   return clock_value > 1 && clock_value < 100000;
+}
+
+static unsigned int convert_wck_ratio(uint8_t wck_ratio)
+{
+   switch (wck_ratio) {
+   case WCK_RATIO_1_2:
+   return 2;
+
+   case WCK_RATIO_1_4:
+   return 4;
+
+   default:
+   break;
+   }
+   return 1;
+}
+
 static uint32_t find_max_clk_value(const uint32_t clocks[], uint32_t 
num_clocks)
 {
uint32_t max = 0;
@@ -530,89 +550,127 @@ static uint32_t find_max_clk_value(const uint32_t 
clocks[], uint32_t num_clocks)
return max;
 }
 
-static unsigned int find_clk_for_voltage(
-   const DpmClocks_t *clock_table,
-   const uint32_t clocks[],
-   unsigned int voltage)
-{
-   int i;
-   int max_voltage = 0;
-   int clock = 0;
-
-   for (i = 0; i < NUM_SOC_VOLTAGE_LEVELS; i++) {
-   if (clock_table->SocVoltage[i] == voltage) {
-   return clocks[i];
-   } else if (clock_table->SocVoltage[i] >= max_voltage &&
-   clock_table->SocVoltage[i] < voltage) {
-   max_voltage = clock_table->SocVoltage[i];
-   clock = clocks[i];
-   }
-   }
-
-   ASSERT(clock);
-   return clock;
-}
-
 static void dcn314_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal 
*clk_mgr,
struct integrated_info 
*bios_info,
-   const DpmClocks_t 
*clock_table)
+   const DpmClocks314_t 
*clock_table)
 {
-   int i, j;
struct clk_bw_params *bw_params = clk_mgr->base.bw_params;
-   uint32_t max_dispclk = 0, max_dppclk = 0;
-
-   j = -1;
-
-   ASSERT(NUM_DF_PSTATE_LEVELS <= MAX_NUM_DPM_LVL);
-
-   /* Find lowest DPM, FCLK is filled in reverse order*/
+   struct clk_limit_table_entry def_max = 
bw_params->clk_table.entries[bw_params->clk_table.num_entries - 1];
+   uint32_t max_pstate = 0,  max_fclk = 0,  min_pstate = 0, max_dispclk = 
0, max_dppclk = 0;
+   int i;
 
-   for (i = NUM_DF_PSTATE_LEVELS - 1; i >= 0; i--) {
-   if (clock_table->DfPstateTable[i].FClk != 0) {
-   j = i;
-   break;
+   /* Find highest valid fclk pstate */
+   for (i = 0; i < clock_table->NumDfPstatesEnabled; i++) {
+   if (is_valid_clock_value(clock_table->DfPstateTable[i].FClk) &&
+   clock_table->DfPstateTable[i].FClk > max_fclk) {
+   max_fclk = clock_table->DfPstateTable[i].FClk;
+   max_pstate = i;
 

[PATCH 14/14] drm/amd/display: avoid doing vm_init multiple time

2022-08-12 Thread brichang
From: Charlene Liu 

[why]
This is to ensure that the driver will not reprogram hvm_prefetch_req
again if it is already done.

Reviewed-by: Martin Leung 
Acked-by: Brian Chang 
Signed-off-by: Charlene Liu 
---
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
index c5e200d09038..5752271f22df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c
@@ -67,9 +67,15 @@ static uint32_t convert_and_clamp(
 void dcn21_dchvm_init(struct hubbub *hubbub)
 {
struct dcn20_hubbub *hubbub1 = TO_DCN20_HUBBUB(hubbub);
-   uint32_t riommu_active;
+   uint32_t riommu_active, prefetch_done;
int i;
 
+   REG_GET(DCHVM_RIOMMU_STAT0, HOSTVM_PREFETCH_DONE, &prefetch_done);
+
+   if (prefetch_done) {
+   hubbub->riommu_active = true;
+   return;
+   }
//Init DCHVM block
REG_UPDATE(DCHVM_CTRL0, HOSTVM_INIT_REQ, 1);
 
-- 
2.25.1



[PATCH 13/14] drm/amd/display: Use pitch when calculating size to cache in MALL

2022-08-12 Thread brichang
From: Alvin Lee 

[Description]
Use pitch when calculating size to cache in MALL

Reviewed-by: Samson Tam 
Acked-by: Brian Chang 
Signed-off-by: Alvin Lee 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
index b3f8503cea9c..955f52e6064d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource_helpers.c
@@ -63,7 +63,7 @@ uint32_t dcn32_helper_calculate_num_ways_for_subvp(struct dc 
*dc, struct dc_stat
if (pipe->stream && pipe->plane_state && !pipe->top_pipe &&
pipe->stream->mall_stream_config.type == 
SUBVP_PHANTOM) {
bytes_per_pixel = pipe->plane_state->format >= 
SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 ? 8 : 4;
-   mall_region_pixels = pipe->stream->timing.h_addressable 
* pipe->stream->timing.v_addressable;
+   mall_region_pixels = 
pipe->plane_state->plane_size.surface_pitch * 
pipe->stream->timing.v_addressable;
 
// For bytes required in MALL, calculate based on 
number of MBlks required
num_mblks = (mall_region_pixels * bytes_per_pixel +
-- 
2.25.1



[PATCH 09/14] drm/amd/display: Include scaling factor for SubVP command

2022-08-12 Thread brichang
From: Alvin Lee 

[Description]
For SubVP scaling cases, we must include the scaling
info as part of the cmd. This is required when converting
OTG line to HUBP line for the MALL_START_LINE programming.

Reviewed-by: Jun Lei 
Acked-by: Brian Chang 
Signed-off-by: Alvin Lee 
---
 .../drm/amd/display/dc/basics/conversion.c| 21 +++
 .../drm/amd/display/dc/basics/conversion.h|  3 +++
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  | 11 ++
 .../amd/display/dc/dcn321/dcn321_resource.c   |  3 ++-
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.c 
b/drivers/gpu/drm/amd/display/dc/basics/conversion.c
index 6767fab55c26..352e9afb85c6 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/conversion.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.c
@@ -100,3 +100,24 @@ void convert_float_matrix(
matrix[i] = (uint16_t)reg_value;
}
 }
+
+static uint32_t find_gcd(uint32_t a, uint32_t b)
+{
+   uint32_t remainder = 0;
+   while (b != 0) {
+   remainder = a % b;
+   a = b;
+   b = remainder;
+   }
+   return a;
+}
+
+void reduce_fraction(uint32_t num, uint32_t den,
+   uint32_t *out_num, uint32_t *out_den)
+{
+   uint32_t gcd = 0;
+
+   gcd = find_gcd(num, den);
+   *out_num = num / gcd;
+   *out_den = den / gcd;
+}
diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.h 
b/drivers/gpu/drm/amd/display/dc/basics/conversion.h
index ade785c4fdc7..81da4e6f7a1a 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/conversion.h
+++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.h
@@ -38,6 +38,9 @@ void convert_float_matrix(
struct fixed31_32 *flt,
uint32_t buffer_size);
 
+void reduce_fraction(uint32_t num, uint32_t den,
+   uint32_t *out_num, uint32_t *out_den);
+
 static inline unsigned int log_2(unsigned int num)
 {
return ilog2(num);
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c 
b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
index c8059c28ac49..09b304507bad 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
@@ -29,6 +29,7 @@
 #include "dm_helpers.h"
 #include "dc_hw_types.h"
 #include "core_types.h"
+#include "../basics/conversion.h"
 
 #define CTX dc_dmub_srv->ctx
 #define DC_LOGGER CTX->logger
@@ -600,6 +601,7 @@ static void populate_subvp_cmd_pipe_info(struct dc *dc,

struct dmub_cmd_fw_assisted_mclk_switch_pipe_data_v2 *pipe_data =
&cmd->fw_assisted_mclk_switch_v2.config_data.pipe_data[cmd_pipe_index];
struct dc_crtc_timing *main_timing = &subvp_pipe->stream->timing;
struct dc_crtc_timing *phantom_timing = 
&subvp_pipe->stream->mall_stream_config.paired_stream->timing;
+   uint32_t out_num, out_den;
 
pipe_data->mode = SUBVP;
pipe_data->pipe_config.subvp_data.pix_clk_100hz = 
subvp_pipe->stream->timing.pix_clk_100hz;
@@ -613,6 +615,15 @@ static void populate_subvp_cmd_pipe_info(struct dc *dc,
pipe_data->pipe_config.subvp_data.main_pipe_index = 
subvp_pipe->pipe_idx;
pipe_data->pipe_config.subvp_data.is_drr = 
subvp_pipe->stream->ignore_msa_timing_param;
 
+   /* Calculate the scaling factor from the src and dst height.
+* e.g. If 3840x2160 being downscaled to 1920x1080, the scaling factor 
is 1/2.
+* Reduce the fraction 1080/2160 = 1/2 for the "scaling factor"
+*/
+   reduce_fraction(subvp_pipe->stream->src.height, 
subvp_pipe->stream->dst.height, &out_num, &out_den);
+   // TODO: Uncomment below lines once DMCUB include headers are promoted
+   //pipe_data->pipe_config.subvp_data.scale_factor_numerator = out_num;
+   //pipe_data->pipe_config.subvp_data.scale_factor_denominator = out_den;
+
// Prefetch lines is equal to VACTIVE + BP + VSYNC
pipe_data->pipe_config.subvp_data.prefetch_lines =
phantom_timing->v_total - phantom_timing->v_front_porch;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
index e9db5f8b6fdc..b9317d31e282 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
@@ -1664,7 +1664,8 @@ static bool dcn321_resource_construct(
dc->caps.subvp_prefetch_end_to_mall_start_us = 15;
dc->caps.subvp_swath_height_margin_lines = 16;
dc->caps.subvp_pstate_allow_width_us = 20;
-
+   dc->caps.subvp_vertical_int_margin_us = 30;
+   
dc->caps.max_slave_planes = 1;
dc->caps.max_slave_yuv_planes = 1;
dc->caps.max_slave_rgb_planes = 1;
-- 
2.25.1



[PATCH 06/14] drm/amd/display: add chip revision to DCN32

2022-08-12 Thread brichang
From: Samson Tam 

[Why & How]
Add GC_11_0_3_A0 as a chip revision to the DCN32 family

Reviewed-by: Rodrigo Siqueira 
Acked-by: Brian Chang 
Signed-off-by: Samson Tam 
---
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index e054f3494087..9f3558c0ef11 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -247,10 +247,12 @@ enum {
 #define AMDGPU_FAMILY_GC_11_0_1 148
 #define GC_11_0_0_A0 0x1
 #define GC_11_0_2_A0 0x10
+#define GC_11_0_3_A0 0x20
 #define GC_11_UNKNOWN 0xFF
 
 #define ASICREV_IS_GC_11_0_0(eChipRev) (eChipRev < GC_11_0_2_A0)
-#define ASICREV_IS_GC_11_0_2(eChipRev) (eChipRev >= GC_11_0_2_A0 && eChipRev < 
GC_11_UNKNOWN)
+#define ASICREV_IS_GC_11_0_2(eChipRev) (eChipRev >= GC_11_0_2_A0 && eChipRev < 
GC_11_0_3_A0)
+#define ASICREV_IS_GC_11_0_3(eChipRev) (eChipRev >= GC_11_0_3_A0 && eChipRev < 
GC_11_UNKNOWN)
 
 /*
  * ASIC chip ID
-- 
2.25.1



[PATCH 08/14] drm/amd/display: Fix plug/unplug external monitor will hang while playback MPO video

2022-08-12 Thread brichang
From: Tom Chung 

[Why]
Pipes for MPO primary and overlay will be powered down and powered up
during plug/unplug of an external monitor while MPO video is playing back.
But the pipes are the same after the plug/unplug and should not need to be
powered down and up, or the page flip interrupt will be disabled and
cause a hang issue.

[How]
Add a pipe split change condition that checks not only the top pipe pointer
but also the index of the top pipe if both top pipes are available.

Reviewed-by: Sun peng Li 
Acked-by: Brian Chang 
Signed-off-by: Tom Chung 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index b3e7361cd158..0fade2f1efb5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1077,8 +1077,15 @@ static void disable_dangling_plane(struct dc *dc, struct 
dc_state *context)
struct dc_stream_state *old_stream =
dc->current_state->res_ctx.pipe_ctx[i].stream;
bool should_disable = true;
-   bool pipe_split_change =
-   context->res_ctx.pipe_ctx[i].top_pipe != 
dc->current_state->res_ctx.pipe_ctx[i].top_pipe;
+   bool pipe_split_change = false;
+
+   if ((context->res_ctx.pipe_ctx[i].top_pipe) &&
+   (dc->current_state->res_ctx.pipe_ctx[i].top_pipe))
+   pipe_split_change = 
context->res_ctx.pipe_ctx[i].top_pipe->pipe_idx !=
+   
dc->current_state->res_ctx.pipe_ctx[i].top_pipe->pipe_idx;
+   else
+   pipe_split_change = 
context->res_ctx.pipe_ctx[i].top_pipe !=
+   dc->current_state->res_ctx.pipe_ctx[i].top_pipe;
 
for (j = 0; j < context->stream_count; j++) {
if (old_stream == context->streams[j]) {
-- 
2.25.1



[PATCH 07/14] drm/amd/display: Add debug parameter to retain default clock table

2022-08-12 Thread brichang
From: Daniel Miess 

[Why]
Need a way to retain the default clock table to aid
the investigation into why an 8k@30 display is not
lighting up on dcn314.

[How]
Use a flag to prevent execution of the bw_params helper
function and the function for updating bw_bounding_box.

Reviewed-by: Nicholas Kazlauskas 
Reviewed-by: Jun Lei 
Acked-by: Brian Chang 
Signed-off-by: Daniel Miess 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h| 1 +
 drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
index 7af19823a29d..c74f2d5bbbc5 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c
@@ -719,7 +719,7 @@ void dcn314_clk_mgr_construct(
if (clk_mgr->base.base.ctx->dc->debug.pstate_enabled) {
dcn314_get_dpm_table_from_smu(&clk_mgr->base, &smu_dpm_clks);
 
-   if (ctx->dc_bios && ctx->dc_bios->integrated_info) {
+   if (ctx->dc_bios && ctx->dc_bios->integrated_info && 
ctx->dc->config.use_default_clock_table == false) {
dcn314_clk_mgr_helper_populate_bw_params(
&clk_mgr->base,
ctx->dc_bios->integrated_info,
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 5de2c445ac31..cc2e9b572b87 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -404,6 +404,7 @@ struct dc_config {
bool use_pipe_ctx_sync_logic;
bool ignore_dpref_ss;
bool enable_mipi_converter_optimization;
+   bool use_default_clock_table;
 };
 
 enum visual_confirm {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index c80307a6af1b..34a5d0f87b5f 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -189,7 +189,7 @@ void dcn314_update_bw_bounding_box_fpu(struct dc *dc, 
struct clk_bw_params *bw_p
dc_assert_fp_enabled();
 
// Default clock levels are used for diags, which may lead to 
overclocking.
-   if (!IS_DIAG_DC(dc->ctx->dce_environment)) {
+   if (!IS_DIAG_DC(dc->ctx->dce_environment) && 
dc->config.use_default_clock_table == false) {
 
dcn3_14_ip.max_num_otg = 
dc->res_pool->res_cap->num_timing_generator;
dcn3_14_ip.max_num_dpp = dc->res_pool->pipe_count;
-- 
2.25.1



[PATCH 04/14] Add reserved dc_log_type.

2022-08-12 Thread brichang
From: Ian Chen 

Reviewed-by: Anthony Koo 
Acked-by: Brian Chang 
Signed-off-by: Ian Chen 
---
 drivers/gpu/drm/amd/display/include/logger_types.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/logger_types.h 
b/drivers/gpu/drm/amd/display/include/logger_types.h
index f093b49c5e6e..3bf08a60c45c 100644
--- a/drivers/gpu/drm/amd/display/include/logger_types.h
+++ b/drivers/gpu/drm/amd/display/include/logger_types.h
@@ -119,13 +119,15 @@ enum dc_log_type {
LOG_HDMI_RETIMER_REDRIVER,
LOG_DSC,
LOG_SMU_MSG,
+   LOG_DC2RESERVED4,
+   LOG_DC2RESERVED5,
LOG_DWB,
LOG_GAMMA_DEBUG,
LOG_MAX_HW_POINTS,
LOG_ALL_TF_CHANNELS,
LOG_SAMPLE_1DLUT,
LOG_DP2,
-   LOG_SECTION_TOTAL_COUNT
+   LOG_DC2RESERVED12,
 };
 
 #define DC_MIN_LOG_MASK ((1 << LOG_ERROR) | \
-- 
2.25.1



[PATCH 02/14] drm/amd/display: 3.2.198

2022-08-12 Thread brichang
From: Aric Cyr 

This version brings along the following fixes:

-Fix edp panel missing event
-Set ARGB16161616 pixel format to 26
-Fix dcn32 integer issue
-Clear optc underflow bit after ODM clock off
-Fix issue with stereo3D
-Fix DML2 lightup issue
-Correct DTBCLK for dcn314
-Revert for a regression
-Fix clocks and bugs in DML2
-Enable SubVP by default on DCN32 & DCN321
-Correct boundary condition for engine ID on DCN303
-Fix FRL encoder override registry key
-Fix VPG for dcn314 HPO
-Fix Linux compile-time warning
-Add new prefetch modes in DML for DCN32

Acked-by: Brian Chang 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index eec3c7832fd9..5de2c445ac31 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.197"
+#define DC_VER "3.2.198"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1



[PATCH 03/14] drm/amd/display: Fix pixel clock programming

2022-08-12 Thread brichang
From: Ilya Bakoulin 

[Why]
Some pixel clock values could cause HDMI TMDS SSCPs to be misaligned
between different HDMI lanes when using YCbCr420 10-bit pixel format.

BIOS functions for transmitter/encoder control take pixel clock in kHz
increments, whereas the function for setting the pixel clock is in 100Hz
increments. Setting pixel clock to a value that is not on a kHz boundary
will cause the issue.

[How]
Round pixel clock down to nearest kHz in 10/12-bpc cases.
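
For example (hypothetical value): a 10-bpc clock of 3712505 in 100Hz units
(371.2505 MHz) is not on a kHz boundary; dropping the remainder mod 10
gives 3712500, i.e. exactly 371.250 MHz. A minimal sketch of the rounding
used in the diff below (1 kHz == 10 units of 100Hz):

static unsigned int round_down_to_khz(unsigned int clk_100hz)
{
	return clk_100hz - (clk_100hz % 10);
}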

Reviewed-by: Aric Cyr 
Acked-by: Brian Chang 
Signed-off-by: Ilya Bakoulin 
---
 drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c 
b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
index 213de8cabfad..165392380842 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
@@ -543,9 +543,11 @@ static void dce112_get_pix_clk_dividers_helper (
switch (pix_clk_params->color_depth) {
case COLOR_DEPTH_101010:
actual_pixel_clock_100hz = (actual_pixel_clock_100hz * 
5) >> 2;
+   actual_pixel_clock_100hz -= actual_pixel_clock_100hz % 
10;
break;
case COLOR_DEPTH_121212:
actual_pixel_clock_100hz = (actual_pixel_clock_100hz * 
6) >> 2;
+   actual_pixel_clock_100hz -= actual_pixel_clock_100hz % 
10;
break;
case COLOR_DEPTH_161616:
actual_pixel_clock_100hz = actual_pixel_clock_100hz * 2;
-- 
2.25.1



[PATCH 05/14] drm/amd/display: do not compare integers of different widths

2022-08-12 Thread brichang
From: Josip Pavic 

[Why & How]
Increase width of some variables to avoid comparing integers of
different widths.

Reviewed-by: Alvin Lee 
Acked-by: Brian Chang 
Signed-off-by: Josip Pavic 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 4aecbf230446..ebd3945c71f1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -331,7 +331,8 @@ static uint32_t dcn32_calculate_cab_allocation(struct dc 
*dc, struct dc_state *c
 bool dcn32_apply_idle_power_optimizations(struct dc *dc, bool enable)
 {
union dmub_rb_cmd cmd;
-   uint8_t ways, i, j;
+   uint8_t ways, i;
+   int j;
bool stereo_in_use = false;
struct dc_plane_state *plane = NULL;
 
-- 
2.25.1



[PATCH 01/14] drm/amd/display: reverted limiting vscsdp_for_colorimetry and ARGB16161616 pixel format addition

2022-08-12 Thread brichang
From: Ethan Wellenreiter 

[WHY]
Limiting vscsdp_for_colorimetry for YCbCr420/BT2020 resulted in red/green
point failures in HDR10 DTN tests. The re-implementation of ARGB16161616
was meant to fix this; however, it did not actually fix the issue, only a
side effect of it.

[HOW]
Change ARGB16161616 pixel format to 26.

Reviewed-by: Martin Leung 
Acked-by: Brian Chang 
Signed-off-by: Ethan Wellenreiter 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c  | 2 --
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c | 3 ---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c  | 2 --
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c | 3 ---
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c  | 2 --
 5 files changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
index d4a6504dfe00..db7ca4b0cdb9 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c
@@ -361,8 +361,6 @@ void dpp1_cnv_setup (
select = INPUT_CSC_SELECT_ICSC;
break;
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-   pixel_format = 22;
-   break;
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
pixel_format = 26; /* ARGB16161616_UNORM */
break;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
index b54c12400323..564e061ccb58 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
@@ -278,9 +278,6 @@ void hubp1_program_pixel_format(
SURFACE_PIXEL_FORMAT, 10);
break;
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-   REG_UPDATE(DCSURF_SURFACE_CONFIG,
-   SURFACE_PIXEL_FORMAT, 22);
-   break;
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: /*we use crossbar already*/
REG_UPDATE(DCSURF_SURFACE_CONFIG,
SURFACE_PIXEL_FORMAT, 26); /* 
ARGB16161616_UNORM */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
index ea1f14af0db7..eaa7032f0f1a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
@@ -166,8 +166,6 @@ static void dpp2_cnv_setup (
select = DCN2_ICSC_SELECT_ICSC_A;
break;
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-   pixel_format = 22;
-   break;
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
pixel_format = 26; /* ARGB16161616_UNORM */
break;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
index 936af65381ef..9570c2118ccc 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
@@ -463,9 +463,6 @@ void hubp2_program_pixel_format(
SURFACE_PIXEL_FORMAT, 10);
break;
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-   REG_UPDATE(DCSURF_SURFACE_CONFIG,
-   SURFACE_PIXEL_FORMAT, 22);
-   break;
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616: /*we use crossbar already*/
REG_UPDATE(DCSURF_SURFACE_CONFIG,
SURFACE_PIXEL_FORMAT, 26); /* 
ARGB16161616_UNORM */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
index 77b00f86c216..4a668d6563df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
@@ -244,8 +244,6 @@ void dpp3_cnv_setup (
select = INPUT_CSC_SELECT_ICSC;
break;
case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
-   pixel_format = 22;
-   break;
case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616:
pixel_format = 26; /* ARGB16161616_UNORM */
break;
-- 
2.25.1



[PATCH 00/14] DC Patches August 12, 2022

2022-08-12 Thread brichang
This DC patchset brings improvements in multiple areas. In summary, we have:

* Fix ARGB16161616 pixel format;
* Fix pixel clock in 10/12-bpc;
* Add reserved dc_log_type;
* Fix some variable widths in dc;
* Add chip version GC_11_0_3_A0 to DCN32 family;
* Fix light-up bug on dcn314 with 8K@30;
* Fix monitor hang while playing back MPO video;
* Include scaling factor for SubVP command;
* Improve header inclusion pattern;
* Update clock table policy for DCN314;
* Modify DSC bit for phantom pipes;
* Use pitch when calculating size to cache in MALL;
* Avoid doing vm_init multiple times;

Alvin Lee (3):
  drm/amd/display: Include scaling factor for SubVP command
  drm/amd/display: Don't set DSC for phantom pipes
  drm/amd/display: Use pitch when calculating size to cache in MALL

Aric Cyr (1):
  drm/amd/display: 3.2.198

Chaitanya Dhere (1):
  drm/amd/display: Modify header inclusion pattern

Charlene Liu (1):
  drm/amd/display: avoid doing vm_init multiple time

Daniel Miess (1):
  drm/amd/display: Add debug parameter to retain default clock table

Ethan Wellenreiter (1):
  drm/amd/display: reverted limiting vscsdp_for_colorimetry and
ARGB16161616 pixel format addition

Ian Chen (1):
  Add reserved dc_log_type.

Ilya Bakoulin (1):
  drm/amd/display: Fix pixel clock programming

Josip Pavic (1):
  drm/amd/display: do not compare integers of different widths

Nicholas Kazlauskas (1):
  drm/amd/display: Update clock table policy for DCN314

Samson Tam (1):
  drm/amd/display: add chip revision to DCN32

Tom Chung (1):
  drm/amd/display: Fix plug/unplug external monitor will hang while
playback MPO video

 .../drm/amd/display/dc/basics/conversion.c|  21 ++
 .../drm/amd/display/dc/basics/conversion.h|   3 +
 .../dc/clk_mgr/dcn314/dcn314_clk_mgr.c| 188 --
 .../display/dc/clk_mgr/dcn314/dcn314_smu.h|  33 ++-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  11 +-
 drivers/gpu/drm/amd/display/dc/dc.h   |   3 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  |  11 +
 .../drm/amd/display/dc/dce/dce_clock_source.c |   2 +
 .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c  |   2 -
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c |   3 -
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c  |   2 -
 .../gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c |   3 -
 .../drm/amd/display/dc/dcn21/dcn21_hubbub.c   |   8 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c  |   2 -
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c|   3 +-
 .../display/dc/dcn32/dcn32_resource_helpers.c |   2 +-
 .../amd/display/dc/dcn321/dcn321_resource.c   |   3 +-
 .../dc/dml/dcn31/display_mode_vba_31.c|   2 +-
 .../dc/dml/dcn31/display_rq_dlg_calc_31.c |   2 +-
 .../amd/display/dc/dml/dcn314/dcn314_fpu.c|   2 +-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |   1 +
 .../gpu/drm/amd/display/include/dal_asic_id.h |   4 +-
 .../drm/amd/display/include/logger_types.h|   4 +-
 23 files changed, 225 insertions(+), 90 deletions(-)

-- 
2.25.1



[PATCH 6/6] drm/amdgpu: Bump amdgpu driver version.

2022-08-12 Thread Bas Nieuwenhuizen
For detection of the new explicit sync functionality without
having to try the ioctl.
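
For instance, userspace could gate the feature on the bumped minor with
libdrm (a sketch; error handling elided):

static int has_ctx_implicit_sync_op(int fd)
{
	drmVersionPtr ver = drmGetVersion(fd);
	int ok = ver && (ver->version_major > 3 ||
			 (ver->version_major == 3 && ver->version_minor >= 48));
	drmFreeVersion(ver);
	return ok;
}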

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8890300766a5..6d92e8846b21 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -101,9 +101,10 @@
  * - 3.45.0 - Add context ioctl stable pstate interface
  * - 3.46.0 - To enable hot plug amdgpu tests in libdrm
 * - 3.47.0 - Add AMDGPU_GEM_CREATE_DISCARDABLE and AMDGPU_VM_NOALLOC flags
+ * - 3.48.0 - Add AMDGPU_CTX_OP_SET_IMPLICIT_SYNC context operation.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   47
+#define KMS_DRIVER_MINOR   48
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit;
-- 
2.37.1



[PATCH 2/6] drm/amdgpu: Add separate mode for syncing DMA_RESV_USAGE_BOOKKEEP.

2022-08-12 Thread Bas Nieuwenhuizen
To prep for allowing different sync modes in a follow-up patch.

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c   | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c  |  2 +-
 10 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a6eb7697c936..746f44c1c3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1158,7 +1158,7 @@ static int process_sync_pds_resv(struct 
amdkfd_process_info *process_info,
struct amdgpu_bo *pd = peer_vm->root.bo;
 
ret = amdgpu_sync_resv(NULL, sync, pd->tbo.base.resv,
-  AMDGPU_SYNC_NE_OWNER,
+  AMDGPU_SYNC_NE_OWNER, 
AMDGPU_SYNC_NE_OWNER,
   AMDGPU_FENCE_OWNER_KFD);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index f1ceb25d1b84..91958e9db90b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -675,7 +675,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
sync_mode = amdgpu_bo_explicit_sync(bo) ?
AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
-&fpriv->vm);
+AMDGPU_SYNC_EXPLICIT, &fpriv->vm);
if (r)
return r;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2c82b1d5a0d7..20c45f502536 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1410,7 +1410,8 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct 
dma_fence *fence,
  *
  * @adev: amdgpu device pointer
  * @resv: reservation object to sync to
- * @sync_mode: synchronization mode
+ * @implicit_sync_mode: synchronization mode for usage <= DMA_RESV_USAGE_READ
+ * @explicit_sync_mode: synchronization mode for usage DMA_RESV_USAGE_BOOKKEEP
  * @owner: fence owner
  * @intr: Whether the wait is interruptible
  *
@@ -1420,14 +1421,15 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct 
dma_fence *fence,
  * 0 on success, errno otherwise.
  */
 int amdgpu_bo_sync_wait_resv(struct amdgpu_device *adev, struct dma_resv *resv,
-enum amdgpu_sync_mode sync_mode, void *owner,
+enum amdgpu_sync_mode implicit_sync_mode,
+enum amdgpu_sync_mode explicit_sync_mode, void 
*owner,
 bool intr)
 {
struct amdgpu_sync sync;
int r;
 
amdgpu_sync_create(&sync);
-   amdgpu_sync_resv(adev, &sync, resv, sync_mode, owner);
+   amdgpu_sync_resv(adev, &sync, resv, implicit_sync_mode, 
explicit_sync_mode, owner);
r = amdgpu_sync_wait(&sync, intr);
amdgpu_sync_free(&sync);
return r;
@@ -1448,7 +1450,8 @@ int amdgpu_bo_sync_wait(struct amdgpu_bo *bo, void 
*owner, bool intr)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
return amdgpu_bo_sync_wait_resv(adev, bo->tbo.base.resv,
-   AMDGPU_SYNC_NE_OWNER, owner, intr);
+   AMDGPU_SYNC_NE_OWNER, 
AMDGPU_SYNC_EXPLICIT,
+   owner, intr);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 147b79c10cbb..36ce9abb579c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -320,7 +320,8 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo);
 void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
 bool shared);
 int amdgpu_bo_sync_wait_resv(struct amdgpu_device *adev, struct dma_resv *resv,
-enum amdgpu_sync_mode sync_mode, void *owner,
+enum amdgpu_sync_mode implicit_sync_mode,
+enum amdgpu_sync_mode explicit_sync_mode, void 
*owner,
 bool intr);
 int amdgpu_bo_sync_wait(struct amdgpu_bo *bo, void *owner, bool intr);
 u64 

[PATCH 4/6] drm/amdgpu: Refactor amdgpu_vm_get_pd_bo.

2022-08-12 Thread Bas Nieuwenhuizen
We want to take only a BOOKKEEP usage for contexts that are not
implicitly synced.

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 3 ++-
 7 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 746f44c1c3f9..cc4fcc82eec1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -923,7 +923,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
ctx->kfd_bo.tv.usage = DMA_RESV_USAGE_READ;
list_add(&ctx->kfd_bo.tv.head, &ctx->list);

-   amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
+   amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0], 
DMA_RESV_USAGE_READ);

ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
 false, &ctx->duplicates);
@@ -995,7 +995,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
continue;
 
amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
-   &ctx->vm_pd[i]);
+   &ctx->vm_pd[i], DMA_RESV_USAGE_READ);
i++;
}
 
@@ -2203,7 +2203,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
list_for_each_entry(peer_vm, &process_info->vm_list_head,
vm_list_node)
amdgpu_vm_get_pd_bo(peer_vm, &resv_list,
-   &pd_bo_list_entries[i++]);
+   &pd_bo_list_entries[i++], 
DMA_RESV_USAGE_READ);
/* Add the userptr_inval_list entries to resv_list */
list_for_each_entry(mem, &process_info->userptr_inval_list,
validate_list.head) {
@@ -2399,7 +2399,8 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
mutex_lock(&process_info->lock);
list_for_each_entry(peer_vm, &process_info->vm_list_head,
vm_list_node)
-   amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++]);
+   amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++],
+   DMA_RESV_USAGE_READ);
 
/* Reserve all BOs and page tables/directory. Add all BOs from
 * kfd_bo_list to ctx.list
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 91958e9db90b..175fc2c2feec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -531,7 +531,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
amdgpu_bo_list_get_list(p->bo_list, &p->validated);

INIT_LIST_HEAD(&duplicates);
-   amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd);
+   amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd, 
DMA_RESV_USAGE_READ);

if (p->uf_entry.tv.bo && !ttm_to_amdgpu_bo(p->uf_entry.tv.bo)->parent)
list_add(&p->uf_entry.tv.head, &p->validated);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 24941ed1a5ec..0cc2c863808f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -78,7 +78,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
csa_tv.usage = DMA_RESV_USAGE_READ;
 
list_add(&csa_tv.head, &list);
-   amdgpu_vm_get_pd_bo(vm, &list, &pd);
+   amdgpu_vm_get_pd_bo(vm, &list, &pd, DMA_RESV_USAGE_READ);

r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index f8cf52eb1931..0f0e0acec691 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -211,7 +211,7 @@ static void amdgpu_gem_object_close(struct drm_gem_object 
*obj,
tv.usage = DMA_RESV_USAGE_READ;
list_add(&tv.head, &list);

-   amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
+   amdgpu_vm_get_pd_bo(vm, &list, &vm_pd, DMA_RESV_USAGE_READ);

r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
if (r) {
@@ -747,7 +747,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
abo = NULL;
}
 
-   amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
+   amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd, DMA_RESV_USAGE_READ);

r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 6b1da37c2280..85205754 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -999,7 +999,7 @@ int 

[PATCH 3/6] drm/amdgpu: Allow explicit sync for VM ops.

2022-08-12 Thread Bas Nieuwenhuizen
This should be okay because moves themselves use KERNEL usage and
hence still sync with BOOKKEEP usage. Then any later submits still
wait on any pending VM operations.

(i.e. we only made VM ops not wait on BOOKKEEP submits, not the other
 way around)

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index f10332e1c6c0..e898a549f86d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -51,7 +51,8 @@ static int amdgpu_vm_cpu_prepare(struct 
amdgpu_vm_update_params *p,
if (!resv)
return 0;
 
-   return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode, sync_mode, 
p->vm, true);
+   return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode,
+   AMDGPU_SYNC_EXPLICIT, p->vm, true);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 6ec6217f0b0e..9233ea3c9404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -75,7 +75,8 @@ static int amdgpu_vm_sdma_prepare(struct 
amdgpu_vm_update_params *p,
if (!resv)
return 0;
 
-   return amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, 
sync_mode, p->vm);
+   return amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
+   AMDGPU_SYNC_EXPLICIT, p->vm);
 }
 
 /**
-- 
2.37.1



[PATCH 5/6] drm/amdgpu: Add option to disable implicit sync for a context.

2022-08-12 Thread Bas Nieuwenhuizen
This changes all BO usages in a submit to BOOKKEEP instead of READ,
which effectively disables implicit sync for these submits.

This is configured at a context level using the existing IOCTL.

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 32 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 +
 include/uapi/drm/amdgpu_drm.h   |  3 +++
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 175fc2c2feec..5246defa4de8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -500,6 +500,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
struct amdgpu_bo *gws;
struct amdgpu_bo *oa;
int r;
+   enum dma_resv_usage resv_usage;
 
INIT_LIST_HEAD(>validated);
 
@@ -522,16 +523,19 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
*p,
 
mutex_lock(>bo_list->bo_list_mutex);
 
+   resv_usage = p->ctx->disable_implicit_sync ? DMA_RESV_USAGE_BOOKKEEP :
+DMA_RESV_USAGE_READ;
+
/* One for TTM and one for the CS job */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
e->tv.num_shared = 2;
-   e->tv.usage = DMA_RESV_USAGE_READ;
+   e->tv.usage = resv_usage;
}
 
amdgpu_bo_list_get_list(p->bo_list, >validated);
 
INIT_LIST_HEAD(&duplicates);
-   amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd, 
DMA_RESV_USAGE_READ);
+   amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd, resv_usage);

if (p->uf_entry.tv.bo && !ttm_to_amdgpu_bo(p->uf_entry.tv.bo)->parent)
list_add(&p->uf_entry.tv.head, &p->validated);
@@ -672,7 +676,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
struct dma_resv *resv = bo->tbo.base.resv;
enum amdgpu_sync_mode sync_mode;
 
-   sync_mode = amdgpu_bo_explicit_sync(bo) ?
+   sync_mode = (amdgpu_bo_explicit_sync(bo) || 
p->ctx->disable_implicit_sync) ?
AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
r = amdgpu_sync_resv(p->adev, >job->sync, resv, sync_mode,
 AMDGPU_SYNC_EXPLICIT, >vm);
@@ -1287,7 +1291,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
/* Make sure all BOs are remembered as writers */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
e->tv.num_shared = 0;
-   e->tv.usage = DMA_RESV_USAGE_WRITE;
+   e->tv.usage = p->ctx->disable_implicit_sync ? 
DMA_RESV_USAGE_BOOKKEEP
+   : 
DMA_RESV_USAGE_WRITE;
}
 
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 7dc92ef36b2b..c01140a449da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -596,8 +596,6 @@ static int amdgpu_ctx_query2(struct amdgpu_device *adev,
return 0;
 }
 
-
-
 static int amdgpu_ctx_stable_pstate(struct amdgpu_device *adev,
struct amdgpu_fpriv *fpriv, uint32_t id,
bool set, u32 *stable_pstate)
@@ -626,6 +624,30 @@ static int amdgpu_ctx_stable_pstate(struct amdgpu_device 
*adev,
return r;
 }
 
+static int amdgpu_ctx_set_implicit_sync(struct amdgpu_device *adev,
+   struct amdgpu_fpriv *fpriv, uint32_t id,
+   bool enable)
+{
+   struct amdgpu_ctx *ctx;
+   struct amdgpu_ctx_mgr *mgr;
+
+   if (!fpriv)
+   return -EINVAL;
+
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   ctx = idr_find(&mgr->ctx_handles, id);
+   if (!ctx) {
+   mutex_unlock(&mgr->lock);
+   return -EINVAL;
+   }
+
+   ctx->disable_implicit_sync = !enable;
+
+   mutex_unlock(&mgr->lock);
+   return 0;
+}
+
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 struct drm_file *filp)
 {
@@ -674,6 +696,12 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
r = amdgpu_ctx_stable_pstate(adev, fpriv, id, true, 
&stable_pstate);
break;
+   case AMDGPU_CTX_OP_SET_IMPLICIT_SYNC:
+   if ((args->in.flags & ~AMDGPU_CTX_IMPICIT_SYNC_ENABLED) || 
args->in.priority)
+   return -EINVAL;
+   r = amdgpu_ctx_set_implicit_sync(adev, fpriv, id,
+args->in.flags & 
~AMDGPU_CTX_IMPICIT_SYNC_ENABLED);
+   break;
default:
return -EINVAL;
}
diff --git 

[PATCH 0/6] amdgpu: Allow explicitly synchronized submissions.

2022-08-12 Thread Bas Nieuwenhuizen
This adds a context option to use DMA_RESV_USAGE_BOOKKEEP for userspace
submissions, based on Christian's TTM work.

Disabling implicit sync is something we've wanted in radv for a while to
resolve some corner cases. A more immediate thing that would be solved
here is avoiding a bunch of implicit sync on GPU map/unmap operations as
well, which helps with stutter around sparse maps/unmaps.

This has shown a significant improvement in stutter in Forza Horizon 5 and
Forza Horizon 4 (games that had significant issues with sparse-binding
related stutter). I've been able to pass a full vulkan-cts run on navi21
with this.

Userspace code for this is available at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18032 and a branch
for the kernel code is available at
https://github.com/BNieuwenhuizen/linux/tree/no-implicit-sync-5.19

This is a follow-up on RFC series 
https://patchwork.freedesktop.org/series/104578/ .

The main changes were:

1) Instead of replacing num_shared with usage, I'm just adding usage, since
   num_shared was actually needed.
2) We now agree that DMA_RESV_USAGE_BOOKKEEP is reasonable for this purpose.

Please let me know if I missed anything, especially with the change to VM 
updates,
as we went back and forth a ton of times on that.

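As an illustration, userspace could flip a context to explicit sync with
the new op from patch 5/6 roughly like this (a sketch; flag semantics
follow the patch and the uapi may still change during review):

static int ctx_disable_implicit_sync(int fd, uint32_t ctx_id)
{
	union drm_amdgpu_ctx args;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_SET_IMPLICIT_SYNC;
	args.in.ctx_id = ctx_id;
	args.in.flags = 0; /* 0 == implicit sync off, per patch 5/6 */

	return drmCommandWriteRead(fd, DRM_AMDGPU_CTX, &args, sizeof(args));
}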

Bas Nieuwenhuizen (6):
  drm/ttm: Add usage to ttm_validate_buffer.
  drm/amdgpu: Add separate mode for syncing DMA_RESV_USAGE_BOOKKEEP.
  drm/amdgpu: Allow explicit sync for VM ops.
  drm/amdgpu: Refactor amdgpu_vm_get_pd_bo.
  drm/amdgpu: Add option to disable implicit sync for a context.
  drm/amdgpu: Bump amdgpu driver version.

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 16 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 20 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 32 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 12 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 11 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 11 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  5 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c|  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c  |  1 +
 drivers/gpu/drm/qxl/qxl_release.c |  1 +
 drivers/gpu/drm/radeon/radeon_cs.c|  2 ++
 drivers/gpu/drm/radeon/radeon_gem.c   |  1 +
 drivers/gpu/drm/radeon/radeon_vm.c|  2 ++
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|  3 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  7 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_validation.c|  1 +
 include/drm/ttm/ttm_execbuf_util.h|  2 ++
 include/uapi/drm/amdgpu_drm.h |  3 ++
 28 files changed, 122 insertions(+), 37 deletions(-)

-- 
2.37.1



[PATCH 1/6] drm/ttm: Add usage to ttm_validate_buffer.

2022-08-12 Thread Bas Nieuwenhuizen
This way callsites can choose between READ/BOOKKEEP reservations.

Signed-off-by: Bas Nieuwenhuizen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 9 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  | 8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
 drivers/gpu/drm/qxl/qxl_release.c| 1 +
 drivers/gpu/drm/radeon/radeon_cs.c   | 2 ++
 drivers/gpu/drm/radeon/radeon_gem.c  | 1 +
 drivers/gpu/drm/radeon/radeon_vm.c   | 2 ++
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   | 3 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 7 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_validation.c   | 1 +
 include/drm/ttm/ttm_execbuf_util.h   | 2 ++
 15 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 4608599ba6bb..a6eb7697c936 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -775,6 +775,7 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
 
INIT_LIST_HEAD(&entry->head);
entry->num_shared = 1;
+   entry->usage = DMA_RESV_USAGE_READ;
entry->bo = &bo->tbo;
mutex_lock(&process_info->lock);
if (userptr)
@@ -919,6 +920,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = &bo->tbo;
ctx->kfd_bo.tv.num_shared = 1;
+   ctx->kfd_bo.tv.usage = DMA_RESV_USAGE_READ;
list_add(&ctx->kfd_bo.tv.head, &ctx->list);

amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
@@ -982,6 +984,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = &bo->tbo;
ctx->kfd_bo.tv.num_shared = 1;
+   ctx->kfd_bo.tv.usage = DMA_RESV_USAGE_READ;
list_add(&ctx->kfd_bo.tv.head, &ctx->list);
 
i = 0;
@@ -2207,6 +2210,7 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
list_add_tail(&mem->resv_list.head, &resv_list);
mem->resv_list.bo = mem->validate_list.bo;
mem->resv_list.num_shared = mem->validate_list.num_shared;
+   mem->resv_list.usage = mem->validate_list.usage;
}
 
/* Reserve all BOs and page tables for validation */
@@ -2406,6 +2410,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
list_add_tail(&mem->resv_list.head, &ctx.list);
mem->resv_list.bo = mem->validate_list.bo;
mem->resv_list.num_shared = mem->validate_list.num_shared;
+   mem->resv_list.usage = mem->validate_list.usage;
}
 
ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..f1ceb25d1b84 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -57,6 +57,7 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser 
*p,
p->uf_entry.tv.bo = &bo->tbo;
/* One for TTM and two for the CS job */
p->uf_entry.tv.num_shared = 3;
+   p->uf_entry.tv.usage = DMA_RESV_USAGE_READ;
 
drm_gem_object_put(gobj);
 
@@ -522,8 +523,10 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
mutex_lock(>bo_list->bo_list_mutex);
 
/* One for TTM and one for the CS job */
-   amdgpu_bo_list_for_each_entry(e, p->bo_list)
+   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
e->tv.num_shared = 2;
+   e->tv.usage = DMA_RESV_USAGE_READ;
+   }
 
amdgpu_bo_list_get_list(p->bo_list, &p->validated);
 
@@ -1282,8 +1285,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
/* Make sure all BOs are remembered as writers */
-   amdgpu_bo_list_for_each_entry(e, p->bo_list)
+   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
e->tv.num_shared = 0;
+   e->tv.usage = DMA_RESV_USAGE_WRITE;
+   }
 
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
mutex_unlock(>adev->notifier_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index c6d4d41c4393..24941ed1a5ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -75,6 +75,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
	INIT_LIST_HEAD(&csa_tv.head);
	csa_tv.bo = &bo->tbo;
	csa_tv.num_shared = 1;
+	csa_tv.usage = DMA_RESV_USAGE_READ;

	list_add(&csa_tv.head, &list);
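
The mapping applied in each hunk above is mechanical: entries that were
added with num_shared == 0 (exclusive fences, i.e. writers) get
DMA_RESV_USAGE_WRITE, everything else gets DMA_RESV_USAGE_READ. Written as a
hypothetical helper (not part of the patch), the rule is:

#include <linux/dma-resv.h>

static enum dma_resv_usage tv_usage_from_num_shared(unsigned int num_shared)
{
	/* num_shared == 0 has always meant "remember this job as a writer" */
	return num_shared ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
}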

RE: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Kuehling, Felix 
> Sent: August 12, 2022 6:12 PM
> To: Grodzovsky, Andrey ; Kim, Jonathan
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference
> leak
>
>
> On 2022-08-12 18:05, Andrey Grodzovsky wrote:
> >
> > On 2022-08-12 14:38, Kim, Jonathan wrote:
> >> [Public]
> >>
> >> Hi Andrey,
> >>
> >> Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I
> >> put dbg_xgmi_hive_get/put refcount print post kobj get/put.
> >> It's stuck at 2 on unload.  If it's an 8 GPU system, it's stuck at 8.
> >>
> >> e.g. of sysfs leak after driver unload:
> >>
> atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00
> .0/:83:00.0$
> >> ls xgmi_hive_info/
> >> xgmi_hive_id
> >>
> >> Thanks,
> >>
> >> Jon
> >
> >
> > I see the leak, but how is it related to amdgpu_reset_domain? How do you
> > think it is causing this?
> Does YiPeng's patch "[PATCH 2/2] drm/amdgpu: fix hive reference leak
> when adding xgmi device" address the same issue?

Yes, this is the extra reference I was talking about in the snippet I posted.

Thanks,

Jon

>
> Regards,
>Felix
>
>
> >
> > Andrey
> >
> >
> >>
> >>
> >> Driver load (get ref happens on both device add to hive and init per
> >> device):
> >> [snip: driver load dmesg, quoted in full in the original message]

Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Felix Kuehling



On 2022-08-12 18:05, Andrey Grodzovsky wrote:


On 2022-08-12 14:38, Kim, Jonathan wrote:

[Public]

Hi Andrey,

Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I 
put dbg_xgmi_hive_get/put refcount print post kobj get/put.

It's stuck at 2 on unload.  If it's an 8 GPU system, it's stuck at 8.

e.g. of sysfs leak after driver unload:
atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00.0/:83:00.0$ 
ls xgmi_hive_info/

xgmi_hive_id

Thanks,

Jon



I see the leak, but how is it related to amdgpu_reset_domain? How do you
think it is causing this?
Does YiPeng's patch "[PATCH 2/2] drm/amdgpu: fix hive reference leak 
when adding xgmi device" address the same issue?


Regards,
  Felix




Andrey





Driver load (get ref happens on both device add to hive and init per 
device):

[snip: driver load dmesg, quoted in full in the original message]

Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Andrey Grodzovsky



On 2022-08-12 14:38, Kim, Jonathan wrote:

[Public]

Hi Andrey,

Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I put 
dbg_xgmi_hive_get/put refcount print post kobj get/put.
It's stuck at 2 on unload.  If it's an 8 GPU system, it's stuck at 8.

e.g. of sysfs leak after driver unload:
atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00.0/:83:00.0$
 ls xgmi_hive_info/
xgmi_hive_id

Thanks,

Jon



I see the leak, but how is it related to amdgpu_reset_domain? How do you
think it is causing this?


Andrey





Driver load (get ref happens on both device add to hive and init per device):
[snip: driver load dmesg, quoted in full in the original message]

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
Hello,

On Fri, Aug 12, 2022 at 04:54:04PM -0400, Felix Kuehling wrote:
> In principle, I think IRQ routing to CPUs can change dynamically with
> irqbalance.

I wonder whether this is something which should be exposed to userland
rather than trying to do dynamically in the kernel and let irqbalance or
whatever deal with it. People use irq affinity to steer these handlings to
specfic CPUs and the usual expectation is that the bottom half handling is
gonna take place on the same cpu usually through softirq. It's kinda awkard
to have this secondary assignment happening implicitly.

> What we need is kind of the opposite of WQ_UNBOUND. As I understand it,
> WQ_UNBOUND can schedule anywhere to maximize concurrency. What we need is to
> schedule to very specific, predictable CPUs. We only have one work item per
> GPU that processes all the interrupts in order, so we don't need the
> concurrency of WQ_UNBOUND.

Each WQ_UNBOUND workqueue has a cpumask associated with it and the cpumask
can be changed dynamically, so it can be used for sth like this, but I'm not
yet convinced that's the right thing to do.
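
Concretely, WQ_SYSFS is the existing way to hand that cpumask to userland; a
minimal sketch, with a hypothetical queue name:

#include <linux/workqueue.h>

static struct workqueue_struct *ih_wq;

static int example_init(void)
{
	/* the mask becomes tunable at
	 * /sys/devices/virtual/workqueue/kfd_ih/cpumask */
	ih_wq = alloc_workqueue("kfd_ih", WQ_UNBOUND | WQ_SYSFS, 1);
	return ih_wq ? 0 : -ENOMEM;
}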

Thanks.

-- 
tejun


RE: [PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-12 Thread Joshi, Mukul
[AMD Official Use Only - General]



> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, August 12, 2022 5:27 PM
> To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring
> 
> 
> On 2022-08-12 16:56, Mukul Joshi wrote:
> > There are no backing hardware registers for ih_soft ring.
> > As a result, don't try to access hardware registers for read and write
> > pointers when processing interrupts on the IH soft ring.
> >
> > Signed-off-by: Mukul Joshi 
> 
> The patch looks good to me. But you probably should apply the same
> changes to vega10_ih.c and navi10_ih.c as well.
>

Oops sorry I missed that. Will update the patch and re-send.

Regards,
Mukul
 
> Regards,
>    Felix
> 
> 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 7 ++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> > b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> > index 3b4eb8285943..2022ffbb8dba 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> > @@ -385,9 +385,11 @@ static u32 vega20_ih_get_wptr(struct
> amdgpu_device *adev,
> > u32 wptr, tmp;
> > struct amdgpu_ih_regs *ih_regs;
> >
> > -   if (ih == &adev->irq.ih) {
> > +   if (ih == &adev->irq.ih || ih == &adev->irq.ih_soft) {
> > /* Only ring0 supports writeback. On other rings fall back
> >  * to register-based code with overflow checking below.
> > +* ih_soft ring doesn't have any backing hardware registers,
> > +* update wptr and return.
> >  */
> > wptr = le32_to_cpu(*ih->wptr_cpu);
> >
> > @@ -461,6 +463,9 @@ static void vega20_ih_set_rptr(struct
> amdgpu_device *adev,
> >   {
> > struct amdgpu_ih_regs *ih_regs;
> >
> > +   if (ih == &adev->irq.ih_soft)
> > +   return;
> > +
> > if (ih->use_doorbell) {
> > /* XXX check if swapping is necessary on BE */
> > *ih->rptr_cpu = ih->rptr;


Re: [PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-12 Thread Felix Kuehling



On 2022-08-12 16:56, Mukul Joshi wrote:

There are no backing hardware registers for ih_soft ring.
As a result, don't try to access hardware registers for read
and write pointers when processing interrupts on the IH soft
ring.

Signed-off-by: Mukul Joshi 


The patch looks good to me. But you probably should apply the same 
changes to vega10_ih.c and navi10_ih.c as well.


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 3b4eb8285943..2022ffbb8dba 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -385,9 +385,11 @@ static u32 vega20_ih_get_wptr(struct amdgpu_device *adev,
u32 wptr, tmp;
struct amdgpu_ih_regs *ih_regs;
  
-	if (ih == &adev->irq.ih) {
+	if (ih == &adev->irq.ih || ih == &adev->irq.ih_soft) {
/* Only ring0 supports writeback. On other rings fall back
 * to register-based code with overflow checking below.
+* ih_soft ring doesn't have any backing hardware registers,
+* update wptr and return.
 */
wptr = le32_to_cpu(*ih->wptr_cpu);
  
@@ -461,6 +463,9 @@ static void vega20_ih_set_rptr(struct amdgpu_device *adev,

  {
struct amdgpu_ih_regs *ih_regs;
  
+	if (ih == &adev->irq.ih_soft)
+		return;
+
if (ih->use_doorbell) {
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;


Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()

2022-08-12 Thread Felix Kuehling

On 2022-08-12 02:20, Dan Carpenter wrote:

This code has two bugs.  If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.

The second bug is that we cannot dereference a list iterator when it
points to the list head.  In other words, if the
list_for_each_entry() loop exits without hitting a break then "cpu_link"
is not a valid pointer and should not be dereferenced.

Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid.  That makes it easier to test for
valid vs invalid.
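
A generic sketch of the iterator pitfall (illustration only, not driver
code): after a full traversal the iterator is container_of() applied to the
list head, not a real entry, so it may only be used when the loop stopped
early.

#include <linux/list.h>

struct node {
	struct list_head list;
	int id;
};

static struct node *find_node(struct list_head *head, int id)
{
	struct node *n;

	list_for_each_entry(n, head, list) {
		if (n->id == id)
			return n;	/* valid: we stopped on a real entry */
	}
	/* here 'n' is derived from 'head' itself - never dereference it */
	return NULL;
}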

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer 
links")
Signed-off-by: Dan Carpenter 
---
I reported these in June but never heard back.


I thought Ramesh implemented a fix for this: 
https://lore.kernel.org/all/20220706183302.1719795-1-ramesh.errab...@amd.com/


You commented on a version of his patch: 
https://lore.kernel.org/all/20220629161241.GM11460@kadam/


Did this get lost somehow? Anyway, your patch looks good to me and I'm 
going to apply it to amd-staging-drm-next now.


Reviewed-by: Felix Kuehling 

Thanks,
  Felix




  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct 
kfd_topology_device *dev,
  
  static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)

  {
+   struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
struct kfd_iolink_properties *props = NULL, *props2 = NULL;
-   struct kfd_iolink_properties *gpu_link, *cpu_link;
struct kfd_topology_device *cpu_dev;
int ret = 0;
int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct 
kfd_topology_device *kdev, int g
continue;
  
  		/* find CPU <-->  CPU links */

+   cpu_link = NULL;
cpu_dev = kfd_topology_device_by_proximity_domain(i);
if (cpu_dev) {
-   list_for_each_entry(cpu_link,
+   list_for_each_entry(tmp_link,
			&cpu_dev->io_link_props, list) {
-   if (cpu_link->node_to == gpu_link->node_to)
+   if (tmp_link->node_to == gpu_link->node_to) {
+   cpu_link = tmp_link;
break;
+   }
}
}
  
-		if (cpu_link->node_to != gpu_link->node_to)

+   if (!cpu_link)
return -ENOMEM;
  
  		/* CPU <--> CPU <--> GPU, GPU node*/


[PATCH] drm/amdgpu: Fix interrupt handling on ih_soft ring

2022-08-12 Thread Mukul Joshi
There are no backing hardware registers for ih_soft ring.
As a result, don't try to access hardware registers for read
and write pointers when processing interrupts on the IH soft
ring.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 3b4eb8285943..2022ffbb8dba 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -385,9 +385,11 @@ static u32 vega20_ih_get_wptr(struct amdgpu_device *adev,
u32 wptr, tmp;
struct amdgpu_ih_regs *ih_regs;
 
-   if (ih == &adev->irq.ih) {
+   if (ih == &adev->irq.ih || ih == &adev->irq.ih_soft) {
/* Only ring0 supports writeback. On other rings fall back
 * to register-based code with overflow checking below.
+* ih_soft ring doesn't have any backing hardware registers,
+* update wptr and return.
 */
wptr = le32_to_cpu(*ih->wptr_cpu);
 
@@ -461,6 +463,9 @@ static void vega20_ih_set_rptr(struct amdgpu_device *adev,
 {
struct amdgpu_ih_regs *ih_regs;
 
+   if (ih == &adev->irq.ih_soft)
+   return;
+
if (ih->use_doorbell) {
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
-- 
2.35.1



Re: Selecting CPUs for queuing work on

2022-08-12 Thread Felix Kuehling

On 2022-08-12 16:30, Tejun Heo wrote:

On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote:

Hi workqueue maintainers,

In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt
handlers on CPU cores different from the one where the top-half interrupt
handler runs to avoid the interrupt handler stalling the bottom half in
extreme scenarios. See my latest patch that tries to use a different
hyperthread on the same CPU core, or falls back to a different core in the
same NUMA node if that fails:
https://lore.kernel.org/all/20220811190433.1213179-1-felix.kuehl...@amd.com/

Dave pointed out that the driver may not be the best place to implement such
logic and suggested that we should have an abstraction, maybe in the
workqueue code. Do you feel this is something that could or should be
provided by the core workqueue code? Or maybe some other place?

I'm not necessarily against it. I guess it can be a flag on an unbound wq.
Do the interrupts move across different CPUs tho? ie. why does this need to
be a dynamic decision?
In principle, I think IRQ routing to CPUs can change dynamically with 
irqbalance.


If this were a flag, would there be a way to ensure all work queued to 
the same workqueue from the same CPU, or maybe all work associated with 
a work_struct always goes to the same CPU? One of the reasons for my 
latest patch was to get more predictable scheduling of the work to cores 
that are specifically reserved for interrupt handling by the system 
admin. This minimizes CPU scheduling noise that can compound to cause 
real performance issues in large scale distributed applications.


What we need is kind of the opposite of WQ_UNBOUND. As I understand it, 
WQ_UNBOUND can schedule anywhere to maximize concurrency. What we need 
is to schedule to very specific, predictable CPUs. We only have one work 
item per GPU that processes all the interrupts in order, so we don't 
need the concurrency of WQ_UNBOUND.
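
A minimal sketch of that, with the ih_wq/interrupt_work names taken from the
KFD code and the rest assumed: an ordered queue already gives the
one-at-a-time, in-order behaviour, so the only missing piece is control over
which CPU the work runs on.

	/* one ordered queue per GPU at init time */
	kfd->ih_wq = alloc_ordered_workqueue("KFD IH", 0);

	/* top half: hand off to the single serialized bottom-half item */
	queue_work(kfd->ih_wq, &kfd->interrupt_work);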


Regards,
  Felix




Thanks.



Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote:
> Hi workqueue maintainers,
> 
> In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt
> handlers on CPU cores different from the one where the top-half interrupt
> handler runs to avoid the interrupt handler stalling the bottom half in
> extreme scenarios. See my latest patch that tries to use a different
> hyperthread on the same CPU core, or falls back to a different core in the
> same NUMA node if that fails:
> https://lore.kernel.org/all/20220811190433.1213179-1-felix.kuehl...@amd.com/
> 
> Dave pointed out that the driver may not be the best place to implement such
> logic and suggested that we should have an abstraction, maybe in the
> workqueue code. Do you feel this is something that could or should be
> provided by the core workqueue code? Or maybe some other place?

I'm not necessarily against it. I guess it can be a flag on an unbound wq.
Do the interrupts move across different CPUs tho? ie. why does this need to
be a dynamic decision?

Thanks.

-- 
tejun


Selecting CPUs for queuing work on

2022-08-12 Thread Felix Kuehling

Hi workqueue maintainers,

In the KFD (amdgpu) driver we found a need to schedule bottom half 
interrupt handlers on CPU cores different from the one where the 
top-half interrupt handler runs to avoid the interrupt handler stalling 
the bottom half in extreme scenarios. See my latest patch that tries to 
use a different hyperthread on the same CPU core, or falls back to a 
different core in the same NUMA node if that fails: 
https://lore.kernel.org/all/20220811190433.1213179-1-felix.kuehl...@amd.com/


Dave pointed out that the driver may not be the best place to implement 
such logic and suggested that we should have an abstraction, maybe in 
the workqueue code. Do you feel this is something that could or should 
be provided by the core workqueue code? Or maybe some other place?


Thank you,
  Felix


--
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Linux Compute Kernel
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
facebook.com/AMD | amd.com



Re: [PATCH] drm/amdgpu: use native mode for dp aux transfer

2022-08-12 Thread kernel test robot
Hi Zhenneng,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v5.19 next-20220812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Zhenneng-Li/drm-amdgpu-use-native-mode-for-dp-aux-transfer/20220811-193443
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
config: s390-randconfig-r034-20220812 
(https://download.01.org/0day-ci/archive/20220813/202208130320.ndvnbevl-...@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 
5f1c7e2cc5a3c07cbc2412e851a7283c1841f520)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# 
https://github.com/intel-lab-lkp/linux/commit/1098c6fecb4292d634dbdccff9e720400dc7138d
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Zhenneng-Li/drm-amdgpu-use-native-mode-for-dp-aux-transfer/20220811-193443
git checkout 1098c6fecb4292d634dbdccff9e720400dc7138d
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=s390 SHELL=/bin/bash drivers/gpu/drm/amd/amdgpu/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_dp_auxch.c:25:
   In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:52:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __raw_readb(PCI_IOBASE + addr);
 ~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
   ~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro 
'__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
 ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
^
   In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_dp_auxch.c:25:
   In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:52:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
   ~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro 
'__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
 ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
^
   In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_dp_auxch.c:25:
   In file included from drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:52:
   In file included from include/linux/pci.h:39:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   __raw_writeb(value, PCI_IOBASE + addr);
   ~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
 ~~ ^
   include/asm-generic/io.h:521:59: warning: perf

Re: [PATCH v2] drm/amdkfd: Try to schedule bottom half on same core

2022-08-12 Thread Dave Airlie
On Sat, 13 Aug 2022 at 04:11, Felix Kuehling  wrote:
>
>
> On 2022-08-12 09:55, Philip Yang wrote:
> >
> > On 2022-08-11 15:04, Felix Kuehling wrote:
> >> On systems that support SMT (hyperthreading) schedule the bottom half of
> >> the KFD interrupt handler on the same core. This makes it possible to
> >> reserve a core for interrupt handling and have the bottom half run on
> >> that same core.
> >>
> >> On systems without SMT, pick another core in the same NUMA node, as
> >> before.
> >>
> >> Use for_each_cpu_wrap instead of open-coding it.
> >>
> >> Signed-off-by: Felix Kuehling 
> >
> > nit-pick below, looks better to use new_cpu as iterator, either way
> > this is
> >
> > Reviewed-by: Philip Yang 
>
> Thank you. I think I prefer cpu as the iterator and new_cpu as the
> variable that holds the CPU we choose to schedule to.

I don't think this sort of thing should be in a driver.

queue_work_node seems like it should be used or enhanced. Doing this
sort of thing in driver code should be the last place to do it.

At least please task someone to work on an upstream answer to this
sort of hacky downstream thing.
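
For reference, queue_work_node() today takes a NUMA node and lets the
workqueue core pick the CPU; a sketch, with the kfd field names assumed from
the patch under discussion:

	/* the core currently honours the node hint only for WQ_UNBOUND
	 * queues, which is where "enhanced" would come in */
	queue_work_node(dev_to_node(kfd->adev->dev), kfd->ih_wq,
			&kfd->interrupt_work);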

Dave.


RE: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

2022-08-12 Thread Kim, Jonathan
[Public]

Hi Andrey,

Here's the load/unload stack trace.  This is a 2 GPU xGMI system.  I put 
dbg_xgmi_hive_get/put refcount print post kobj get/put.
It's stuck at 2 on unload.  If it's an 8 GPU system, it's stuck at 8.
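
The instrumentation was a debug-only wrapper along these lines
(hypothetical reconstruction matching the "[dbg_xgmi_hive_get] ref_count N"
lines in the log below, not the actual change):

static struct amdgpu_hive_info *
dbg_xgmi_hive_get(struct amdgpu_hive_info *hive)
{
	kobject_get(&hive->kobj);
	pr_info("[%s] ref_count %u\n", __func__,
		kref_read(&hive->kobj.kref));
	return hive;
}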

e.g. of sysfs leak after driver unload:
atitest@atitest:/sys/devices/pci:80/:80:02.0/:81:00.0/:82:00.0/:83:00.0$
 ls xgmi_hive_info/
xgmi_hive_id

Thanks,

Jon


Driver load (get ref happens on both device add to hive and init per device):
[   61.975900] amdkcl: loading out-of-tree module taints kernel.
[   61.975973] amdkcl: module verification failed: signature and/or required 
key missing - tainting kernel
[   62.065546] amdkcl: Warning: fail to get symbol cancel_work, replace it with 
kcl stub
[   62.081920] AMD-Vi: AMD IOMMUv2 functionality not available on this system - 
This is not a bug.
[   62.491119] [drm] amdgpu kernel modesetting enabled.
[   62.491122] [drm] amdgpu version: 5.18.2
[   62.491124] [drm] OS DRM version: 5.15.0
[   62.491337] amdgpu: CRAT table not found
[   62.491341] amdgpu: Virtual CRAT table created for CPU
[   62.491360] amdgpu: Topology: Add CPU node
[   62.603556] amdgpu: PeerDirect support was initialized successfully
[   62.603847] amdgpu :83:00.0: enabling device (0100 -> 0102)
[   62.603987] [drm] initializing kernel modesetting (VEGA20 0x1002:0x66A1 
0x1002:0x0834 0x00).
[   62.604023] [drm] register mmio base: 0xFBD0
[   62.604026] [drm] register mmio size: 524288
[   62.604171] [drm] add ip block number 0 
[   62.604175] [drm] add ip block number 1 
[   62.604177] [drm] add ip block number 2 
[   62.604180] [drm] add ip block number 3 
[   62.604182] [drm] add ip block number 4 
[   62.604185] [drm] add ip block number 5 
[   62.604187] [drm] add ip block number 6 
[   62.604190] [drm] add ip block number 7 
[   62.604192] [drm] add ip block number 8 
[   62.604194] [drm] add ip block number 9 
[   62.641771] amdgpu :83:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   62.641777] amdgpu: ATOM BIOS: 113-D1630200-112
[   62.713418] [drm] UVD(0) is enabled in VM mode
[   62.713423] [drm] UVD(1) is enabled in VM mode
[   62.713426] [drm] UVD(0) ENC is enabled in VM mode
[   62.713428] [drm] UVD(1) ENC is enabled in VM mode
[   62.713430] [drm] VCE enabled in VM mode
[   62.713433] amdgpu :83:00.0: amdgpu: Trusted Memory Zone (TMZ) feature 
not supported
[   62.713472] [drm] GPU posting now...
[   62.713993] amdgpu :83:00.0: amdgpu: MEM ECC is active.
[   62.713995] amdgpu :83:00.0: amdgpu: SRAM ECC is active.
[   62.714006] amdgpu :83:00.0: amdgpu: RAS INFO: ras initialized 
successfully, hardware ability[7fff] ras_mask[7fff]
[   62.714018] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, 
fragment size is 9-bit
[   62.714026] amdgpu :83:00.0: amdgpu: VRAM: 32752M 0x0080 - 
0x0087FEFF (32752M used)
[   62.714029] amdgpu :83:00.0: amdgpu: GART: 512M 0x - 
0x1FFF
[   62.714032] amdgpu :83:00.0: amdgpu: AGP: 267845632M 0x0090 
- 0x
[   62.714043] [drm] Detected VRAM RAM=32752M, BAR=32768M
[   62.714044] [drm] RAM width 4096bits HBM
[   62.714050] debugfs: Directory 'ttm' with parent '/' already present!
[   62.714146] [drm] amdgpu: 32752M of VRAM memory ready
[   62.714149] [drm] amdgpu: 40203M of GTT memory ready.
[   62.714170] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   62.714266] [drm] PCIE GART of 512M enabled.
[   62.714267] [drm] PTB located at 0x0080
[   62.731067] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist
[   62.731075] amdgpu :83:00.0: amdgpu: PSP runtime database doesn't exist
[   62.731449] amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
[   62.743177] [drm] Found UVD firmware ENC: 1.2 DEC: .43 Family ID: 19
[   62.743244] [drm] PSP loading UVD firmware
[   62.744525] [drm] Found VCE firmware Version: 57.6 Binary ID: 4
[   62.744689] [drm] PSP loading VCE firmware
[   62.896804] [drm] reserve 0x40 from 0x87fec0 for PSP TMR
[   62.979421] amdgpu :83:00.0: amdgpu: HDCP: optional hdcp ta ucode is not 
available
[   62.979427] amdgpu :83:00.0: amdgpu: DTM: optional dtm ta ucode is not 
available
[   62.979430] amdgpu :83:00.0: amdgpu: RAP: optional rap ta ucode is not 
available
[   62.979432] amdgpu :83:00.0: amdgpu: SECUREDISPLAY: securedisplay ta 
ucode is not available
[   62.982386] [drm] Display Core initialized with v3.2.196!
[   62.984514] [drm] kiq ring mec 2 pipe 1 q 0
[   63.026846] [drm] UVD and UVD ENC initialized successfully.
[   63.225760] [drm] VCE initialized successfully.
[   63.22] amdgpu: [dbg_xgmi_hive_get] ref_count 2
[   63.28] CPU: 10 PID: 397 Comm: kworker/10:2 Tainted: G   OE 
5.15.0-46-generic #49~20.04.1-Ubuntu
[   63.244454] Hardware name: Supermicro X10DRi/X10DRi-T, BIOS 3.1 09/14/2018
[   63.244457] Workqueue: events work_for_cpu_fn
[   63.244471] Call Trace:
[   

Re: [PATCH v2] drm/amdkfd: Try to schedule bottom half on same core

2022-08-12 Thread Felix Kuehling



On 2022-08-12 09:55, Philip Yang wrote:


On 2022-08-11 15:04, Felix Kuehling wrote:

On systems that support SMT (hyperthreading) schedule the bottom half of
the KFD interrupt handler on the same core. This makes it possible to
reserve a core for interrupt handling and have the bottom half run on
that same core.

On systems without SMT, pick another core in the same NUMA node, as
before.

Use for_each_cpu_wrap instead of open-coding it.

Signed-off-by: Felix Kuehling 


nit-pick below, looks better to use new_cpu as iterator, either way 
this is


Reviewed-by: Philip Yang 


Thank you. I think I prefer cpu as the iterator and new_cpu as the 
variable that holds the CPU we choose to schedule to.


Regards,
  Felix





---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index f5853835f03a..4d1284714e7a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "kfd_priv.h"
  #include "kfd_device_queue_manager.h"
  #include "kfd_pm4_headers_vi.h"
@@ -801,13 +802,24 @@ static inline void kfd_queue_work(struct workqueue_struct *wq,
 				  struct work_struct *work)
 {
 	int cpu, new_cpu;
+	const struct cpumask *mask = NULL;
 
 	cpu = new_cpu = smp_processor_id();
-	do {
-		new_cpu = cpumask_next(new_cpu, cpu_online_mask) % nr_cpu_ids;
-		if (cpu_to_node(new_cpu) == numa_node_id())
+
+#if defined(CONFIG_SCHED_SMT)
+	/* CPU threads in the same core */
+	mask = cpu_smt_mask(cpu);
+#endif
+	if (!mask || cpumask_weight(mask) <= 1)
+		/* CPU threads in the same NUMA node */
+		mask = cpu_cpu_mask(cpu);
+	/* Pick the next online CPU thread in the same core or NUMA node */
+	for_each_cpu_wrap(cpu, mask, cpu+1) {
+		if (cpu != new_cpu && cpu_online(cpu)) {
+			new_cpu = cpu;
 			break;
-	} while (cpu != new_cpu);
+		}
+	}
 
 	queue_work_on(new_cpu, wq, work);
 }


for_each_cpu_wrap(new_cpu, mask, cpu + 1) {
	if (cpu != new_cpu && cpu_online(new_cpu)) {
		cpu = new_cpu;
		break;
	}
}
queue_work_on(cpu, wq, work);



Re: [PATCH] drm/amd/display: remove unreachable code

2022-08-12 Thread Tales Lelo da Aparecida

Hi,

On 12/08/2022 00:19, Jiapeng Chong wrote:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:1658
 dml32_TruncToValidBPP() warn: ignoring unreachable code.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1894
Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
  .../drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c   | 4 
  1 file changed, 4 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
index 05fc14a47fba..0758e1da55a9 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
@@ -1654,10 +1654,6 @@ double dml32_TruncToValidBPP(
else
return DesiredBPP;
}
-
-   *RequiredSlots = dml_ceil(DesiredBPP / MaxLinkBPP * 64, 1);
-
-   return BPP_INVALID;
  } // TruncToValidBPP
  
  double dml32_RequiredDTBCLK(


Seems correct.

Reviewed-by: Tales Aparecida 

I feel like RequiredSlots is not actually used anywhere in the code, 
just passed around dml32_TruncToValidBPP() and 
dml32_CalculateOutputLink(). I've looked for any mentions of it in the 
mailing list, but could not find anything that implied it's part of 
ground working. I wonder if it's something outside the Linux tree for 
other platforms or related to HW gospel.


Re: [PATCH] drm/amd/display: Drop unused code

2022-08-12 Thread Alex Deucher
On Thu, Aug 11, 2022 at 6:36 PM Rodrigo Siqueira
 wrote:
>
> After removing some code for fixing the PowerPC compilation, we had some
> leftover functions that are not used anymore. This commit drops
> optc3_fpu_set_vrr_m_const since we don't need it anymore.
>
> Signed-off-by: Rodrigo Siqueira 

Reviewed-by: Alex Deucher 

> ---
>  .../drm/amd/display/dc/dml/dcn30/dcn30_fpu.c  | 77 ---
>  .../drm/amd/display/dc/dml/dcn30/dcn30_fpu.h  |  3 -
>  2 files changed, 80 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> index e1e92daba668..814374b1016c 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> @@ -177,83 +177,6 @@ struct _vcs_dpi_soc_bounding_box_st dcn3_0_soc = {
> .urgent_latency_adjustment_fabric_clock_reference_mhz = 1000,
>  };
>
> -
> -void optc3_fpu_set_vrr_m_const(struct timing_generator *optc,
> -   double vtotal_avg)
> -{
> -   struct optc *optc1 = DCN10TG_FROM_TG(optc);
> -   double vtotal_min, vtotal_max;
> -   double ratio, modulo, phase;
> -   uint32_t vblank_start;
> -   uint32_t v_total_mask_value = 0;
> -
> -   dc_assert_fp_enabled();
> -
> -   /* Compute VTOTAL_MIN and VTOTAL_MAX, so that
> -* VOTAL_MAX - VTOTAL_MIN = 1
> -*/
> -   v_total_mask_value = 16;
> -   vtotal_min = dcn_bw_floor(vtotal_avg);
> -   vtotal_max = dcn_bw_ceil(vtotal_avg);
> -
> -   /* Check that bottom VBLANK is at least 2 lines tall when running with
> -* VTOTAL_MIN. Note that VTOTAL registers are defined as 'total number
> -* of lines in a frame - 1'.
> -*/
> -   REG_GET(OTG_V_BLANK_START_END, OTG_V_BLANK_START,
> -   &vblank_start);
> -   ASSERT(vtotal_min >= vblank_start + 1);
> -
> -   /* Special case where the average frame rate can be achieved
> -* without using the DTO
> -*/
> -   if (vtotal_min == vtotal_max) {
> -   REG_SET(OTG_V_TOTAL, 0, OTG_V_TOTAL, (uint32_t)vtotal_min);
> -
> -   optc->funcs->set_vtotal_min_max(optc, 0, 0);
> -   REG_SET(OTG_M_CONST_DTO0, 0, OTG_M_CONST_DTO_PHASE, 0);
> -   REG_SET(OTG_M_CONST_DTO1, 0, OTG_M_CONST_DTO_MODULO, 0);
> -   REG_UPDATE_3(OTG_V_TOTAL_CONTROL,
> -   OTG_V_TOTAL_MIN_SEL, 0,
> -   OTG_V_TOTAL_MAX_SEL, 0,
> -   OTG_SET_V_TOTAL_MIN_MASK_EN, 0);
> -   return;
> -   }
> -
> -   ratio = vtotal_max - vtotal_avg;
> -   modulo = 65536.0 * 65536.0 - 1.0; /* 2^32 - 1 */
> -   phase = ratio * modulo;
> -
> -   /* Special cases where the DTO phase gets rounded to 0 or
> -* to DTO modulo
> -*/
> -   if (phase <= 0 || phase >= modulo) {
> -   REG_SET(OTG_V_TOTAL, 0, OTG_V_TOTAL,
> -   phase <= 0 ?
> -   (uint32_t)vtotal_max : (uint32_t)vtotal_min);
> -   REG_SET(OTG_V_TOTAL_MIN, 0, OTG_V_TOTAL_MIN, 0);
> -   REG_SET(OTG_V_TOTAL_MAX, 0, OTG_V_TOTAL_MAX, 0);
> -   REG_SET(OTG_M_CONST_DTO0, 0, OTG_M_CONST_DTO_PHASE, 0);
> -   REG_SET(OTG_M_CONST_DTO1, 0, OTG_M_CONST_DTO_MODULO, 0);
> -   REG_UPDATE_3(OTG_V_TOTAL_CONTROL,
> -   OTG_V_TOTAL_MIN_SEL, 0,
> -   OTG_V_TOTAL_MAX_SEL, 0,
> -   OTG_SET_V_TOTAL_MIN_MASK_EN, 0);
> -   return;
> -   }
> -   REG_UPDATE_6(OTG_V_TOTAL_CONTROL,
> -   OTG_V_TOTAL_MIN_SEL, 1,
> -   OTG_V_TOTAL_MAX_SEL, 1,
> -   OTG_SET_V_TOTAL_MIN_MASK_EN, 1,
> -   OTG_SET_V_TOTAL_MIN_MASK, v_total_mask_value,
> -   OTG_VTOTAL_MID_REPLACING_MIN_EN, 0,
> -   OTG_VTOTAL_MID_REPLACING_MAX_EN, 0);
> -   REG_SET(OTG_V_TOTAL, 0, OTG_V_TOTAL, (uint32_t)vtotal_min);
> -   optc->funcs->set_vtotal_min_max(optc, vtotal_min, vtotal_max);
> -   REG_SET(OTG_M_CONST_DTO0, 0, OTG_M_CONST_DTO_PHASE, (uint32_t)phase);
> -   REG_SET(OTG_M_CONST_DTO1, 0, OTG_M_CONST_DTO_MODULO, 
> (uint32_t)modulo);
> -}
> -
>  void dcn30_fpu_populate_dml_writeback_from_context(
> struct dc *dc, struct resource_context *res_ctx, 
> display_e2e_pipe_params_st *pipes)
>  {
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.h 
> b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.h
> index cab864095ce7..e3b6ad6a8784 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.h
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.h
> @@ -29,9 +29,6 @@
>  #include "core_types.h"
>  #include "dcn20/dcn20_optc.h"
>
> -void optc3_fpu_set_vrr_m_const(struct timing_generator *optc,
> -   double vtotal_avg);
> -
>  void dcn30_fpu_populate_dml_writeback_from_context(
>   

Re: [PATCH v2] drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code

2022-08-12 Thread Alex Deucher
On Thu, Aug 11, 2022 at 6:38 PM Rodrigo Siqueira Jordao
 wrote:
>
>
>
> On 2022-08-11 17:49, Alex Deucher wrote:
> > On Thu, Aug 11, 2022 at 3:56 PM Rodrigo Siqueira
> >  wrote:
> >>
> >> We got a report from Stephen/Michael that the PowerPC build was failing
> >> with the following error:
> >>
> >> ld: drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.o uses hard float, 
> >> drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.o uses soft float
> >> ld: failed to merge target specific data of file 
> >> drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.o
> >>
> >> This error happened because of the function optc3_set_vrr_m_const. This
> >> function expects a double as a parameter in a code that is not allowed
> >> to have FPU operations. After further investigation, it became clear
> >> that optc3_set_vrr_m_const was never invoked, so we can safely drop this
> >> function and fix the ld issue.
> >>
> >> Changes since V1:
> >>   - Drop optc3_fpu_set_vrr_m_const since it is unused.
> >
> > FWIW, I upstreamed v1 already.  Can you rebase your v2 changes on that?
>
> Hi Alex,
>
> I guess the v1 was not merged into the amd-staging-drm-next. I just
> applied the v1 there (waiting for CI result).

Yeah, sorry, I should have mentioned that.  I thought you were going
to apply it.  I wanted to fix the failure before I send Dave my last
PR for the merge window.

Alex


>
> I also sent this patch:
>
> https://lore.kernel.org/amd-gfx/cadnq5_oiqwc7reg8cj_s6ukhobv0zge-+9wo1cexojk+7zw...@mail.gmail.com/T/#t
>
> Thanks
> Siqueira
>
> >
> > Alex
> >
> >>
> >> Cc: Alex Deucher 
> >> Cc: Melissa Wen 
> >> Cc: Maíra Canal 
> >> Reported-by: Stephen Rothwell 
> >> Reported-by: Michael Ellerman 
> >> Signed-off-by: Rodrigo Siqueira 
> >> ---
> >>   .../gpu/drm/amd/display/dc/dcn30/dcn30_optc.c |  8 --
> >>   .../gpu/drm/amd/display/dc/dcn30/dcn30_optc.h |  3 -
> >>   .../gpu/drm/amd/display/dc/dcn32/dcn32_optc.c |  1 -
> >>   .../drm/amd/display/dc/dml/dcn30/dcn30_fpu.c  | 77 ---
> >>   .../drm/amd/display/dc/dml/dcn30/dcn30_fpu.h  |  3 -
> >>   .../amd/display/dc/inc/hw/timing_generator.h  |  2 -
> >>   6 files changed, 94 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c 
> >> b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
> >> index d072997477dd..1782b9c26cf4 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c
> >> @@ -184,14 +184,6 @@ void optc3_set_dsc_config(struct timing_generator 
> >> *optc,
> >>  REG_UPDATE(OTG_V_SYNC_A_CNTL, OTG_V_SYNC_MODE, 0);
> >>   }
> >>
> >> -void optc3_set_vrr_m_const(struct timing_generator *optc,
> >> -   double vtotal_avg)
> >> -{
> >> -   DC_FP_START();
> >> -   optc3_fpu_set_vrr_m_const(optc, vtotal_avg);
> >> -   DC_FP_END();
> >> -}
> >> -
> >>   void optc3_set_odm_bypass(struct timing_generator *optc,
> >>  const struct dc_crtc_timing *dc_crtc_timing)
> >>   {
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.h 
> >> b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.h
> >> index 33bd12f5dc17..dd45a5499b07 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.h
> >> +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.h
> >> @@ -329,9 +329,6 @@ void optc3_lock_doublebuffer_enable(struct 
> >> timing_generator *optc);
> >>
> >>   void optc3_lock_doublebuffer_disable(struct timing_generator *optc);
> >>
> >> -void optc3_set_vrr_m_const(struct timing_generator *optc,
> >> -   double vtotal_avg);
> >> -
> >>   void optc3_set_drr_trigger_window(struct timing_generator *optc,
> >>  uint32_t window_start, uint32_t window_end);
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c 
> >> b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c
> >> index 9861be1dc063..1fad7b48bd5b 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c
> >> @@ -281,7 +281,6 @@ static struct timing_generator_funcs dcn32_tg_funcs = {
> >>  .lock_doublebuffer_enable = 
> >> optc3_lock_doublebuffer_enable,
> >>  .lock_doublebuffer_disable = 
> >> optc3_lock_doublebuffer_disable,
> >>  .enable_optc_clock = optc1_enable_optc_clock,
> >> -   .set_vrr_m_const = optc3_set_vrr_m_const,
> >>  .set_drr = optc32_set_drr,
> >>  .get_last_used_drr_vtotal = 
> >> optc2_get_last_used_drr_vtotal,
> >>  .set_vtotal_min_max = optc3_set_vtotal_min_max,
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c 
> >> b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> >> index e1e92daba668..814374b1016c 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/dcn30_fpu.c
> >> @@ -177,83 +177,6 @@ struct 

Re: [PATCH 1/2] drm/amdgpu: enable IH Clock Gating for OSS IP v6.0.1

2022-08-12 Thread Alex Deucher
Series is:
Reviewed-by: Alex Deucher 

On Fri, Aug 12, 2022 at 6:20 AM Tim Huang  wrote:
>
> Enable AMD_CG_SUPPORT_IH_CG support.
>
> Signed-off-by: Tim Huang 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc21.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
> b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 6c3440e7ed3f..1ff7fc7bb340 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -602,6 +602,7 @@ static int soc21_common_early_init(void *handle)
> AMD_CG_SUPPORT_HDP_LS |
> AMD_CG_SUPPORT_ATHUB_MGCG |
> AMD_CG_SUPPORT_ATHUB_LS |
> +   AMD_CG_SUPPORT_IH_CG |
> AMD_CG_SUPPORT_VCN_MGCG |
> AMD_CG_SUPPORT_JPEG_MGCG;
> adev->pg_flags =
> --
> 2.25.1
>


[PATCH -next 4/4] drm/amd/display: clean up one inconsistent indenting

2022-08-12 Thread Yang Li
The indentation of statements in the same curly bracket should be
consistent.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1892
Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 4aecbf230446..1a2554438f77 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -640,9 +640,9 @@ bool dcn32_set_output_transfer_func(struct dc *dc,
 				stream->out_transfer_func,
 				&mpc->blender_params, false))
 			params = &mpc->blender_params;
-/* there are no ROM LUTs in OUTGAM */
-	if (stream->out_transfer_func->type == TF_TYPE_PREDEFINED)
-	BREAK_TO_DEBUGGER();
+		/* there are no ROM LUTs in OUTGAM */
+		if (stream->out_transfer_func->type == TF_TYPE_PREDEFINED)
+			BREAK_TO_DEBUGGER();
 		}
 	}
 
-- 
2.20.1.7.g153144c



Re: [Linaro-mm-sig] [PATCH v2 3/5] dma-buf: Move all dma-bufs to dynamic locking specification

2022-08-12 Thread Dmitry Osipenko
On 8/12/22 14:34, Christian König wrote:
> 
> 
> Am 10.08.22 um 20:53 schrieb Dmitry Osipenko:
>> On 8/10/22 21:25, Christian König wrote:
>>> Am 10.08.22 um 19:49 schrieb Dmitry Osipenko:
 On 8/10/22 14:30, Christian König wrote:
> Am 25.07.22 um 17:18 schrieb Dmitry Osipenko:
>> This patch moves the non-dynamic dma-buf users over to the dynamic
>> locking specification. The strict locking convention prevents
>> deadlock
>> situation for dma-buf importers and exporters.
>>
>> Previously the "unlocked" versions of the dma-buf API functions
>> weren't
>> taking the reservation lock and this patch makes them to take the
>> lock.
>>
>> Intel and AMD GPU drivers already were mapping imported dma-bufs
>> under
>> the held lock, hence the "locked" variant of the functions are added
>> for them and the drivers are updated to use the "locked" versions.
> In general "Yes, please", but that won't be that easy.
>
> You not only need to change amdgpu and i915, but all drivers
> implementing the map_dma_buf(), unmap_dma_buf() callbacks.
>
> Auditing all that code is a huge bunch of work.
 Hm, neither of the drivers takes the resv lock in map_dma_buf/unmap_dma_buf.
 It's easy to audit them all and I did it. So either I'm missing
 something or it doesn't take much time to check them all. Am I really
 missing something?
>>> Ok, so this is only changing map/unmap now?
>> It also vmap/vunmap and attach/detach: In the previous patch I added the
>> _unlocked postfix to the func names and in this patch I made them all to
>> actually take the lock.
> 
> 
> Take your patch "[PATCH v2 2/5] drm/gem: Take reservation lock for
> vmap/vunmap operations" as a blueprint on how to approach it.
> 
> E.g. one callback at a time and then document the result in the end.

Yeah, I'll do it for v3. I'm vaguely recalling that there was a problem
when I wanted to split this patch in the past, but don't remember what
it was.. maybe that problem is gone now, will see :)
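
To make the convention concrete, the map path ends up with a wrapper shaped
roughly like this (a sketch of the _unlocked pattern; sanity checks and
error handling trimmed):

struct sg_table *
dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
				enum dma_data_direction direction)
{
	struct sg_table *sg_table;

	dma_resv_lock(attach->dmabuf->resv, NULL);
	sg_table = dma_buf_map_attachment(attach, direction);
	dma_resv_unlock(attach->dmabuf->resv);

	return sg_table;
}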

-- 
Best regards,
Dmitry


[PATCH -next 2/4] drm/amd/display: clean up one inconsistent indenting

2022-08-12 Thread Yang Li
1. The indentation of statements in the same curly bracket should be
consistent.
2. Variable declarations in the same function should be aligned.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1887
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1888
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1889
Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c| 13 -
 .../amd/display/dc/dml/dcn32/display_mode_vba_32.c  |  6 +++---
 .../display/dc/dml/dcn32/display_mode_vba_util_32.c |  2 +-
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index 3316c4a64901..a8539922715e 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -2135,13 +2135,16 @@ void dcn32_update_bw_bounding_box_fpu(struct dc *dc, struct clk_bw_params *bw_pa
 
 		if (dc->ctx->dc_bios->funcs->get_soc_bb_info(dc->ctx->dc_bios, &bb_info) == BP_RESULT_OK) {
 			if (bb_info.dram_clock_change_latency_100ns > 0)
-				dcn3_2_soc.dram_clock_change_latency_us = bb_info.dram_clock_change_latency_100ns * 10;
+				dcn3_2_soc.dram_clock_change_latency_us =
+					bb_info.dram_clock_change_latency_100ns * 10;
 
-			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
-				dcn3_2_soc.sr_enter_plus_exit_time_us = bb_info.dram_sr_enter_exit_latency_100ns * 10;
+			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
+				dcn3_2_soc.sr_enter_plus_exit_time_us =
+					bb_info.dram_sr_enter_exit_latency_100ns * 10;
 
-			if (bb_info.dram_sr_exit_latency_100ns > 0)
-				dcn3_2_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10;
+			if (bb_info.dram_sr_exit_latency_100ns > 0)
+				dcn3_2_soc.sr_exit_time_us =
+					bb_info.dram_sr_exit_latency_100ns * 10;
 		}
 	}
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
index cb2025771646..6a4f730419c0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
@@ -677,9 +677,9 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
 				dml_ceil((double) v->WritebackDelay[mode_lib->vba.VoltageLevel][k]
 					/ (mode_lib->vba.HTotal[k] / mode_lib->vba.PixelClock[k]), 1));
 
-	// Clamp to max OTG vstartup register limit
-	if (v->MaxVStartupLines[k] > 1023)
-		v->MaxVStartupLines[k] = 1023;
+		// Clamp to max OTG vstartup register limit
+		if (v->MaxVStartupLines[k] > 1023)
+			v->MaxVStartupLines[k] = 1023;
 
 #ifdef __DML_VBA_DEBUG__
 		dml_print("DML::%s: k=%d MaxVStartupLines = %d\n", __func__, k, v->MaxVStartupLines[k]);
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
index 05fc14a47fba..3ba76aab0a20 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
@@ -4277,7 +4277,7 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 	double ActiveClockChangeLatencyHidingY;
 	double ActiveClockChangeLatencyHidingC;
 	double ActiveClockChangeLatencyHiding;
-double EffectiveDETBufferSizeY;
+	double EffectiveDETBufferSizeY;
 	double ActiveFCLKChangeLatencyMargin[DC__NUM_DPP__MAX];
 	double USRRetrainingLatencyMargin[DC__NUM_DPP__MAX];
 	double TotalPixelBW = 0.0;
-- 
2.20.1.7.g153144c



[PATCH -next 1/4] drm/amd/display: clean up one inconsistent indenting

2022-08-12 Thread Yang Li
The indentation of statements in the same curly bracket should be
consistent.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1886
Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 .../gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c  | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c
index c87091683b5d..7ebf25e87933 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn321/dcn321_fpu.c
@@ -518,13 +518,16 @@ void dcn321_update_bw_bounding_box_fpu(struct dc *dc, struct clk_bw_params *bw_p
 
 		if (dc->ctx->dc_bios->funcs->get_soc_bb_info(dc->ctx->dc_bios, &bb_info) == BP_RESULT_OK) {
 			if (bb_info.dram_clock_change_latency_100ns > 0)
-				dcn3_21_soc.dram_clock_change_latency_us = bb_info.dram_clock_change_latency_100ns * 10;
+				dcn3_21_soc.dram_clock_change_latency_us =
+					bb_info.dram_clock_change_latency_100ns * 10;
 
-			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
-				dcn3_21_soc.sr_enter_plus_exit_time_us = bb_info.dram_sr_enter_exit_latency_100ns * 10;
+			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
+				dcn3_21_soc.sr_enter_plus_exit_time_us =
+					bb_info.dram_sr_enter_exit_latency_100ns * 10;
 
-			if (bb_info.dram_sr_exit_latency_100ns > 0)
-				dcn3_21_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10;
+			if (bb_info.dram_sr_exit_latency_100ns > 0)
+				dcn3_21_soc.sr_exit_time_us =
+					bb_info.dram_sr_exit_latency_100ns * 10;
 		}
 	}
 
-- 
2.20.1.7.g153144c



Re: [PATCH v4 00/41] DYNDBG: opt-in class'd debug for modules, use in drm.

2022-08-12 Thread Greg KH
On Thu, Aug 11, 2022 at 06:52:40PM +0200, Daniel Vetter wrote:
> On Wed, Aug 03, 2022 at 04:13:05PM -0400, Jason Baron wrote:
> > 
> > 
> > On 8/3/22 15:56, jim.cro...@gmail.com wrote:
> > > On Wed, Jul 20, 2022 at 9:32 AM Jim Cromie  wrote:
> > >>
> > > 
> > >> Hi Jason, Greg, DRM-folk,
> > >>
> > >> This adds 'typed' "class FOO" support to dynamic-debug, where 'typed'
> > >> means either DISJOINT (like drm debug categories), or VERBOSE (like
> > >> nouveau debug-levels).  Use it in DRM modules: core, helpers, and in
> > >> drivers i915, amdgpu, nouveau.
> > >>
> > > 
> > > This revision fell over, on a conflict with something in drm-MUMBLE
> > > 
> > > Error: patch 
> > > https://urldefense.com/v3/__https://patchwork.freedesktop.org/api/1.0/series/106427/revisions/2/mbox/__;!!GjvTz_vk!UCPl5Uf32cDVwwysMTfaLwoGLWomargFXuR8HjBA3xsUOjxXHXC5hneAkP4iWK91yc-LjjJxWW89-51Z$
> > >  
> > > not applied
> > > Applying: dyndbg: fix static_branch manipulation
> > > Applying: dyndbg: fix module.dyndbg handling
> > > Applying: dyndbg: show both old and new in change-info
> > > Applying: dyndbg: reverse module walk in cat control
> > > Applying: dyndbg: reverse module.callsite walk in cat control
> > > Applying: dyndbg: use ESCAPE_SPACE for cat control
> > > Applying: dyndbg: let query-modname override actual module name
> > > Applying: dyndbg: add test_dynamic_debug module
> > > Applying: dyndbg: drop EXPORTed dynamic_debug_exec_queries
> > > 
> > > Jason,
> > > those above are decent maintenance patches, particularly the drop export.
> > > It would be nice to trim this unused api this cycle.
> > 
> > Hi Jim,
> > 
> > Agreed - I was thinking the same thing. Feel free to add
> > Acked-by: Jason Baron  to those first 9.
> 
> Does Greg KH usually pick up dyndbg patches or someone else or do I need
> to do something? Would be great to get some movement here since -rc1 goes
> out and merging will restart next week.

Yes, I can take these into my tree after -rc1 is out.

thanks,

greg k-h


[PATCH -next 3/4] drm/amd/display: clean up one inconsistent indenting

2022-08-12 Thread Yang Li
The indentation of statements in the same curly bracket should be
consistent.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1890
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1891
Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 5b5d952b2b8c..4ac8e4fcba77 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -2151,8 +2151,8 @@ static int dcn10_align_pixel_clocks(struct dc *dc, int group_size,
 			dc->res_pool->dp_clock_source->funcs->get_pixel_clk_frequency_100hz(
 				dc->res_pool->dp_clock_source,
 				grouped_pipes[i]->stream_res.tg->inst, &pclk);
-				grouped_pipes[i]->stream->timing.pix_clk_100hz =
-				pclk*get_clock_divider(grouped_pipes[i], false);
+			grouped_pipes[i]->stream->timing.pix_clk_100hz =
+				pclk*get_clock_divider(grouped_pipes[i], false);
 			if (master == -1)
 				master = i;
 		}
@@ -2206,7 +2206,7 @@ void dcn10_enable_vblanks_synchronization(
 				grouped_pipes[i]->stream->timing.pix_clk_100hz,
 				get_clock_divider(grouped_pipes[master], false),
 				get_clock_divider(grouped_pipes[i], false));
-			grouped_pipes[i]->stream->vblank_synchronized = true;
+		grouped_pipes[i]->stream->vblank_synchronized = true;
 		}
 		grouped_pipes[master]->stream->vblank_synchronized = true;
 		DC_SYNC_INFO("Sync complete\n");
-- 
2.20.1.7.g153144c



[PATCH] drm/amd/display: remove unreachable code

2022-08-12 Thread Jiapeng Chong
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:1658 dml32_TruncToValidBPP() warn: ignoring unreachable code.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1894
Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 .../drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c   | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
index 05fc14a47fba..0758e1da55a9 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
@@ -1654,10 +1654,6 @@ double dml32_TruncToValidBPP(
 		else
 			return DesiredBPP;
 	}
-
-	*RequiredSlots = dml_ceil(DesiredBPP / MaxLinkBPP * 64, 1);
-
-	return BPP_INVALID;
 } // TruncToValidBPP
 
 double dml32_RequiredDTBCLK(
-- 
2.20.1.7.g153144c
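
The pattern the checker flags, reduced to a minimal sketch (illustrative, not the DML code): once every branch of the preceding if/else returns, nothing after the closing brace can execute.

    static double trunc_example(int valid, double bpp)
    {
            if (valid)
                    return bpp;
            else
                    return 0.0;

            return -1.0;    /* unreachable: both branches above return */
    }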



Re: build failure of next-20220811 due to b1a63a0b48ad ("drm/amd/display: consider DSC pass-through during mode validation")

2022-08-12 Thread Stephen Rothwell
Hi all,

On Thu, 11 Aug 2022 18:10:48 +0100 "Sudip Mukherjee (Codethink)" 
 wrote:
>
> Not sure if it has been reported, builds of riscv, alpha, s390, arm,
> arm64, xtensa, mips, csky allmodconfig have failed to build next-20220811
> with the error:
> 
> ERROR: modpost: "dc_dsc_compute_bandwidth_range" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost: "dc_dsc_get_policy_for_timing" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> 
> git bisect pointed to b1a63a0b48ad ("drm/amd/display: consider DSC pass-through during mode validation")
> And, reverting that commit has fixed the build failure.
> 
> I will be happy to test any patch or provide any extra log if needed.

I have reverted that commit in today's linux-next.

-- 
Cheers,
Stephen Rothwell




RE: [PATCH v3 2/3] drm/amdgpu_dm: Rely on split out luminance calculation function

2022-08-12 Thread Jani Nikula
On Thu, 11 Aug 2022, "Deucher, Alexander"  wrote:
> [Public]
>
>> -Original Message-
>> From: amd-gfx  On Behalf Of Jani
>> Nikula
>> Sent: Thursday, August 4, 2022 5:55 AM
>> To: Jouni Högander ; dri-
>> de...@lists.freedesktop.org; intel-...@lists.freedesktop.org; amd-
>> g...@lists.freedesktop.org
>> Cc: Siqueira, Rodrigo ; Li, Roman
>> ; Manasi Navare ; Mika
>> Kahola ; Jouni Högander
>> ; Wentland, Harry
>> 
>> Subject: Re: [PATCH v3 2/3] drm/amdgpu_dm: Rely on split out luminance
>> calculation function
>> 
>> On Tue, 19 Jul 2022, Jouni Högander  wrote:
>> > Luminance range calculation was split out into drm_edid.c and is now
>> > part of edid parsing. Rely on values calculated during edid parsing
>> > and use these for caps->aux_max_input_signal and caps-
>> >aux_min_input_signal.
>> 
>> Harry, I'll merge patches 1 & 3 in this series through drm-misc-next, 
>> because I
>> think they're good to go, and fix stuff in i915.
>> 
>> Can I get your rb/ack to merge this patch as well, or do you want to take 
>> this
>> later via your tree?
>
> You can take this via drm-misc.
> Acked-by: Alex Deucher 

Thanks, pushed the series to drm-misc-next.

BR,
Jani.

>
>
>> 
>> BR,
>> Jani.
>> 
>> 
>> >
>> > v2: Use values calculated during edid parsing
>> >
>> > Cc: Roman Li 
>> > Cc: Rodrigo Siqueira 
>> > Cc: Harry Wentland 
>> > Cc: Lyude Paul 
>> > Cc: Mika Kahola 
>> > Cc: Jani Nikula 
>> > Cc: Manasi Navare 
>> > Signed-off-by: Jouni Högander 
>> > ---
>> >  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 35 ++++-------------------------------
>> >  1 file changed, 4 insertions(+), 31 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > index 3e83fed540e8..eb7abdeb8653 100644
>> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > @@ -2903,15 +2903,12 @@ static struct drm_mode_config_helper_funcs amdgpu_dm_mode_config_helperfuncs = {
>> >
>> >  static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector)
>> >  {
>> > -	u32 max_avg, min_cll, max, min, q, r;
>> >  	struct amdgpu_dm_backlight_caps *caps;
>> >  	struct amdgpu_display_manager *dm;
>> >  	struct drm_connector *conn_base;
>> >  	struct amdgpu_device *adev;
>> >  	struct dc_link *link = NULL;
>> > -	static const u8 pre_computed_values[] = {
>> > -		50, 51, 52, 53, 55, 56, 57, 58, 59, 61, 62, 63, 65, 66, 68, 69,
>> > -		71, 72, 74, 75, 77, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 98};
>> > +	struct drm_luminance_range_info *luminance_range;
>> >  	int i;
>> >
>> >  	if (!aconnector || !aconnector->dc_link)
>> > @@ -2933,8 +2930,6 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector)
>> >  	caps = &dm->backlight_caps[i];
>> >  	caps->ext_caps = &aconnector->dc_link->dpcd_sink_ext_caps;
>> >  	caps->aux_support = false;
>> > -	max_avg = conn_base->hdr_sink_metadata.hdmi_type1.max_fall;
>> > -	min_cll = conn_base->hdr_sink_metadata.hdmi_type1.min_cll;
>> >
>> >  	if (caps->ext_caps->bits.oled == 1 /*||
>> >  	    caps->ext_caps->bits.sdr_aux_backlight_control == 1 ||
>> > @@ -2946,31 +2941,9 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector)
>> >  	else if (amdgpu_backlight == 1)
>> >  		caps->aux_support = true;
>> >
>> > -	/* From the specification (CTA-861-G), for calculating the maximum
>> > -	 * luminance we need to use:
>> > -	 *	Luminance = 50*2**(CV/32)
>> > -	 * Where CV is a one-byte value.
>> > -	 * For calculating this expression we may need float point precision;
>> > -	 * to avoid this complexity level, we take advantage that CV is divided
>> > -	 * by a constant. From the Euclids division algorithm, we know that CV
>> > -	 * can be written as: CV = 32*q + r. Next, we replace CV in the
>> > -	 * Luminance expression and get 50*(2**q)*(2**(r/32)), hence we just
>> > -	 * need to pre-compute the value of r/32. For pre-computing the values
>> > -	 * We just used the following Ruby line:
>> > -	 *	(0...32).each {|cv| puts (50*2**(cv/32.0)).round}
>> > -	 * The results of the above expressions can be verified at
>> > -	 * pre_computed_values.
>> > -	 */
>> > -	q = max_avg >> 5;
>> > -	r = max_avg % 32;
>> > -	max = (1 << q) * pre_computed_values[r];
>> > -
>> > -	// min luminance: maxLum * (CV/255)^2 / 100
>> > -	q = DIV_ROUND_CLOSEST(min_cll, 255);
>> > -	min = max * DIV_ROUND_CLOSEST((q * q), 100);
>> > -
>> > -	caps->aux_max_input_signal = max;
>> > -	caps->aux_min_input_signal = min;
>> > +	luminance_range = &conn_base->display_info.luminance_range;
>> > +	caps->aux_min_input_signal = luminance_range->min_luminance;
>> > +	caps->aux_max_input_signal = luminance_range->max_luminance;
>> >  }
>> >
>> >  void amdgpu_dm_update_connector_after_detect(
>> 
>> --
>> Jani Nikula, Intel Open Source Graphics Center

-- 
Jani Nikula, Intel Open 
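
As an aside, the fixed-point luminance math being deleted above can be checked with a few lines of ordinary C. This is a stand-alone illustration for the curious reader, not kernel code; the table is copied from the removed pre_computed_values[]:

    #include <math.h>
    #include <stdio.h>

    /* round(50 * 2^(r/32)) for r = 0..31, as in the removed kernel table */
    static const unsigned char table[32] = {
            50, 51, 52, 53, 55, 56, 57, 58, 59, 61, 62, 63, 65, 66, 68, 69,
            71, 72, 74, 75, 77, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 98};

    int main(void)
    {
            /* CV = 32*q + r, so 50 * 2^(CV/32) == (1 << q) * table[r] */
            for (unsigned int cv = 0; cv < 256; cv++) {
                    unsigned int q = cv >> 5, r = cv & 31;
                    unsigned int fixed = (1u << q) * table[r];
                    double exact = 50.0 * pow(2.0, cv / 32.0);
                    printf("CV=%3u fixed=%5u exact=%10.1f\n", cv, fixed, exact);
            }
            return 0;
    }

For CV = 70, for instance, q = 2 and r = 6, giving (1 << 2) * 57 = 228, which matches round(50 * 2^(70/32)) = 228.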

Re: [PATCH v2] drm/amdkfd: Try to schedule bottom half on same core

2022-08-12 Thread Philip Yang



On 2022-08-11 15:04, Felix Kuehling wrote:

On systems that support SMT (hyperthreading) schedule the bottom half of
the KFD interrupt handler on the same core. This makes it possible to
reserve a core for interrupt handling and have the bottom half run on
that same core.

On systems without SMT, pick another core in the same NUMA node, as
before.

Use for_each_cpu_wrap instead of open-coding it.

Signed-off-by: Felix Kuehling 


A nit-pick below; it looks better to use new_cpu as the iterator. Either way, this is

Reviewed-by: Philip Yang 


---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 20 ++++++++++++++++----
  1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f5853835f03a..4d1284714e7a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -24,6 +24,7 @@
 #include <linux/bsearch.h>
 #include <linux/pci.h>
 #include <linux/slab.h>
+#include <linux/topology.h>
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_pm4_headers_vi.h"
@@ -801,13 +802,24 @@ static inline void kfd_queue_work(struct workqueue_struct *wq,
 				  struct work_struct *work)
 {
 	int cpu, new_cpu;
+	const struct cpumask *mask = NULL;
 
 	cpu = new_cpu = smp_processor_id();
 
-	do {
-		new_cpu = cpumask_next(new_cpu, cpu_online_mask) % nr_cpu_ids;
-		if (cpu_to_node(new_cpu) == numa_node_id())
+
+#if defined(CONFIG_SCHED_SMT)
+	/* CPU threads in the same core */
+	mask = cpu_smt_mask(cpu);
+#endif
+	if (!mask || cpumask_weight(mask) <= 1)
+		/* CPU threads in the same NUMA node */
+		mask = cpu_cpu_mask(cpu);
+	/* Pick the next online CPU thread in the same core or NUMA node */
+	for_each_cpu_wrap(cpu, mask, cpu+1) {
+		if (cpu != new_cpu && cpu_online(cpu)) {
+			new_cpu = cpu;
 			break;
-	} while (cpu != new_cpu);
+		}
+	}
 
 	queue_work_on(new_cpu, wq, work);
 }


for_each_cpu_wrap(new_cpu, mask, cpu + 1) {
if (cpu != new_cpu && cpu_online(new_cpu)) {
cpu = new_cpu;
break;
}
}
queue_work_on(cpu, wq, work);
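
For readers unfamiliar with the helper: for_each_cpu_wrap(cpu, mask, start) visits every CPU set in mask exactly once, beginning at start and wrapping around. A minimal sketch of the selection logic both variants implement (illustrative only; the function name is invented):

    #include <linux/cpumask.h>

    /* Return the first online CPU in @mask after @cur, wrapping around,
     * or @cur itself if no other online CPU is found.
     */
    static int pick_next_online_cpu(const struct cpumask *mask, int cur)
    {
            int cpu;

            for_each_cpu_wrap(cpu, mask, cur + 1) {
                    if (cpu != cur && cpu_online(cpu))
                            return cpu;
            }
            return cur;
    }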



[PATCH v6 6/6] drm/ttm: Switch to using the new res callback

2022-08-12 Thread Arunpravin Paneer Selvam
Apply the new intersect and compatible callbacks instead of having
generic placement range verifications.

v2: Added a separate callback for compatibility
checks (Christian)
v3: Cleanups and removal of workarounds

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 45 +++--
 drivers/gpu/drm/ttm/ttm_resource.c  | 17 ++
 2 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 170935c294f5..7d25a10395c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1328,11 +1328,12 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
 static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
const struct ttm_place *place)
 {
-   unsigned long num_pages = bo->resource->num_pages;
struct dma_resv_iter resv_cursor;
-   struct amdgpu_res_cursor cursor;
struct dma_fence *f;
 
+   if (!amdgpu_bo_is_amdgpu_bo(bo))
+   return ttm_bo_eviction_valuable(bo, place);
+
/* Swapout? */
if (bo->resource->mem_type == TTM_PL_SYSTEM)
return true;
@@ -1351,40 +1352,20 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
return false;
}
 
-   switch (bo->resource->mem_type) {
-   case AMDGPU_PL_PREEMPT:
-   /* Preemptible BOs don't own system resources managed by the
-* driver (pages, VRAM, GART space). They point to resources
-* owned by someone else (e.g. pageable memory in user mode
-* or a DMABuf). They are used in a preemptible context so we
-* can guarantee no deadlocks and good QoS in case of MMU
-* notifiers or DMABuf move notifiers from the resource owner.
-*/
+   /* Preemptible BOs don't own system resources managed by the
+* driver (pages, VRAM, GART space). They point to resources
+* owned by someone else (e.g. pageable memory in user mode
+* or a DMABuf). They are used in a preemptible context so we
+* can guarantee no deadlocks and good QoS in case of MMU
+* notifiers or DMABuf move notifiers from the resource owner.
+*/
+   if (bo->resource->mem_type == AMDGPU_PL_PREEMPT)
return false;
-   case TTM_PL_TT:
-   if (amdgpu_bo_is_amdgpu_bo(bo) &&
-   amdgpu_bo_encrypted(ttm_to_amdgpu_bo(bo)))
-   return false;
-   return true;
 
-   case TTM_PL_VRAM:
-   /* Check each drm MM node individually */
-		amdgpu_res_first(bo->resource, 0, (u64)num_pages << PAGE_SHIFT,
-				 &cursor);
-   while (cursor.remaining) {
-   if (place->fpfn < PFN_DOWN(cursor.start + cursor.size)
-   && !(place->lpfn &&
-place->lpfn <= PFN_DOWN(cursor.start)))
-   return true;
-
-			amdgpu_res_next(&cursor, cursor.size);
-   }
+   if (bo->resource->mem_type == TTM_PL_TT &&
+   amdgpu_bo_encrypted(ttm_to_amdgpu_bo(bo)))
return false;
 
-   default:
-   break;
-   }
-
return ttm_bo_eviction_valuable(bo, place);
 }
 
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 0d1f862a582b..a729c32a1e48 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -276,17 +276,9 @@ bool ttm_resource_intersects(struct ttm_device *bdev,
if (!res)
return false;
 
-   if (!place)
-   return true;
-
man = ttm_manager_type(bdev, res->mem_type);
-   if (!man->func->intersects) {
-   if (place->fpfn >= (res->start + res->num_pages) ||
-   (place->lpfn && place->lpfn <= res->start))
-   return false;
-
+   if (!place || !man->func->intersects)
return true;
-   }
 
return man->func->intersects(man, res, place, size);
 }
@@ -314,13 +306,8 @@ bool ttm_resource_compatible(struct ttm_device *bdev,
return false;
 
man = ttm_manager_type(bdev, res->mem_type);
-   if (!man->func->compatible) {
-   if (res->start < place->fpfn ||
-	    (place->lpfn && (res->start + res->num_pages) > place->lpfn))
-   return false;
-
+   if (!man->func->compatible)
return true;
-   }
 
return man->func->compatible(man, res, place, size);
 }
-- 
2.25.1
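
With this last patch the eviction check is fully routed through the new callbacks. The resulting call chain for the amdgpu VRAM case, sketched from the patches in this series (illustrative):

    amdgpu_ttm_bo_eviction_valuable(bo, place)
      -> ttm_bo_eviction_valuable(bo, place)            /* generic checks */
           -> ttm_resource_intersects(bdev, res, place, size)
                -> man->func->intersects(...)           /* e.g. amdgpu_vram_mgr_intersects() */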



[PATCH v6 4/6] drm/i915: Implement intersect/compatible functions

2022-08-12 Thread Arunpravin Paneer Selvam
Implemented new intersect and compatible callback functions
fetching the start offset from the drm buddy allocator.

v3: move the bits that are specific to buddy_man (Matthew)
v4: consider the block size/range (Matthew)

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 41 +--
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 73 +++
 2 files changed, 74 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5a5cf332d8a5..bc9c432edffe 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -361,7 +361,6 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
   const struct ttm_place *place)
 {
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-   struct ttm_resource *res = bo->resource;
 
if (!obj)
return false;
@@ -378,45 +377,7 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
if (!i915_gem_object_evictable(obj))
return false;
 
-   switch (res->mem_type) {
-   case I915_PL_LMEM0: {
-   struct ttm_resource_manager *man =
-   ttm_manager_type(bo->bdev, res->mem_type);
-   struct i915_ttm_buddy_resource *bman_res =
-   to_ttm_buddy_resource(res);
-   struct drm_buddy *mm = bman_res->mm;
-   struct drm_buddy_block *block;
-
-   if (!place->fpfn && !place->lpfn)
-   return true;
-
-   GEM_BUG_ON(!place->lpfn);
-
-   /*
-* If we just want something mappable then we can quickly check
-* if the current victim resource is using any of the CPU
-* visible portion.
-*/
-   if (!place->fpfn &&
-   place->lpfn == i915_ttm_buddy_man_visible_size(man))
-   return bman_res->used_visible_size > 0;
-
-   /* Real range allocation */
-		list_for_each_entry(block, &bman_res->blocks, link) {
-   unsigned long fpfn =
-   drm_buddy_block_offset(block) >> PAGE_SHIFT;
-   unsigned long lpfn = fpfn +
-   (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
-
-   if (place->fpfn < lpfn && place->lpfn > fpfn)
-   return true;
-   }
-   return false;
-   } default:
-   break;
-   }
-
-   return true;
+   return ttm_bo_eviction_valuable(bo, place);
 }
 
 static void i915_ttm_evict_flags(struct ttm_buffer_object *bo,
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
index 427de1aaab36..e19452f0e100 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
@@ -173,6 +173,77 @@ static void i915_ttm_buddy_man_free(struct ttm_resource_manager *man,
kfree(bman_res);
 }
 
+static bool i915_ttm_buddy_man_intersects(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+   struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
+   struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
+	struct drm_buddy *mm = &bman->mm;
+   struct drm_buddy_block *block;
+
+   if (!place->fpfn && !place->lpfn)
+   return true;
+
+   GEM_BUG_ON(!place->lpfn);
+
+   /*
+* If we just want something mappable then we can quickly check
+* if the current victim resource is using any of the CPU
+* visible portion.
+*/
+   if (!place->fpfn &&
+   place->lpfn == i915_ttm_buddy_man_visible_size(man))
+   return bman_res->used_visible_size > 0;
+
+   /* Check each drm buddy block individually */
+	list_for_each_entry(block, &bman_res->blocks, link) {
+   unsigned long fpfn =
+   drm_buddy_block_offset(block) >> PAGE_SHIFT;
+   unsigned long lpfn = fpfn +
+   (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+
+   if (place->fpfn < lpfn && place->lpfn > fpfn)
+   return true;
+   }
+
+   return false;
+}
+
+static bool i915_ttm_buddy_man_compatible(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+   struct i915_ttm_buddy_resource 

[PATCH v6 5/6] drm/nouveau: Implement intersect/compatible functions

2022-08-12 Thread Arunpravin Paneer Selvam
Implemented new intersect and compatible callback functions
fetching the start offset from struct ttm_resource.

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/nouveau/nouveau_mem.c | 29 +++
 drivers/gpu/drm/nouveau/nouveau_mem.h |  6 ++
 drivers/gpu/drm/nouveau/nouveau_ttm.c | 24 ++
 3 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.c b/drivers/gpu/drm/nouveau/nouveau_mem.c
index 2e517cdc24c9..76f8edefa637 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.c
@@ -187,3 +187,32 @@ nouveau_mem_new(struct nouveau_cli *cli, u8 kind, u8 comp,
	*res = &mem->base;
return 0;
 }
+
+bool
+nouveau_mem_intersects(struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   u32 num_pages = PFN_UP(size);
+
+   /* Don't evict BOs outside of the requested placement range */
+   if (place->fpfn >= (res->start + num_pages) ||
+   (place->lpfn && place->lpfn <= res->start))
+   return false;
+
+   return true;
+}
+
+bool
+nouveau_mem_compatible(struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   u32 num_pages = PFN_UP(size);
+
+   if (res->start < place->fpfn ||
+   (place->lpfn && (res->start + num_pages) > place->lpfn))
+   return false;
+
+   return true;
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.h b/drivers/gpu/drm/nouveau/nouveau_mem.h
index 325551eba5cd..1ee6cdb9ad9b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.h
@@ -25,6 +25,12 @@ int nouveau_mem_new(struct nouveau_cli *, u8 kind, u8 comp,
struct ttm_resource **);
 void nouveau_mem_del(struct ttm_resource_manager *man,
 struct ttm_resource *);
+bool nouveau_mem_intersects(struct ttm_resource *res,
+   const struct ttm_place *place,
+   size_t size);
+bool nouveau_mem_compatible(struct ttm_resource *res,
+   const struct ttm_place *place,
+   size_t size);
 int nouveau_mem_vram(struct ttm_resource *, bool contig, u8 page);
 int nouveau_mem_host(struct ttm_resource *, struct ttm_tt *);
 void nouveau_mem_fini(struct nouveau_mem *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 85f1f5a0fe5d..9602c30928f2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -42,6 +42,24 @@ nouveau_manager_del(struct ttm_resource_manager *man,
nouveau_mem_del(man, reg);
 }
 
+static bool
+nouveau_manager_intersects(struct ttm_resource_manager *man,
+  struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   return nouveau_mem_intersects(res, place, size);
+}
+
+static bool
+nouveau_manager_compatible(struct ttm_resource_manager *man,
+  struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   return nouveau_mem_compatible(res, place, size);
+}
+
 static int
 nouveau_vram_manager_new(struct ttm_resource_manager *man,
 struct ttm_buffer_object *bo,
@@ -73,6 +91,8 @@ nouveau_vram_manager_new(struct ttm_resource_manager *man,
 const struct ttm_resource_manager_func nouveau_vram_manager = {
.alloc = nouveau_vram_manager_new,
.free = nouveau_manager_del,
+   .intersects = nouveau_manager_intersects,
+   .compatible = nouveau_manager_compatible,
 };
 
 static int
@@ -97,6 +117,8 @@ nouveau_gart_manager_new(struct ttm_resource_manager *man,
 const struct ttm_resource_manager_func nouveau_gart_manager = {
.alloc = nouveau_gart_manager_new,
.free = nouveau_manager_del,
+   .intersects = nouveau_manager_intersects,
+   .compatible = nouveau_manager_compatible,
 };
 
 static int
@@ -130,6 +152,8 @@ nv04_gart_manager_new(struct ttm_resource_manager *man,
 const struct ttm_resource_manager_func nv04_gart_manager = {
.alloc = nv04_gart_manager_new,
.free = nouveau_manager_del,
+   .intersects = nouveau_manager_intersects,
+   .compatible = nouveau_manager_compatible,
 };
 
 static int
-- 
2.25.1
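
One small reading note: PFN_UP() converts the byte size into a page count, rounding up, so the two helpers work purely in page-frame numbers. Illustratively, with 4 KiB pages:

    u32 num_pages = PFN_UP(size);   /* PFN_UP(1) == 1, PFN_UP(4096) == 1, PFN_UP(4097) == 2 */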



[PATCH v6 3/6] drm/amdgpu: Implement intersect/compatible functions

2022-08-12 Thread Arunpravin Paneer Selvam
Implemented new intersect and compatible callback functions
fetching the start offset from the backend drm buddy allocator.

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 38 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 68 
 2 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 8c6b2284cf56..1f3302aebeff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -204,6 +204,42 @@ void amdgpu_gtt_mgr_recover(struct amdgpu_gtt_mgr *mgr)
amdgpu_gart_invalidate_tlb(adev);
 }
 
+/**
+ * amdgpu_gtt_mgr_intersects - test for intersection
+ *
+ * @man: Our manager object
+ * @res: The resource to test
+ * @place: The place for the new allocation
+ * @size: The size of the new allocation
+ *
+ * Simplified intersection test, only interesting if we need GART or not.
+ */
+static bool amdgpu_gtt_mgr_intersects(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+   return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+}
+
+/**
+ * amdgpu_gtt_mgr_compatible - test for compatibility
+ *
+ * @man: Our manager object
+ * @res: The resource to test
+ * @place: The place for the new allocation
+ * @size: The size of the new allocation
+ *
+ * Simplified compatibility test.
+ */
+static bool amdgpu_gtt_mgr_compatible(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+   return !place->lpfn || amdgpu_gtt_mgr_has_gart_addr(res);
+}
+
 /**
  * amdgpu_gtt_mgr_debug - dump VRAM table
  *
@@ -225,6 +261,8 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man,
 static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
.alloc = amdgpu_gtt_mgr_new,
.free = amdgpu_gtt_mgr_del,
+   .intersects = amdgpu_gtt_mgr_intersects,
+   .compatible = amdgpu_gtt_mgr_compatible,
.debug = amdgpu_gtt_mgr_debug
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 28ec5f8ac1c1..d1a2619fa89f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -720,6 +720,72 @@ uint64_t amdgpu_vram_mgr_vis_usage(struct amdgpu_vram_mgr *mgr)
return atomic64_read(>vis_usage);
 }
 
+/**
+ * amdgpu_vram_mgr_intersects - test each drm buddy block for intersection
+ *
+ * @man: TTM memory type manager
+ * @res: The resource to test
+ * @place: The place to test against
+ * @size: Size of the new allocation
+ *
+ * Test each drm buddy block for intersection for eviction decision.
+ */
+static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,
+  struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
+   struct drm_buddy_block *block;
+
+   /* Check each drm buddy block individually */
+	list_for_each_entry(block, &mgr->blocks, link) {
+   unsigned long fpfn =
+   amdgpu_vram_mgr_block_start(block) >> PAGE_SHIFT;
+   unsigned long lpfn = fpfn +
+   (amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
+
+   if (place->fpfn < lpfn &&
+   (place->lpfn && place->lpfn > fpfn))
+   return true;
+   }
+
+   return false;
+}
+
+/**
+ * amdgpu_vram_mgr_compatible - test each drm buddy block for compatibility
+ *
+ * @man: TTM memory type manager
+ * @res: The resource to test
+ * @place: The place to test against
+ * @size: Size of the new allocation
+ *
+ * Test each drm buddy block for placement compatibility.
+ */
+static bool amdgpu_vram_mgr_compatible(struct ttm_resource_manager *man,
+  struct ttm_resource *res,
+  const struct ttm_place *place,
+  size_t size)
+{
+   struct amdgpu_vram_mgr_resource *mgr = to_amdgpu_vram_mgr_resource(res);
+   struct drm_buddy_block *block;
+
+   /* Check each drm buddy block individually */
+	list_for_each_entry(block, &mgr->blocks, link) {
+   unsigned long fpfn =
+   amdgpu_vram_mgr_block_start(block) >> PAGE_SHIFT;
+   unsigned long lpfn = fpfn +
+   (amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
+
+   if 

[PATCH v6 2/6] drm/ttm: Implement intersect/compatible functions

2022-08-12 Thread Arunpravin Paneer Selvam
Implemented new intersect and compatible callback functions for
the ttm range manager, fetching the start offset from the drm mm
range allocator.

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/ttm/ttm_range_manager.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c
index d91666721dc6..4cfef2b3514d 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -113,6 +113,37 @@ static void ttm_range_man_free(struct ttm_resource_manager *man,
kfree(node);
 }
 
+static bool ttm_range_man_intersects(struct ttm_resource_manager *man,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+	struct drm_mm_node *node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
+   u32 num_pages = PFN_UP(size);
+
+   /* Don't evict BOs outside of the requested placement range */
+   if (place->fpfn >= (node->start + num_pages) ||
+   (place->lpfn && place->lpfn <= node->start))
+   return false;
+
+   return true;
+}
+
+static bool ttm_range_man_compatible(struct ttm_resource_manager *man,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+	struct drm_mm_node *node = &to_ttm_range_mgr_node(res)->mm_nodes[0];
+   u32 num_pages = PFN_UP(size);
+
+   if (node->start < place->fpfn ||
+   (place->lpfn && (node->start + num_pages) > place->lpfn))
+   return false;
+
+   return true;
+}
+
 static void ttm_range_man_debug(struct ttm_resource_manager *man,
struct drm_printer *printer)
 {
@@ -126,6 +157,8 @@ static void ttm_range_man_debug(struct ttm_resource_manager *man,
 static const struct ttm_resource_manager_func ttm_range_manager_func = {
.alloc = ttm_range_man_alloc,
.free = ttm_range_man_free,
+   .intersects = ttm_range_man_intersects,
+   .compatible = ttm_range_man_compatible,
.debug = ttm_range_man_debug
 };
 
-- 
2.25.1
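
For reference, the interval convention both helpers encode, reduced to a stand-alone sketch (illustrative; in TTM a place with lpfn == 0 means "no upper bound"):

    /* A resource occupies PFNs [start, start + num_pages); a place allows
     * [fpfn, lpfn). Mirrors the checks in ttm_range_man_intersects() above.
     */
    static bool pfn_ranges_overlap(unsigned long start, unsigned long num_pages,
                                   unsigned long fpfn, unsigned long lpfn)
    {
            if (fpfn >= start + num_pages)  /* place begins past the resource */
                    return false;
            if (lpfn && lpfn <= start)      /* place ends before the resource */
                    return false;
            return true;
    }

For example, a node at start = 100 with num_pages = 10 covers [100, 110) and overlaps a place with fpfn = 105 and lpfn = 0, but not one with fpfn = 110.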



[PATCH v6 1/6] drm/ttm: Add new callbacks to ttm res mgr

2022-08-12 Thread Arunpravin Paneer Selvam
We are adding two new callbacks to the ttm resource manager
functions to handle intersection and compatibility checks
between a placement and a resource.

v2: move the amdgpu and ttm_range_manager changes to
separate patches (Christian)
v3: rename "intersect" to "intersects" (Matthew)
v4: move the !place check to the !res if and return false
in ttm_resource_compatible() (Christian)
v5: move bits of code from patch number 6 to avoid
temporarily breaking drivers (Christian)

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/ttm/ttm_bo.c   |  9 ++--
 drivers/gpu/drm/ttm/ttm_resource.c | 77 +-
 include/drm/ttm/ttm_resource.h | 40 
 3 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c1bd006a5525..f066e8124c50 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -518,6 +518,9 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
  const struct ttm_place *place)
 {
+   struct ttm_resource *res = bo->resource;
+   struct ttm_device *bdev = bo->bdev;
+
dma_resv_assert_held(bo->base.resv);
if (bo->resource->mem_type == TTM_PL_SYSTEM)
return true;
@@ -525,11 +528,7 @@ bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
/* Don't evict this BO if it's outside of the
 * requested placement range
 */
-   if (place->fpfn >= (bo->resource->start + bo->resource->num_pages) ||
-   (place->lpfn && place->lpfn <= bo->resource->start))
-   return false;
-
-   return true;
+   return ttm_resource_intersects(bdev, res, place, bo->base.size);
 }
 EXPORT_SYMBOL(ttm_bo_eviction_valuable);
 
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 20f9adcc3235..0d1f862a582b 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -253,10 +253,84 @@ void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res)
 }
 EXPORT_SYMBOL(ttm_resource_free);
 
+/**
+ * ttm_resource_intersects - test for intersection
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res intersects with @place and @size. Used for testing if evictions
+ * are valuable or not.
+ *
+ * Returns true if the res placement intersects with @place and @size.
+ */
+bool ttm_resource_intersects(struct ttm_device *bdev,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+   struct ttm_resource_manager *man;
+
+   if (!res)
+   return false;
+
+   if (!place)
+   return true;
+
+   man = ttm_manager_type(bdev, res->mem_type);
+   if (!man->func->intersects) {
+   if (place->fpfn >= (res->start + res->num_pages) ||
+   (place->lpfn && place->lpfn <= res->start))
+   return false;
+
+   return true;
+   }
+
+   return man->func->intersects(man, res, place, size);
+}
+
+/**
+ * ttm_resource_compatible - test for compatibility
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res is compatible with @place and @size.
+ *
+ * Returns true if the res placement is compatible with @place and @size.
+ */
+bool ttm_resource_compatible(struct ttm_device *bdev,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+   struct ttm_resource_manager *man;
+
+   if (!res || !place)
+   return false;
+
+   man = ttm_manager_type(bdev, res->mem_type);
+   if (!man->func->compatible) {
+   if (res->start < place->fpfn ||
+		    (place->lpfn && (res->start + res->num_pages) > place->lpfn))
+   return false;
+
+   return true;
+   }
+
+   return man->func->compatible(man, res, place, size);
+}
+
 static bool ttm_resource_places_compat(struct ttm_resource *res,
   const struct ttm_place *places,
   unsigned num_placement)
 {
+   struct ttm_buffer_object *bo = res->bo;
+   struct ttm_device *bdev = bo->bdev;
unsigned i;
 
if (res->placement & TTM_PL_FLAG_TEMPORARY)
@@ -265,8 +339,7 @@ static bool ttm_resource_places_compat(struct ttm_resource *res,
for (i = 0; i < num_placement; i++) {
const struct ttm_place *heap = [i];
 
-   if (res->start < heap->fpfn 

Re: [PATCH] drm/amdgpu: modify mcbp implement for gfx9(v3)

2022-08-12 Thread Christian König




Am 11.08.22 um 05:19 schrieb jiadong@amd.com:

From: "Jiadong.Zhu" 

1. Use unmap_queue package to trigger preemption on gfx9
Add trailing fence to track the preemption done.
2. Modify emit_ce_meta emit_de_meta functions
for the resumed ibs.

Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 161 ---
  drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
  3 files changed, 143 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 82c178a9033a..ca626f0ad7b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
  #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
  #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
  #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)



+#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)


Ok, that here needs much more explanation why you need it and how all 
this is supposed to work?


Regards,
Christian.

  
  #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5332899642dc..887021fd56aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 				struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
 static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
 static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
 					   void *ras_error_status);
@@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring *kiq_ring,
 			   PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
 
 	if (action == PREEMPT_QUEUES_NO_UNMAP) {
-		amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, seq);
+		amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & ring->buf_mask));
+		amdgpu_ring_write(kiq_ring, 0);
+		amdgpu_ring_write(kiq_ring, 0);
+
 	} else {
 		amdgpu_ring_write(kiq_ring, 0);
 		amdgpu_ring_write(kiq_ring, 0);
@@ -5446,11 +5447,16 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 
 	control |= ib->length_dw | (vmid << 24);
 
-	if (amdgpu_sriov_vf(ring->adev) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
+	if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
 		control |= INDIRECT_BUFFER_PRE_ENB(1);
 
+		if (flags & AMDGPU_IB_PREEMPTED)
+			control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
 		if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-			gfx_v9_0_ring_emit_de_meta(ring);
+			gfx_v9_0_ring_emit_de_meta(ring,
+				(!amdgpu_sriov_vf(ring->adev) && flags & AMDGPU_IB_PREEMPTED) ?
+				true : false);
 	}
 
 	amdgpu_ring_write(ring, header);
@@ -5505,6 +5511,7 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
 	bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
 	bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
 	bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
+	bool exec = flags & AMDGPU_FENCE_FLAG_EXEC;
 
 	/* RELEASE_MEM - flush caches, send int */
 	amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
@@ -5515,6 +5522,7 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
 			       EOP_TC_WB_ACTION_EN |
 			       EOP_TC_MD_ACTION_EN)) |
 			      EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
+			      (exec ? EOP_EXEC : 0x0) |
 			      EVENT_INDEX(5)));
 	amdgpu_ring_write(ring, DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0));
 
@@ -5620,33 +5628,135 @@ static void gfx_v9_ring_emit_sb(struct amdgpu_ring *ring)
 	amdgpu_ring_write(ring, 0);
 }
 
-static void gfx_v9_0_ring_emit_ce_meta(struct amdgpu_ring *ring)
+static void gfx_v9_0_ring_emit_ce_meta(struct amdgpu_ring *ring, bool resume)
 {
+	struct amdgpu_device *adev = ring->adev;
 	struct v9_ce_ib_state ce_payload = {0};
-

Re: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

2022-08-12 Thread Christian König

Hi Jiadong,

yeah, the bug fixes indeed sound like something we would want to have. 
Just drop the part 3 for now.


Regards,
Christian.

Am 11.08.22 um 05:18 schrieb Zhu, Jiadong:

[AMD Official Use Only - General]

Hi Christian,

Thank you for the reply, I will update the patch to fix style issue.

The patch has several changes:
1. Fix the unmap package for MCBP, which was not correct in gfx_v9_0_kiq_unmap_queues.
2. Change the emitted CE/DE metadata used for preempted IBs.
3. Add the function gfx_v9_0_ring_preempt_ib used for the debugfs case.

Though part 3 may be removed in the future, the functions from parts 1 and 2
could still be used by some projects such as virtualization.

Thanks,
Jiadong


-Original Message-
From: Christian König 
Sent: Thursday, August 11, 2022 12:06 AM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

[CAUTION: External Email]

Hi, Jiadong,

first of all your patches have major style issues. Please use the checkpatch.pl 
script before sending those out.

Apart from that as discussed on our call on Monday MCBP is not something we 
will implement on Linux. So we will probably remove the existing debugfs test 
sooner or later.

Regards,
Christian.

Am 09.08.22 um 11:21 schrieb Zhu, Jiadong:

[AMD Official Use Only - General]

Hi,

This patch is to correct the MCBP package for gfx9, which is the basic function used for debugfs.
There is no logic about when to trigger MCBP.
Shall we get this reviewed?

Thanks,
Jiadong

-Original Message-
From: Zhu, Jiadong 
Sent: Tuesday, August 9, 2022 5:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Aaron ; Huang, Ray ;
Zhu, Jiadong 
Subject: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

From: "Jiadong.Zhu" 

1. Use unmap_queue package to trigger preemption on gfx9
 Add trailing fence to track the preemption done.
2. Modify emit_ce_meta emit_de_meta functions
 for the resumed ibs.
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 159 ---
   drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
   3 files changed, 141 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 82c178a9033a..ca626f0ad7b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
 #define AMDGPU_FENCE_FLAG_64BIT		(1 << 0)
 #define AMDGPU_FENCE_FLAG_INT		(1 << 1)
 #define AMDGPU_FENCE_FLAG_TC_WB_ONLY	(1 << 2)
+#define AMDGPU_FENCE_FLAG_EXEC		(1 << 3)
 
 #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5332899642dc..0b7cb4cf13c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device *adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 				struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
 static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
 static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
 					   void *ras_error_status);
@@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring *kiq_ring,
 			   PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
 
 	if (action == PREEMPT_QUEUES_NO_UNMAP) {
-		amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-		amdgpu_ring_write(kiq_ring, seq);
+		amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & ring->buf_mask));
+		amdgpu_ring_write(kiq_ring, 0);
+		amdgpu_ring_write(kiq_ring, 0);
+
 	} else {
 		amdgpu_ring_write(kiq_ring, 0);
 		amdgpu_ring_write(kiq_ring, 0);
@@ -5446,11 +5447,15 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
 
 	control |= ib->length_dw | (vmid << 24);
 
-	if (amdgpu_sriov_vf(ring->adev) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
+	if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
 		control |= INDIRECT_BUFFER_PRE_ENB(1);
 
+		if (flags & AMDGPU_IB_PREEMPTED)
+			control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
 		if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-

Re: [Linaro-mm-sig] [PATCH v2 3/5] dma-buf: Move all dma-bufs to dynamic locking specification

2022-08-12 Thread Christian König




Am 10.08.22 um 20:53 schrieb Dmitry Osipenko:

On 8/10/22 21:25, Christian König wrote:

Am 10.08.22 um 19:49 schrieb Dmitry Osipenko:

On 8/10/22 14:30, Christian König wrote:

Am 25.07.22 um 17:18 schrieb Dmitry Osipenko:

This patch moves the non-dynamic dma-buf users over to the dynamic
locking specification. The strict locking convention prevents deadlock
situation for dma-buf importers and exporters.

Previously the "unlocked" versions of the dma-buf API functions weren't
taking the reservation lock and this patch makes them to take the lock.

Intel and AMD GPU drivers already were mapping imported dma-bufs under
the held lock, hence the "locked" variant of the functions are added
for them and the drivers are updated to use the "locked" versions.

In general "Yes, please", but that won't be that easy.

You not only need to change amdgpu and i915, but all drivers
implementing the map_dma_buf(), unmap_dma_buf() callbacks.

Auditing all that code is a huge bunch of work.

Hm, neither of the drivers takes the resv lock in map_dma_buf/unmap_dma_buf.
It's easy to audit them all and I did it. So either I'm missing
something or it doesn't take much time to check them all. Am I really
missing something?

Ok, so this is only changing map/unmap now?

It's also vmap/vunmap and attach/detach: in the previous patch I added the
_unlocked postfix to the func names, and in this patch I made them all
actually take the lock.



Take your patch "[PATCH v2 2/5] drm/gem: Take reservation lock for 
vmap/vunmap operations" as a blueprint on how to approach it.


E.g. one callback at a time and then document the result in the end.

Regards,
Christian.




In this case please separate this from the documentation change.

I'll factor out the doc in the v3.


I would also drop the _locked postfix from the function name, just
having _unlocked on all functions which are supposed to be called with
the lock held should be sufficient.

Noted for the v3.


Thanks for looking into this,
Christian.

Thank you for the review.





Re: [PATCH] drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()

2022-08-12 Thread Christian König

@Alex was that one already picked up?

Am 25.07.22 um 18:40 schrieb Andrey Grodzovsky:

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2022-07-19 06:39, Andrey Strachuk wrote:

Local variable 'rq' is initialized with the address of a field of
the drm_gpu_scheduler, so it does not make sense to compare 'rq'
with NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Strachuk 
Fixes: 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS.")
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 ----
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 67f66f2f1809..600401f2a98f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -285,10 +285,6 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)

  /* Signal all jobs not yet scheduled */
  for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {

  struct drm_sched_rq *rq = &sched->sched_rq[i];
-
-    if (!rq)
-    continue;
-
  spin_lock(&rq->lock);
  list_for_each_entry(s_entity, &rq->entities, list) {
  while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {




Re: [PATCH] drm/radeon: add a force flush to delay work when radeon

2022-08-12 Thread Christian König

Am 11.08.22 um 09:25 schrieb Zhenneng Li:

Although the radeon driver fences and waits for the GPU to finish processing
the current batch of rings, there is still a corner case where the radeon
lockup work queue may not be fully flushed, while radeon_suspend_kms() has
already called pci_set_power_state() to put the device into the D3hot state.


If I'm not completely mistaken, the reset worker uses the suspend/resume 
functionality as well to get the hardware into a working state again.


In that case this here would lead to a deadlock: suspend would flush and 
wait on a worker that itself re-enters the suspend path. Please double 
check that.


Regards,
Christian.


Per PCI spec rev 4.0 on 5.3.1.4.1 D3hot State.

Configuration and Message requests are the only TLPs accepted by a Function in
the D3hot state. All other received Requests must be handled as Unsupported 
Requests,
and all received Completions may optionally be handled as Unexpected 
Completions.

This issue will happen in following logs:
Unable to handle kernel paging request at virtual address 8800e0008010
CPU 0 kworker/0:3(131): Oops 0
pc = []  ra = []  ps =  Tainted: G  
  W
pc is at si_gpu_check_soft_reset+0x3c/0x240
ra is at si_dma_is_lockup+0x34/0xd0
v0 =   t0 = fff08800e0008010  t1 = 0001
t2 = 8010  t3 = fff7e3c0  t4 = fff7e3c00258
t5 =   t6 = 0001  t7 = fff7ef078000
s0 = fff7e3c016e8  s1 = fff7e3c0  s2 = fff7e3c00018
s3 = fff7e3c0  s4 = fff7fff59d80  s5 = 
s6 = fff7ef07bd98
a0 = fff7e3c0  a1 = fff7e3c016e8  a2 = 0008
a3 = 0001  a4 = 8f5c28f5c28f5c29  a5 = 810f4338
t8 = 0275  t9 = 809b66f8  t10 = ff6769c5d964b800
t11= b886  pv = 811bea20  at = 
gp = 81d89690  sp = aa814126
Disabling lock debugging due to kernel taint
Trace:
[] si_dma_is_lockup+0x34/0xd0
[] radeon_fence_check_lockup+0xd0/0x290
[] process_one_work+0x280/0x550
[] worker_thread+0x70/0x7c0
[] worker_thread+0x130/0x7c0
[] kthread+0x200/0x210
[] worker_thread+0x0/0x7c0
[] kthread+0x14c/0x210
[] ret_from_kernel_thread+0x18/0x20
[] kthread+0x0/0x210
  Code: ad3e0008  43f0074a  ad7e0018  ad9e0020  8c3001e8  40230101
  <8821> 4821ed21
So force lockup work queue flush to fix this problem.

Signed-off-by: Zhenneng Li 
---
  drivers/gpu/drm/radeon/radeon_device.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 15692cb241fc..e608ca26780a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1604,6 +1604,9 @@ int radeon_suspend_kms(struct drm_device *dev, bool suspend,
if (r) {
/* delay GPU reset to resume */
radeon_fence_driver_force_completion(rdev, i);
+   } else {
+   /* finish executing delayed work */
+			flush_delayed_work(&rdev->fence_drv[i].lockup_work);
}
}
  




[PATCH 2/2] drm/amd/pm: Enable GFXOFF feature for SMU IP v13.0.4

2022-08-12 Thread Tim Huang
The driver needs to set the EnableGfxImu message parameter to tell the
PMFW to set the flag that enables the GFXOFF feature.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index e56ec06012dd..3651f6f75068 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -2345,8 +2345,8 @@ int smu_v13_0_set_gfx_power_up_by_imu(struct smu_context *smu)
 
index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
   SMU_MSG_EnableGfxImu);
-
-   return smu_cmn_send_msg_without_waiting(smu, index, 0);
+   /* Param 1 to tell PMFW to enable GFXOFF feature */
+   return smu_cmn_send_msg_without_waiting(smu, index, 1);
 }
 
 int smu_v13_0_od_edit_dpm_table(struct smu_context *smu,
-- 
2.25.1



[PATCH 1/2] drm/amdgpu: enable IH Clock Gating for OSS IP v6.0.1

2022-08-12 Thread Tim Huang
Enable AMD_CG_SUPPORT_IH_CG support.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 6c3440e7ed3f..1ff7fc7bb340 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -602,6 +602,7 @@ static int soc21_common_early_init(void *handle)
AMD_CG_SUPPORT_HDP_LS |
AMD_CG_SUPPORT_ATHUB_MGCG |
AMD_CG_SUPPORT_ATHUB_LS |
+   AMD_CG_SUPPORT_IH_CG |
AMD_CG_SUPPORT_VCN_MGCG |
AMD_CG_SUPPORT_JPEG_MGCG;
adev->pg_flags =
-- 
2.25.1



[PATCH 2/2] drm/amdgpu: fix hive reference leak when adding xgmi device

2022-08-12 Thread YiPeng Chai
The code only calls amdgpu_get_xgmi_hive but never amdgpu_put_xgmi_hive,
which leaks the hive reference.
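
For illustration, a minimal sketch of the balanced get/put pattern the fix
restores, written against the kernel's generic kref API with hypothetical
names (the real code uses amdgpu_get_xgmi_hive()/amdgpu_put_xgmi_hive()):

#include <linux/errno.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct hive {
	struct kref ref;
};

static void hive_release(struct kref *ref)
{
	kfree(container_of(ref, struct hive, ref));
}

static int use_hive(struct hive *h, int setup_ok)
{
	kref_get(&h->ref);			/* like amdgpu_get_xgmi_hive() */

	if (!setup_ok) {
		kref_put(&h->ref, hive_release);	/* error path needs a put too */
		return -ENOENT;
	}

	/* take what we need from the hive, then drop our reference */
	kref_put(&h->ref, hive_release);	/* like amdgpu_put_xgmi_hive() */
	return 0;
}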

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2445255bbf01..4cdc50873621 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2456,12 +2456,14 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 			if (!hive->reset_domain ||
 			    !amdgpu_reset_get_reset_domain(hive->reset_domain)) {
 				r = -ENOENT;
+				amdgpu_put_xgmi_hive(hive);
 				goto init_failed;
 			}
 
 			/* Drop the early temporary reset domain we created for device */
 			amdgpu_reset_put_reset_domain(adev->reset_domain);
 			adev->reset_domain = hive->reset_domain;
+			amdgpu_put_xgmi_hive(hive);
 		}
 	}
 
-- 
2.25.1



[PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-12 Thread YiPeng Chai
The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate XGMI, but the PSP ring has already been
destroyed in psp_hw_fini.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
 
 	amdgpu_amdkfd_suspend(adev, false);
 
+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);
 
@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);
 
-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);
 
 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
-- 
2.25.1



RE: [PATCH v2] drm/amdkfd: reserve 2 queues for sdma 6.0.1

2022-08-12 Thread Huang, Tim
[AMD Official Use Only - General]

Reviewed-by: Tim Huang 

Best Regards,
Tim Huang

-Original Message-
From: amd-gfx  On Behalf Of Yifan Zhang
Sent: Thursday, August 11, 2022 8:46 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Joshi, Mukul ; Kuehling, Felix ; Zhang, Yifan 
Subject: [PATCH v2] drm/amdkfd: reserve 2 queues for sdma 6.0.1

There is only one engine in sdma 6.0.1, so the total number of reserved
queues should be 2.
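
As background, a small sketch (hypothetical helper, not in the driver) of how
the reserved-queue bitmap in the patch below is laid out: bit = queue_index *
num_engines + engine_index, so two engines each reserving queues 0 and 1 give
0xF, while a single engine gives 0x3.

/* bit = queue_index * num_engines + engine_index */
static unsigned long reserved_sdma_bitmap(unsigned int num_engines,
					  unsigned int queues_per_engine)
{
	unsigned long bitmap = 0;
	unsigned int q, e;

	for (q = 0; q < queues_per_engine; q++)
		for (e = 0; e < num_engines; e++)
			bitmap |= 1UL << (q * num_engines + e);

	return bitmap;
}

/* reserved_sdma_bitmap(2, 2) == 0xF  (sdma 6.0.0 / 6.0.2)
 * reserved_sdma_bitmap(1, 2) == 0x3  (sdma 6.0.1)
 */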

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f5853835f03a..357298e69495 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -102,13 +102,18 @@ static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
 
 	switch (sdma_version) {
 	case IP_VERSION(6, 0, 0):
-	case IP_VERSION(6, 0, 1):
 	case IP_VERSION(6, 0, 2):
 		/* Reserve 1 for paging and 1 for gfx */
 		kfd->device_info.num_reserved_sdma_queues_per_engine = 2;
 		/* BIT(0)=engine-0 queue-0; BIT(1)=engine-1 queue-0; BIT(2)=engine-0 queue-1; ... */
 		kfd->device_info.reserved_sdma_queues_bitmap = 0xFULL;
 		break;
+	case IP_VERSION(6, 0, 1):
+		/* Reserve 1 for paging and 1 for gfx */
+		kfd->device_info.num_reserved_sdma_queues_per_engine = 2;
+		/* BIT(0)=engine-0 queue-0; BIT(1)=engine-0 queue-1; ... */
+		kfd->device_info.reserved_sdma_queues_bitmap = 0x3ULL;
+		break;
 	default:
 		break;
 	}
--
2.37.1



[PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()

2022-08-12 Thread Dan Carpenter
This code has two bugs.  If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.

The second bug is that we cannot dereference a list iterator when it
points to the list head.  In other words, if the list_for_each_entry()
loop exits without hitting a break then "cpu_link" is not a valid
pointer and should not be dereferenced.

Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid.  That makes it easier to test for
valid vs invalid.
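
A minimal sketch of the second pitfall with generic names: when a
list_for_each_entry() loop completes without a break, the iterator holds a
bogus pointer computed from the list head, so a found element must be
recorded in a separate variable:

#include <linux/list.h>

struct item {
	int key;
	struct list_head node;
};

static struct item *find_item(struct list_head *head, int key)
{
	struct item *tmp, *found = NULL;

	list_for_each_entry(tmp, head, node) {
		if (tmp->key == key) {
			found = tmp;	/* record the match explicitly */
			break;
		}
	}

	/* "tmp" is not a valid element here if the loop completed */
	return found;
}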

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Dan Carpenter 
---
I reported these in June but never heard back.

 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
 
 static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
 {
+	struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
 	struct kfd_iolink_properties *props = NULL, *props2 = NULL;
-	struct kfd_iolink_properties *gpu_link, *cpu_link;
 	struct kfd_topology_device *cpu_dev;
 	int ret = 0;
 	int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
 			continue;
 
 		/* find CPU <-->  CPU links */
+		cpu_link = NULL;
 		cpu_dev = kfd_topology_device_by_proximity_domain(i);
 		if (cpu_dev) {
-			list_for_each_entry(cpu_link,
+			list_for_each_entry(tmp_link,
 					    &cpu_dev->io_link_props, list) {
-				if (cpu_link->node_to == gpu_link->node_to)
+				if (tmp_link->node_to == gpu_link->node_to) {
+					cpu_link = tmp_link;
 					break;
+				}
 			}
 		}
 
-		if (cpu_link->node_to != gpu_link->node_to)
+		if (!cpu_link)
 			return -ENOMEM;
 
 		/* CPU <--> CPU <--> GPU, GPU node*/
-- 
2.35.1