Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank

Hey Alex,
Agreed, we are moving it above; Christian also had the same feedback.

- Shashank

On 2/4/2022 7:44 PM, Deucher, Alexander wrote:

[Public]


Seems like this functionality should be moved up into the callers.  
Maybe add new IP callbacks (dump_reset_registers()) so that each IP can 
specify which registers are relevant for reset debugging, and then we 
can walk the IP list and call the callback before we call the asic_reset 
callbacks.
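
Roughly, a sketch of what I have in mind (illustrative only; the
callback name and the walk are not a final API):

	/* Hypothetical addition to struct amd_ip_funcs: dump the
	 * registers relevant for reset debugging. */
	void (*dump_reset_registers)(void *handle);

	/* Walk the IP list before invoking the asic_reset callbacks. */
	static void amdgpu_dump_reset_registers(struct amdgpu_device *adev)
	{
		int i;

		for (i = 0; i < adev->num_ip_blocks; i++) {
			const struct amd_ip_funcs *funcs =
				adev->ip_blocks[i].version->funcs;

			if (adev->ip_blocks[i].status.valid &&
			    funcs->dump_reset_registers)
				funcs->dump_reset_registers(adev);
		}
	}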


Alex


*From:* amd-gfx  on behalf of 
Deucher, Alexander 

*Sent:* Friday, February 4, 2022 1:41 PM
*To:* Sharma, Shashank ; Lazar, Lijo 
; amd-gfx@lists.freedesktop.org 

*Cc:* Somalapuram, Amaranath ; Koenig, 
Christian 

*Subject:* Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

[Public]


[Public]


In the suspend and hibernate cases, we don't care.  In most cases the 
power rail will be cut once the system enters suspend, so it doesn't 
really matter.  That's why we call the asic reset callback directly 
rather than going through the whole recovery process. The device is 
already quiescent at this point; we just want to make sure the device is 
in a known state when we come out of suspend (in case suspend overall 
fails).
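
For reference, the distinction boils down to something like this
(simplified sketch, error handling omitted; amdgpu_asic_reset() is the
wrapper around the per-asic reset callback):

	/* Suspend path: reset the asic directly, no recovery machinery. */
	r = amdgpu_asic_reset(adev);

	/* Runtime error path: the full recovery flow instead. */
	r = amdgpu_device_gpu_recover(adev, NULL);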


Alex



*From:* Sharma, Shashank 
*Sent:* Friday, February 4, 2022 12:22 PM
*To:* Lazar, Lijo ; amd-gfx@lists.freedesktop.org 

*Cc:* Deucher, Alexander ; Somalapuram, 
Amaranath ; Koenig, Christian 


*Subject:* Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler


On 2/4/2022 6:20 PM, Lazar, Lijo wrote:

[AMD Official Use Only]

One more thing: in the suspend-reset case, won't this schedule a work item
on suspend? I don't know if that is a good idea; ideally we would like to
clean up all work items before going to suspend.
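
The usual expectation would be something like the following in the
suspend path, before the reset (sketch only; the work item name is
hypothetical):

	/* Make sure a pending dump work item doesn't outlive suspend. */
	cancel_work_sync(&adev->reset_dump_work);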

Thanks,
Lijo


Again, this opens scope for discussion. What if there is a GPU error
during suspend-reset? That is a very probable case.

- Shashank



-Original Message-
From: Sharma, Shashank 
Sent: Friday, February 4, 2022 10:47 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:11 PM, Lazar, Lijo wrote:

BTW, since this is already providing a set of values, it would be useful to 
provide one more field for the reset reason - RAS error recovery, GPU hang 
recovery, or something else.


Adding this additional parameter, instead of blocking something in the kernel, 
seems like a better idea. The app can filter out and read what it is interested in.
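
Something along these lines, perhaps (the names are illustrative, not a
final interface):

	/* Hypothetical reason code carried alongside the dumped values. */
	enum amdgpu_reset_reason {
		AMDGPU_RESET_REASON_RAS_RECOVERY,
		AMDGPU_RESET_REASON_GPU_HANG,
		AMDGPU_RESET_REASON_SUSPEND,
		AMDGPU_RESET_REASON_OTHER,
	};

An app interested only in hang recovery can then just compare the
reason field and ignore everything else.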

- Shashank


[PATCH 13/13] drm/amd/display: Basic support with device ID

2022-02-04 Thread Jasdeep Dhillon
From: Oliver Logush 

[Why]
To get the cyan_skillfish check working

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Oliver Logush 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 ++-
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static  void amdgpu_dm_audio_eld_notify(struct 
amdgpu_device *adev, int pin)
}
 }
 
+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+   if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 
0x143F) {
+   return true;
+   }
+   return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
-   if (!has_hw_support) {
+   if (is_skillfish_series(adev)) {
DRM_INFO("DMUB unsupported on ASIC\n");
return 0;
}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   init_data.flags.disable_dmcu = true;
+   break;
+   }
break;
}
 
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_VEGA20:
-   return 0;
case CHIP_NAVI12:
fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   return 0;
+   }
DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
return -EINVAL;
}
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
adev->mode_info.num_dig = 6;
break;
default:
+   if (is_skillfish_series(adev)) {
+   adev->mode_info.num_crtc = 2;
+   adev->mode_info.num_hpd = 2;
+   adev->mode_info.num_dig = 2;
+   break;
+   }
 #if defined(CONFIG_DRM_AMD_DC_DCN)
switch (adev->ip_versions[DCE_HWIP][0]) {
case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
enum dc_irq_source irq_src;
atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index b36bae4b5bc9..318d381e2910 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)
 
case FAMILY_NV:
dc_version = DCN_VERSION_2_0;
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
dc_version = DCN_VERSION_2_01;
break;
}
diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index e4a2dfacab4c..37ec6343dbd6 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -211,7 +211,8 @@ enum {
 #ifndef ASICREV_IS_GREEN_SARDINE
 #define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && 
(eChipRev < 0xFF))
 #endif
-#define DEVICE_ID_NV_13FE 0x13FE  // CYAN_SKILLFISH
+#define DEVICE_ID_NV_NAVI10_LITE_P_13FE  0x13FE  // CYAN_SKILLFISH
+#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F 
 #define FAMILY_VGH 144
 #define DEVICE_ID_VGH_163F 0x163F
 #define VANGOGH_A0 0x01
-- 

[PATCH 12/13] drm/amd/display: handle null link encoder

2022-02-04 Thread Jasdeep Dhillon
From: Martin Tsai 

[Why]
The link encoder mapping could return a null one and cause a system crash.

[How]
Let the mapping get an available link encoder without the endpoint
identification check.

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Martin Tsai 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 25 +++
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index ec4b300ec067..b1718600fa02 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3530,11 +3530,7 @@ enum dc_status dc_link_allocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
const struct link_hwss *link_hwss = get_link_hwss(link, 
&pipe_ctx->link_res);
DC_LOGGER_INIT(link->ctx->logger);
 
-   /* Link encoder may have been dynamically assigned to non-physical 
display endpoint. */
-   if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_encoder = link->link_enc;
-   else if (link->dc->res_pool->funcs->link_encs_assign)
-   link_encoder = 
link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+   link_encoder = link_enc_cfg_get_link_enc(link);
ASSERT(link_encoder);
 
/* enable_link_dp_mst already check link->enabled_stream_count
@@ -3823,11 +3819,7 @@ static enum dc_status deallocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
const struct dc_link_settings empty_link_settings = {0};
DC_LOGGER_INIT(link->ctx->logger);
 
-   /* Link encoder may have been dynamically assigned to non-physical 
display endpoint. */
-   if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_encoder = link->link_enc;
-   else if (link->dc->res_pool->funcs->link_encs_assign)
-   link_encoder = 
link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+   link_encoder = link_enc_cfg_get_link_enc(link);
ASSERT(link_encoder);
 
/* deallocate_mst_payload is called before disable link. When mode or
@@ -3944,13 +3936,7 @@ static void update_psp_stream_config(struct pipe_ctx 
*pipe_ctx, bool dpms_off)
if (cp_psp == NULL || cp_psp->funcs.update_stream_config == NULL)
return;
 
-   if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_enc = pipe_ctx->stream->link->link_enc;
-   else if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA 
&&
-   
pipe_ctx->stream->link->dc->res_pool->funcs->link_encs_assign)
-   link_enc = link_enc_cfg_get_link_enc_used_by_stream(
-   pipe_ctx->stream->ctx->dc,
-   pipe_ctx->stream);
+   link_enc = link_enc_cfg_get_link_enc(pipe_ctx->stream->link);
ASSERT(link_enc);
if (link_enc == NULL)
return;
@@ -4100,10 +4086,7 @@ void core_link_enable_stream(
dc_is_virtual_signal(pipe_ctx->stream->signal))
return;
 
-   if (dc->res_pool->funcs->link_encs_assign && stream->link->ep_type != 
DISPLAY_ENDPOINT_PHY)
-   link_enc = link_enc_cfg_get_link_enc_used_by_stream(dc, stream);
-   else
-   link_enc = stream->link->link_enc;
+   link_enc = link_enc_cfg_get_link_enc(link);
ASSERT(link_enc);
 
if (!dc_is_virtual_signal(pipe_ctx->stream->signal)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index fcf388b509db..b55868a0e0df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1605,16 +1605,7 @@ static void get_pixel_clock_parameters(
 
pixel_clk_params->requested_pix_clk_100hz = 
stream->timing.pix_clk_100hz;
 
-   /* Links supporting dynamically assigned link encoder will be assigned 
next
-* available encoder if one not already assigned.
-*/
-   if (link->is_dig_mapping_flexible &&
-   link->dc->res_pool->funcs->link_encs_assign) {
-   link_enc = 
link_enc_cfg_get_link_enc_used_by_stream(stream->ctx->dc, stream);
-   if (link_enc == NULL)
-   link_enc = 
link_enc_cfg_get_next_avail_link_enc(stream->ctx->dc);
-   } else
-   link_enc = stream->link->link_enc;
+   link_enc = link_enc_cfg_get_link_enc(link);
ASSERT(link_enc);
 
if (link_enc)
-- 
2.25.1



[PATCH 07/13] drm/amd/display: change fastboot timing validation

2022-02-04 Thread Jasdeep Dhillon
From: Paul Hsieh 

[Why]
VBIOS lights up eDP with 6bpc, but the driver uses 8bpc without
disabling and re-enabling the valid stream. Some panels can't
change color depth at runtime.

[How]
Change the fastboot timing validation function to check more than
just LANE_COUNT, LINK_RATE, etc.

Reviewed-by: Anthony Koo 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Paul Hsieh 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c| 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 1d9404ff29ed..997eb7e2d2b3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1467,7 +1467,7 @@ static bool context_changed(
return false;
 }
 
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
const struct dc_sink *sink,
struct dc_crtc_timing *crtc_timing)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..b36bae4b5bc9 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2168,7 +2168,7 @@ static void mark_seamless_boot_stream(
 
if (dc->config.allow_seamless_boot_optimization &&
!dcb->funcs->is_accelerated_mode(dcb)) {
-   if (dc_validate_seamless_boot_timing(dc, stream->sink, 
&stream->timing))
+   if (dc_validate_boot_timing(dc, stream->sink, &stream->timing))
stream->apply_seamless_boot_optimization = true;
}
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 69d264dd69a7..8248d4b75066 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -1125,7 +1125,7 @@ struct dc_validation_set {
uint8_t plane_count;
 };
 
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
const struct dc_sink *sink,
struct dc_crtc_timing *crtc_timing);
 
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 8c32b9cb3b49..52b22a944f94 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1761,7 +1761,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct 
dc_state *context)
edp_link->link_status.link_active) {
struct dc_stream_state *edp_stream = 
edp_streams[0];
 
-   can_apply_edp_fast_boot = 
!is_edp_ilr_optimization_required(edp_stream->link, &edp_stream->timing);
+   can_apply_edp_fast_boot = 
dc_validate_boot_timing(dc,
+   edp_stream->sink, &edp_stream->timing);
edp_stream->apply_edp_fast_boot_optimization = 
can_apply_edp_fast_boot;
if (can_apply_edp_fast_boot)
DC_LOG_EVENT_LINK_TRAINING("eDP fast 
boot disabled to optimize link rate\n");
-- 
2.25.1



[PATCH 10/13] drm/amd/display: [FW Promotion] Release 0.0.103.0

2022-02-04 Thread Jasdeep Dhillon
From: Anthony Koo 

Reviewed-by: Aric Cyr 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Anthony Koo 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index a01814631911..c5cbbb0470ee 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -47,10 +47,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xab0ae3c8
+#define DMUB_FW_VERSION_GIT_HASH 0x5189adbf
 #define DMUB_FW_VERSION_MAJOR 0
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 102
+#define DMUB_FW_VERSION_REVISION 103
 #define DMUB_FW_VERSION_TEST 0
 #define DMUB_FW_VERSION_VBIOS 0
 #define DMUB_FW_VERSION_HOTFIX 0
-- 
2.25.1



[PATCH 11/13] drm/amd/display: 3.2.172

2022-02-04 Thread Jasdeep Dhillon
From: Aric Cyr 

This version brings along the following fixes:
-fix for build failure due to an uninitialized variable
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4

Acked-by: Jasdeep Dhillon 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 8248d4b75066..89ca672866f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.171"
+#define DC_VER "3.2.172"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1



[PATCH 09/13] drm/amd/display: Fix DP LT sequence on EQ fail

2022-02-04 Thread Jasdeep Dhillon
From: Ilya 

[Why]
The number of lanes wasn't being reset to maximum when reducing link
rate due to an EQ failure. This could result in having fewer lanes in
the verified link capabilities, a lower maximum link bandwidth, and
fewer modes being supported.

[How]
Reset the number of lanes to max when dropping link rate due to EQ
failure during link training.

Reviewed-by: Aric Cyr 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Ilya 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index d0cb40df60a4..cd9c31b5e55d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3504,6 +3504,7 @@ static bool decide_fallback_link_setting(
current_link_setting->link_rate =
reduce_link_rate(
current_link_setting->link_rate);
+   current_link_setting->lane_count = 
initial_link_settings.lane_count;
} else {
return false;
}
@@ -3516,6 +3517,7 @@ static bool decide_fallback_link_setting(
current_link_setting->link_rate =
reduce_link_rate(
current_link_setting->link_rate);
+   current_link_setting->lane_count = 
initial_link_settings.lane_count;
} else {
return false;
}
-- 
2.25.1



[PATCH 06/13] drm/amd/display: fix yellow carp wm clamping

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Fix clamping to match register field size

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
index 90c73a1cb986..5e3bcaf12cac 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
@@ -138,8 +138,11 @@ static uint32_t convert_and_clamp(
ret_val = wm_ns * refclk_mhz;
ret_val /= 1000;
 
-   if (ret_val > clamp_value)
+   if (ret_val > clamp_value) {
+   /* clamping WMs is abnormal, unexpected and may lead to 
underflow*/
+   ASSERT(0);
ret_val = clamp_value;
+   }
 
return ret_val;
 }
@@ -159,7 +162,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->a.urgent_ns > 
hubbub2->watermarks.a.urgent_ns) {
hubbub2->watermarks.a.urgent_ns = watermarks->a.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->a.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 
prog_wm_value);
 
@@ -193,7 +196,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->a.urgent_latency_ns > 
hubbub2->watermarks.a.urgent_latency_ns) {
hubbub2->watermarks.a.urgent_latency_ns = 
watermarks->a.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->a.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 
prog_wm_value);
} else if (watermarks->a.urgent_latency_ns < 
hubbub2->watermarks.a.urgent_latency_ns)
@@ -203,7 +206,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->b.urgent_ns > 
hubbub2->watermarks.b.urgent_ns) {
hubbub2->watermarks.b.urgent_ns = watermarks->b.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->b.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 
prog_wm_value);
 
@@ -237,7 +240,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->b.urgent_latency_ns > 
hubbub2->watermarks.b.urgent_latency_ns) {
hubbub2->watermarks.b.urgent_latency_ns = 
watermarks->b.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->b.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 
prog_wm_value);
} else if (watermarks->b.urgent_latency_ns < 
hubbub2->watermarks.b.urgent_latency_ns)
@@ -247,7 +250,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->c.urgent_ns > 
hubbub2->watermarks.c.urgent_ns) {
hubbub2->watermarks.c.urgent_ns = watermarks->c.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->c.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 
prog_wm_value);
 
@@ -281,7 +284,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->c.urgent_latency_ns > 
hubbub2->watermarks.c.urgent_latency_ns) {
hubbub2->watermarks.c.urgent_latency_ns = 
watermarks->c.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->c.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 
prog_wm_value);
} else if (watermarks->c.urgent_latency_ns < 
hubbub2->watermarks.c.urgent_latency_ns)
@@ -291,7 +294,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->d.urgent_ns > 
hubbub2->watermarks.d.urgent_ns) {

[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Unbounded requesting is unsupported on pipe split modes,
and this change prevents us from running into such a situation
with wide modes.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
if (is_dual_plane(pipe->plane_state->format)
&& pipe->plane_state->src_rect.width <= 1920 && 
pipe->plane_state->src_rect.height <= 1080) {
dc->config.enable_4to1MPC = true;
-   } else if (!is_dual_plane(pipe->plane_state->format)) {
+   } else if (!is_dual_plane(pipe->plane_state->format) && 
pipe->plane_state->src_rect.width <= 5120) {
+   /* Limit to 5k max to avoid forced pipe split when 
there is not enough detile for swath */
context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
pipes[0].pipe.src.unbounded_req_mode = true;
}
-- 
2.25.1



[PATCH 08/13] drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

2022-02-04 Thread Jasdeep Dhillon
From: Zhan Liu 

[Why]
Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will
be cleared to 0 at S3 resume.

[How]
Keep eDP Vdd on when eDP stream is already enabled.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Zhan Liu 
---
 .../display/dc/dce110/dce110_hw_sequencer.c   | 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 52b22a944f94..ace04e2ed34e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1770,9 +1770,29 @@ void dce110_enable_accelerated_mode(struct dc *dc, 
struct dc_state *context)
break;
}
}
-   // We are trying to enable eDP, don't power down VDD
-   if (can_apply_edp_fast_boot)
+
+   /*
+* TO-DO: So far the code logic below only addresses single eDP 
case.
+* For dual eDP case, there are a few things that need to be
+* implemented first:
+*
+* 1. Change the fastboot logic above, so eDP link[0 or 1]'s
+* stream[0 or 1] will all be checked.
+*
+* 2. Change keep_edp_vdd_on to an array, and maintain 
keep_edp_vdd_on
+* for each eDP.
+*
+* Once above 2 things are completed, we can then change the 
logic below
+* correspondingly, so dual eDP case will be fully covered.
+*/
+
+   // We are trying to enable eDP, don't power down VDD if eDP 
stream is existing
+   if ((edp_stream_num == 1 && edp_streams[0] != NULL) || 
can_apply_edp_fast_boot) {
keep_edp_vdd_on = true;
+   DC_LOG_EVENT_LINK_TRAINING("Keep eDP Vdd on\n");
+   } else {
+   DC_LOG_EVENT_LINK_TRAINING("No eDP stream enabled, turn 
eDP Vdd off\n");
+   }
}
 
// Check seamless boot support
-- 
2.25.1



[PATCH 05/13] dc: do blocked MST topology discovery at resume from S3/S4

2022-02-04 Thread Jasdeep Dhillon
From: "Guo, Bing" 

Why:
When resuming from sleep or hibernation, blocked MST topology discovery
might need to be used.

How:
Added "DETECT_REASON_RESUMEFROMS3S4" to enum dc_detect_reason; use it to
require blocked MST topology discovery.

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: "Bing Guo" 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index cb6c91cd6e83..ec4b300ec067 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -831,7 +831,7 @@ static bool discover_dp_mst_topology(struct dc_link *link, 
enum dc_detect_reason
link->type = dc_connection_mst_branch;
dm_helpers_dp_update_branch_info(link->ctx, link);
if (dm_helpers_dp_mst_start_top_mgr(link->ctx,
-   link, reason == DETECT_REASON_BOOT)) {
+   link, (reason == DETECT_REASON_BOOT || reason == 
DETECT_REASON_RESUMEFROMS3S4))) {
link_disconnect_sink(link);
} else {
link->type = dc_connection_sst_branch;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index e7ead9e25318..9ad3ee4079c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -312,6 +312,7 @@ void dc_link_blank_dp_stream(struct dc_link *link, bool 
hw_init);
  */
 enum dc_detect_reason {
DETECT_REASON_BOOT,
+   DETECT_REASON_RESUMEFROMS3S4,
DETECT_REASON_HPD,
DETECT_REASON_HPDRX,
DETECT_REASON_FALLBACK,
-- 
2.25.1



[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

[Why]
Build failure due to the compiler warning that ‘status’ may be used uninitialized

[How]
Initialize status to LINK_TRAINING_SUCCESS

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
bool success = false;
bool skip_video_pattern;
enum clock_source_id dp_cs_id = get_clock_source_id(link);
-   enum link_training_result status;
+   enum link_training_result status = LINK_TRAINING_SUCCESS;
union hpd_irq_data irq_data;
struct link_resource link_res;
 
-- 
2.25.1



[PATCH 04/13] drm/amd/display: remove static from optc31_set_drr

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

remove static from optc31_set_drr

Reviewed-by: Nevenko Stupar 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct 
timing_generator *optc)
return true;
 }
 
-static void optc31_set_drr(
+void optc31_set_drr(
struct timing_generator *optc,
const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 
 void dcn31_timing_generator_init(struct optc *optc1);
 
+void optc31_set_drr(struct timing_generator *optc, const struct drr_params 
*params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1



[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal

2022-02-04 Thread Jasdeep Dhillon
From: Nicholas Kazlauskas 

[Why]
Found when running igt@kms_atomic.

Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign,
which ends up setting stream->link = NULL directly, causing the
global link_enc to be removed and preventing further link activity
and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign, so this call is no longer necessary.

This fixes global state being modified while unlocked.

Reviewed-by: Jimmy Kizito 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
dc->res_pool,
del_pipe->stream_res.stream_enc,
false);
-   /* Release link encoder from stream in new dc_state. */
-   if (dc->res_pool->funcs->link_enc_unassign)
-   dc->res_pool->funcs->link_enc_unassign(new_ctx, 
del_pipe->stream);
-
if (is_dp_128b_132b_signal(del_pipe)) {
update_hpo_dp_stream_engine_usage(
&new_ctx->res_ctx, dc->res_pool,
-- 
2.25.1



[PATCH 00/13] DC Patchset, Feb 7 2022 v3

2022-02-04 Thread Jasdeep Dhillon
This DC patchset brings improvements in multiple areas. In summary, we have:
-fix for build failure due to an uninitialized variable
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 16 files changed, 103 insertions(+), 82 deletions(-)

-- 
2.25.1



[PATCH 14/14] SWDEV-321758 - dc: Code clean

2022-02-04 Thread Jasdeep Dhillon
From: Oliver Logush 

Signed-off-by: Oliver Logush 
---
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index c25bc4d9cd4b..16c5c929e8d5 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -212,7 +212,7 @@ enum {
 #define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && 
(eChipRev < 0xFF))
 #endif
 #define DEVICE_ID_NV_NAVI10_LITE_P_13FE  0x13FE  // CYAN_SKILLFISH
-#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F // ROBIN+
+#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F
 #define FAMILY_VGH 144
 #define DEVICE_ID_VGH_163F 0x163F
 #define VANGOGH_A0 0x01
-- 
2.25.1



[PATCH 11/14] drm/amd/display: [FW Promotion] Release 0.0.103.0

2022-02-04 Thread Jasdeep Dhillon
From: Anthony Koo 

Reviewed-by: Aric Cyr 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Anthony Koo 
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h 
b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index a01814631911..c5cbbb0470ee 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -47,10 +47,10 @@
 
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xab0ae3c8
+#define DMUB_FW_VERSION_GIT_HASH 0x5189adbf
 #define DMUB_FW_VERSION_MAJOR 0
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 102
+#define DMUB_FW_VERSION_REVISION 103
 #define DMUB_FW_VERSION_TEST 0
 #define DMUB_FW_VERSION_VBIOS 0
 #define DMUB_FW_VERSION_HOTFIX 0
-- 
2.25.1



[PATCH 13/14] drm/amd/display: handle null link encoder

2022-02-04 Thread Jasdeep Dhillon
From: Martin Tsai 

[Why]
The link encoder mapping could return a null one and cause a system crash.

[How]
Let the mapping get an available link encoder without the endpoint
identification check.

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Martin Tsai 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 25 +++
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index ec4b300ec067..b1718600fa02 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3530,11 +3530,7 @@ enum dc_status dc_link_allocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
const struct link_hwss *link_hwss = get_link_hwss(link, 
&pipe_ctx->link_res);
DC_LOGGER_INIT(link->ctx->logger);
 
-   /* Link encoder may have been dynamically assigned to non-physical 
display endpoint. */
-   if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_encoder = link->link_enc;
-   else if (link->dc->res_pool->funcs->link_encs_assign)
-   link_encoder = 
link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+   link_encoder = link_enc_cfg_get_link_enc(link);
ASSERT(link_encoder);
 
/* enable_link_dp_mst already check link->enabled_stream_count
@@ -3823,11 +3819,7 @@ static enum dc_status deallocate_mst_payload(struct 
pipe_ctx *pipe_ctx)
const struct dc_link_settings empty_link_settings = {0};
DC_LOGGER_INIT(link->ctx->logger);
 
-   /* Link encoder may have been dynamically assigned to non-physical 
display endpoint. */
-   if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_encoder = link->link_enc;
-   else if (link->dc->res_pool->funcs->link_encs_assign)
-   link_encoder = 
link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+   link_encoder = link_enc_cfg_get_link_enc(link);
ASSERT(link_encoder);
 
/* deallocate_mst_payload is called before disable link. When mode or
@@ -3944,13 +3936,7 @@ static void update_psp_stream_config(struct pipe_ctx 
*pipe_ctx, bool dpms_off)
if (cp_psp == NULL || cp_psp->funcs.update_stream_config == NULL)
return;
 
-   if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_PHY)
-   link_enc = pipe_ctx->stream->link->link_enc;
-   else if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA 
&&
-   
pipe_ctx->stream->link->dc->res_pool->funcs->link_encs_assign)
-   link_enc = link_enc_cfg_get_link_enc_used_by_stream(
-   pipe_ctx->stream->ctx->dc,
-   pipe_ctx->stream);
+   link_enc = link_enc_cfg_get_link_enc(pipe_ctx->stream->link);
ASSERT(link_enc);
if (link_enc == NULL)
return;
@@ -4100,10 +4086,7 @@ void core_link_enable_stream(
dc_is_virtual_signal(pipe_ctx->stream->signal))
return;
 
-   if (dc->res_pool->funcs->link_encs_assign && stream->link->ep_type != 
DISPLAY_ENDPOINT_PHY)
-   link_enc = link_enc_cfg_get_link_enc_used_by_stream(dc, stream);
-   else
-   link_enc = stream->link->link_enc;
+   link_enc = link_enc_cfg_get_link_enc(link);
ASSERT(link_enc);
 
if (!dc_is_virtual_signal(pipe_ctx->stream->signal)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index fcf388b509db..b55868a0e0df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1605,16 +1605,7 @@ static void get_pixel_clock_parameters(
 
pixel_clk_params->requested_pix_clk_100hz = 
stream->timing.pix_clk_100hz;
 
-   /* Links supporting dynamically assigned link encoder will be assigned 
next
-* available encoder if one not already assigned.
-*/
-   if (link->is_dig_mapping_flexible &&
-   link->dc->res_pool->funcs->link_encs_assign) {
-   link_enc = 
link_enc_cfg_get_link_enc_used_by_stream(stream->ctx->dc, stream);
-   if (link_enc == NULL)
-   link_enc = 
link_enc_cfg_get_next_avail_link_enc(stream->ctx->dc);
-   } else
-   link_enc = stream->link->link_enc;
+   link_enc = link_enc_cfg_get_link_enc(link);
ASSERT(link_enc);
 
if (link_enc)
-- 
2.25.1



[PATCH 12/14] drm/amd/display: 3.2.172

2022-02-04 Thread Jasdeep Dhillon
From: Aric Cyr 

This version brings along the following fixes:
-fix for build failure due to an uninitialized variable
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4

Acked-by: Jasdeep Dhillon 
Signed-off-by: Aric Cyr 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 8248d4b75066..89ca672866f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.171"
+#define DC_VER "3.2.172"
 
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1



[PATCH 09/14] drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

2022-02-04 Thread Jasdeep Dhillon
From: Zhan Liu 

[Why]
Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will
be cleared to 0 at S3 resume.

[How]
Keep eDP Vdd on when eDP stream is already enabled.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Zhan Liu 
---
 .../display/dc/dce110/dce110_hw_sequencer.c   | 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 52b22a944f94..ace04e2ed34e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1770,9 +1770,29 @@ void dce110_enable_accelerated_mode(struct dc *dc, 
struct dc_state *context)
break;
}
}
-   // We are trying to enable eDP, don't power down VDD
-   if (can_apply_edp_fast_boot)
+
+   /*
+* TO-DO: So far the code logic below only addresses single eDP 
case.
+* For dual eDP case, there are a few things that need to be
+* implemented first:
+*
+* 1. Change the fastboot logic above, so eDP link[0 or 1]'s
+* stream[0 or 1] will all be checked.
+*
+* 2. Change keep_edp_vdd_on to an array, and maintain 
keep_edp_vdd_on
+* for each eDP.
+*
+* Once above 2 things are completed, we can then change the 
logic below
+* correspondingly, so dual eDP case will be fully covered.
+*/
+
+   // We are trying to enable eDP, don't power down VDD if eDP 
stream is existing
+   if ((edp_stream_num == 1 && edp_streams[0] != NULL) || 
can_apply_edp_fast_boot) {
keep_edp_vdd_on = true;
+   DC_LOG_EVENT_LINK_TRAINING("Keep eDP Vdd on\n");
+   } else {
+   DC_LOG_EVENT_LINK_TRAINING("No eDP stream enabled, turn 
eDP Vdd off\n");
+   }
}
 
// Check seamless boot support
-- 
2.25.1



[PATCH 10/14] drm/amd/display: Fix DP LT sequence on EQ fail

2022-02-04 Thread Jasdeep Dhillon
From: Ilya 

[Why]
The number of lanes wasn't being reset to maximum when reducing link
rate due to an EQ failure. This could result in having fewer lanes in
the verified link capabilities, a lower maximum link bandwidth, and
fewer modes being supported.

[How]
Reset the number of lanes to max when dropping link rate due to EQ
failure during link training.

Reviewed-by: Aric Cyr 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Ilya 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index d0cb40df60a4..cd9c31b5e55d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3504,6 +3504,7 @@ static bool decide_fallback_link_setting(
current_link_setting->link_rate =
reduce_link_rate(
current_link_setting->link_rate);
+   current_link_setting->lane_count = 
initial_link_settings.lane_count;
} else {
return false;
}
@@ -3516,6 +3517,7 @@ static bool decide_fallback_link_setting(
current_link_setting->link_rate =
reduce_link_rate(
current_link_setting->link_rate);
+   current_link_setting->lane_count = 
initial_link_settings.lane_count;
} else {
return false;
}
-- 
2.25.1



[PATCH 08/14] drm/amd/display: change fastboot timing validation

2022-02-04 Thread Jasdeep Dhillon
From: Paul Hsieh 

[Why]
VBIOS lights up eDP with 6bpc, but the driver uses 8bpc without
disabling and re-enabling the valid stream. Some panels can't
change color depth at runtime.

[How]
Change the fastboot timing validation function to check more than
just LANE_COUNT, LINK_RATE, etc.

Reviewed-by: Anthony Koo 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Paul Hsieh 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c| 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 1d9404ff29ed..997eb7e2d2b3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1467,7 +1467,7 @@ static bool context_changed(
return false;
 }
 
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
const struct dc_sink *sink,
struct dc_crtc_timing *crtc_timing)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 8a199d661a66..318d381e2910 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2168,7 +2168,7 @@ static void mark_seamless_boot_stream(
 
if (dc->config.allow_seamless_boot_optimization &&
!dcb->funcs->is_accelerated_mode(dcb)) {
-   if (dc_validate_seamless_boot_timing(dc, stream->sink, 
&stream->timing))
+   if (dc_validate_boot_timing(dc, stream->sink, &stream->timing))
stream->apply_seamless_boot_optimization = true;
}
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index 69d264dd69a7..8248d4b75066 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -1125,7 +1125,7 @@ struct dc_validation_set {
uint8_t plane_count;
 };
 
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
const struct dc_sink *sink,
struct dc_crtc_timing *crtc_timing);
 
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 8c32b9cb3b49..52b22a944f94 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1761,7 +1761,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct 
dc_state *context)
edp_link->link_status.link_active) {
struct dc_stream_state *edp_stream = 
edp_streams[0];
 
-   can_apply_edp_fast_boot = 
!is_edp_ilr_optimization_required(edp_stream->link, &edp_stream->timing);
+   can_apply_edp_fast_boot = 
dc_validate_boot_timing(dc,
+   edp_stream->sink, &edp_stream->timing);
edp_stream->apply_edp_fast_boot_optimization = 
can_apply_edp_fast_boot;
if (can_apply_edp_fast_boot)
DC_LOG_EVENT_LINK_TRAINING("eDP fast 
boot disabled to optimize link rate\n");
-- 
2.25.1



[PATCH 06/14] dc: do blocked MST topology discovery at resume from S3/S4

2022-02-04 Thread Jasdeep Dhillon
From: "Guo, Bing" 

Why:
When resuming from sleep or hibernation, blocked MST topology discovery
might need to be used.

How:
Added "DETECT_REASON_RESUMEFROMS3S4" to enum dc_detect_reason; use it to
require blocked MST topology discovery.

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: "Bing Guo" 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index cb6c91cd6e83..ec4b300ec067 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -831,7 +831,7 @@ static bool discover_dp_mst_topology(struct dc_link *link, 
enum dc_detect_reason
link->type = dc_connection_mst_branch;
dm_helpers_dp_update_branch_info(link->ctx, link);
if (dm_helpers_dp_mst_start_top_mgr(link->ctx,
-   link, reason == DETECT_REASON_BOOT)) {
+   link, (reason == DETECT_REASON_BOOT || reason == 
DETECT_REASON_RESUMEFROMS3S4))) {
link_disconnect_sink(link);
} else {
link->type = dc_connection_sst_branch;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index e7ead9e25318..9ad3ee4079c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -312,6 +312,7 @@ void dc_link_blank_dp_stream(struct dc_link *link, bool 
hw_init);
  */
 enum dc_detect_reason {
DETECT_REASON_BOOT,
+   DETECT_REASON_RESUMEFROMS3S4,
DETECT_REASON_HPD,
DETECT_REASON_HPDRX,
DETECT_REASON_FALLBACK,
-- 
2.25.1



[PATCH 07/14] drm/amd/display: fix yellow carp wm clamping

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Fix clamping to match register field size

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
index 90c73a1cb986..5e3bcaf12cac 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
@@ -138,8 +138,11 @@ static uint32_t convert_and_clamp(
ret_val = wm_ns * refclk_mhz;
ret_val /= 1000;
 
-   if (ret_val > clamp_value)
+   if (ret_val > clamp_value) {
+   /* clamping WMs is abnormal, unexpected and may lead to 
underflow*/
+   ASSERT(0);
ret_val = clamp_value;
+   }
 
return ret_val;
 }
@@ -159,7 +162,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->a.urgent_ns > 
hubbub2->watermarks.a.urgent_ns) {
hubbub2->watermarks.a.urgent_ns = watermarks->a.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->a.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 
prog_wm_value);
 
@@ -193,7 +196,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->a.urgent_latency_ns > 
hubbub2->watermarks.a.urgent_latency_ns) {
hubbub2->watermarks.a.urgent_latency_ns = 
watermarks->a.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->a.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 
prog_wm_value);
} else if (watermarks->a.urgent_latency_ns < 
hubbub2->watermarks.a.urgent_latency_ns)
@@ -203,7 +206,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->b.urgent_ns > 
hubbub2->watermarks.b.urgent_ns) {
hubbub2->watermarks.b.urgent_ns = watermarks->b.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->b.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 
prog_wm_value);
 
@@ -237,7 +240,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->b.urgent_latency_ns > 
hubbub2->watermarks.b.urgent_latency_ns) {
hubbub2->watermarks.b.urgent_latency_ns = 
watermarks->b.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->b.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 
prog_wm_value);
} else if (watermarks->b.urgent_latency_ns < 
hubbub2->watermarks.b.urgent_latency_ns)
@@ -247,7 +250,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->c.urgent_ns > 
hubbub2->watermarks.c.urgent_ns) {
hubbub2->watermarks.c.urgent_ns = watermarks->c.urgent_ns;
prog_wm_value = convert_and_clamp(watermarks->c.urgent_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 0,
DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 
prog_wm_value);
 
@@ -281,7 +284,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->c.urgent_latency_ns > 
hubbub2->watermarks.c.urgent_latency_ns) {
hubbub2->watermarks.c.urgent_latency_ns = 
watermarks->c.urgent_latency_ns;
prog_wm_value = 
convert_and_clamp(watermarks->c.urgent_latency_ns,
-   refclk_mhz, 0x1f);
+   refclk_mhz, 0x3fff);
REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 0,
DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 
prog_wm_value);
} else if (watermarks->c.urgent_latency_ns < 
hubbub2->watermarks.c.urgent_latency_ns)
@@ -291,7 +294,7 @@ static bool hubbub31_program_urgent_watermarks(
if (safe_to_lower || watermarks->d.urgent_ns > 
hubbub2->watermarks.d.urgent_ns) {

[PATCH 03/14] drm/amd/display: limit unbounded requesting to 5k

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Unbounded requesting is unsupported on pipe split modes,
and this change prevents us from running into such a situation
with wide modes.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
if (is_dual_plane(pipe->plane_state->format)
&& pipe->plane_state->src_rect.width <= 1920 && 
pipe->plane_state->src_rect.height <= 1080) {
dc->config.enable_4to1MPC = true;
-   } else if (!is_dual_plane(pipe->plane_state->format)) {
+   } else if (!is_dual_plane(pipe->plane_state->format) && 
pipe->plane_state->src_rect.width <= 5120) {
+   /* Limit to 5k max to avoid forced pipe split when 
there is not enough detile for swath */
context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
pipes[0].pipe.src.unbounded_req_mode = true;
}
-- 
2.25.1



[PATCH 05/14] drm/amd/display: remove static from optc31_set_drr

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

remove static from optc31_set_drr

Reviewed-by: Nevenko Stupar 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct 
timing_generator *optc)
return true;
 }
 
-static void optc31_set_drr(
+void optc31_set_drr(
struct timing_generator *optc,
const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 
 void dcn31_timing_generator_init(struct optc *optc1);
 
+void optc31_set_drr(struct timing_generator *optc, const struct drr_params 
*params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1



[PATCH 04/14] drm/amd/display: Basic support with device ID

2022-02-04 Thread Jasdeep Dhillon
From: Oliver Logush 

[Why]
To get the cyan_skillfish check working

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Oliver Logush 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 ++-
 5 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static  void amdgpu_dm_audio_eld_notify(struct 
amdgpu_device *adev, int pin)
}
 }
 
+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+   if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 
0x143F) {
+   return true;
+   }
+   return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
-   if (!has_hw_support) {
+   if (is_skillfish_series(adev)) {
DRM_INFO("DMUB unsupported on ASIC\n");
return 0;
}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   init_data.flags.disable_dmcu = true;
+   break;
+   }
break;
}
 
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_VEGA20:
-   return 0;
case CHIP_NAVI12:
fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   return 0;
+   }
DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
return -EINVAL;
}
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
adev->mode_info.num_dig = 6;
break;
default:
+   if (is_skillfish_series(adev)) {
+   adev->mode_info.num_crtc = 2;
+   adev->mode_info.num_hpd = 2;
+   adev->mode_info.num_dig = 2;
+   break;
+   }
 #if defined(CONFIG_DRM_AMD_DC_DCN)
switch (adev->ip_versions[DCE_HWIP][0]) {
case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
enum dc_irq_source irq_src;
atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
index 9200c8ce02ba..54ef83fe5a9b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
@@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, 
struct pp_smu_funcs *p
dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
return &clk_mgr->base;
}
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) {
dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
return &clk_mgr->base;
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..8a199d661a66 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)
 
case FAMILY_NV:
dc_version = DCN_VERSION_2_0;
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
 

[PATCH 02/14] drm/amd/display: Fix stream->link_enc unassigned during stream removal

2022-02-04 Thread Jasdeep Dhillon
From: Nicholas Kazlauskas 

[Why]
Found when running igt@kms_atomic.

Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign
which ends up modifying stream->link = NULL directly, causing the
global link_enc to be removed preventing further link activity
and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.

Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
dc->res_pool,
del_pipe->stream_res.stream_enc,
false);
-   /* Release link encoder from stream in new dc_state. */
-   if (dc->res_pool->funcs->link_enc_unassign)
-   dc->res_pool->funcs->link_enc_unassign(new_ctx, 
del_pipe->stream);
-
if (is_dp_128b_132b_signal(del_pipe)) {
update_hpo_dp_stream_engine_usage(
&new_ctx->res_ctx, dc->res_pool,
-- 
2.25.1



[PATCH 01/14] drm/amd/display: Fix for variable may be used uninitialized error

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

[Why]
Build failure due to ‘status’ may be used uninitialized

[How]
Initialize status to LINK_TRAINING_SUCCESS
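
For illustration, a minimal standalone sketch of the pattern behind the
warning (names here are illustrative, not the DC code): with a runtime loop
bound the compiler cannot prove the loop body ever assigns 'status', so
-Wmaybe-uninitialized flags the read after the loop; initializing at the
declaration guarantees a defined value on every path.

#include <stdio.h>

enum link_training_result { LINK_TRAINING_SUCCESS, LINK_TRAINING_ABORT };

static enum link_training_result try_training(int attempt)
{
	return attempt < 2 ? LINK_TRAINING_ABORT : LINK_TRAINING_SUCCESS;
}

int main(int argc, char **argv)
{
	/* Initialized up front, mirroring the fix: even if the retry loop
	 * below never runs, 'status' has a defined value. */
	enum link_training_result status = LINK_TRAINING_SUCCESS;
	int attempts;

	(void)argv;
	for (attempts = 0; attempts < argc - 1; attempts++) {
		status = try_training(attempts);
		if (status == LINK_TRAINING_SUCCESS)
			break;
	}
	printf("training %s\n",
	       status == LINK_TRAINING_SUCCESS ? "succeeded" : "failed");
	return 0;
}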

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
bool success = false;
bool skip_video_pattern;
enum clock_source_id dp_cs_id = get_clock_source_id(link);
-   enum link_training_result status;
+   enum link_training_result status = LINK_TRAINING_SUCCESS;
union hpd_irq_data irq_data;
struct link_resource link_res;
 
-- 
2.25.1



[PATCH 00/14] DC Patchset, Feb 7 2022 v2

2022-02-04 Thread Jasdeep Dhillon
This DC patchset brings improvements in multiple areas. In summary, we have:
-fix for build failure caused by an uninitialized variable
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4


Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (2):
  drm/amd/display: Basic support with device ID
  SWDEV-321758 - dc: Code clean

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

-- 
2.25.1



[PATCH 2/3] drm/amdkfd: Remove unused old debugger implementation

2022-02-04 Thread Mukul Joshi
Clean up the kfd code by removing the unused old debugger
implementation.
Only a small piece for resetting wavefronts is kept and
moved to kfd_device_queue_manager.c.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdkfd/Makefile   |   2 -
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 282 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   | 845 --
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.h   | 230 -
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c   | 158 
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   | 293 --
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  59 ++
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  35 +
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  12 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 -
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  19 -
 12 files changed, 98 insertions(+), 1844 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
 delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.h
 delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c
 delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index c4f3aff11072..19cfbf9577b4 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -51,8 +51,6 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_events.o \
$(AMDKFD_PATH)/cik_event_interrupt.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
-   $(AMDKFD_PATH)/kfd_dbgdev.o \
-   $(AMDKFD_PATH)/kfd_dbgmgr.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 64e3b4e3a712..cfe12525165f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -39,7 +39,6 @@
 #include 
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
-#include "kfd_dbgmgr.h"
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
@@ -580,299 +579,26 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
 static int kfd_ioctl_dbg_register(struct file *filep,
struct kfd_process *p, void *data)
 {
-   struct kfd_ioctl_dbg_register_args *args = data;
-   struct kfd_dev *dev;
-   struct kfd_dbgmgr *dbgmgr_ptr;
-   struct kfd_process_device *pdd;
-   bool create_ok;
-   long status = 0;
-
-   mutex_lock(&p->mutex);
-   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
-   if (!pdd) {
-   status = -EINVAL;
-   goto err_pdd;
-   }
-   dev = pdd->dev;
-
-   if (dev->adev->asic_type == CHIP_CARRIZO) {
-   pr_debug("kfd_ioctl_dbg_register not supported on CZ\n");
-   status = -EINVAL;
-   goto err_chip_unsupp;
-   }
-
-   mutex_lock(kfd_get_dbgmgr_mutex());
-
-   /*
-* make sure that we have pdd, if this the first queue created for
-* this process
-*/
-   pdd = kfd_bind_process_to_device(dev, p);
-   if (IS_ERR(pdd)) {
-   status = PTR_ERR(pdd);
-   goto out;
-   }
-
-   if (!dev->dbgmgr) {
-   /* In case of a legal call, we have no dbgmgr yet */
-   create_ok = kfd_dbgmgr_create(&dbgmgr_ptr, dev);
-   if (create_ok) {
-   status = kfd_dbgmgr_register(dbgmgr_ptr, p);
-   if (status != 0)
-   kfd_dbgmgr_destroy(dbgmgr_ptr);
-   else
-   dev->dbgmgr = dbgmgr_ptr;
-   }
-   } else {
-   pr_debug("debugger already registered\n");
-   status = -EINVAL;
-   }
-
-out:
-   mutex_unlock(kfd_get_dbgmgr_mutex());
-err_pdd:
-err_chip_unsupp:
-   mutex_unlock(&p->mutex);
-
-   return status;
+   return -EPERM;
 }
 
 static int kfd_ioctl_dbg_unregister(struct file *filep,
struct kfd_process *p, void *data)
 {
-   struct kfd_ioctl_dbg_unregister_args *args = data;
-   struct kfd_process_device *pdd;
-   struct kfd_dev *dev;
-   long status;
-
-   mutex_lock(&p->mutex);
-   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
-   mutex_unlock(&p->mutex);
-   if (!pdd || !pdd->dev->dbgmgr)
-   return -EINVAL;
-
-   dev = pdd->dev;
-
-   if (dev->adev->asic_type == CHIP_CARRIZO) {
-   pr_debug("kfd_ioctl_dbg_unregister not supported on CZ\n");
-   return -EINVAL;
-   }
-
-   mutex_lock(kfd_get_dbgmgr_mutex());
-
-   status = kfd_dbgmgr_unregister(dev->dbgmgr, p);
-   if (!status) {
-   kfd_dbgmgr_destroy(dev->dbgmgr);
-   

[PATCH 3/3] drm/amdkfd: Consolidate MQD manager functions

2022-02-04 Thread Mukul Joshi
A few MQD manager functions are duplicated for all versions of
the MQD manager. Remove this duplication by moving the common
functions into the kfd_mqd_manager.c file.
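
As a side note, a hedged sketch of the consumer side (the function below is
hypothetical and only models the wiring; field names follow struct
mqd_manager in kfd_mqd_manager.h): once the helpers live in
kfd_mqd_manager.c, each per-ASIC init path can point its ops at the shared
functions instead of carrying a private copy.

/* Hypothetical wiring, not part of this patch. */
static void wire_shared_sdma_ops(struct mqd_manager *mqd)
{
	mqd->load_mqd = load_mqd_sdma;		/* shared helper */
	mqd->destroy_mqd = destroy_mqd_sdma;	/* shared helper */
	mqd->is_occupied = is_occupied_sdma;	/* shared helper */
	mqd->free_mqd = free_mqd;		/* shared helper */
}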

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  | 63 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  | 27 
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 54 ---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 61 -
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 68 ---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 53 ---
 6 files changed, 90 insertions(+), 236 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index e2825ad4d699..f4a6af98db2d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -173,3 +173,66 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
}
}
 }
+
+int hiq_load_mqd_kiq(struct mqd_manager *mm, void *mqd,
+uint32_t pipe_id, uint32_t queue_id,
+struct queue_properties *p, struct mm_struct *mms)
+{
+   return mm->dev->kfd2kgd->hiq_mqd_load(mm->dev->adev, mqd, pipe_id,
+ queue_id, p->doorbell_off);
+}
+
+int destroy_mqd(struct mqd_manager *mm, void *mqd,
+   enum kfd_preempt_type type, unsigned int timeout,
uint32_t pipe_id, uint32_t queue_id)
+{
+   return mm->dev->kfd2kgd->hqd_destroy(mm->dev->adev, mqd, type, timeout,
+   pipe_id, queue_id);
+}
+
+void free_mqd(struct mqd_manager *mm, void *mqd,
+ struct kfd_mem_obj *mqd_mem_obj)
+{
+   if (mqd_mem_obj->gtt_mem) {
+   amdgpu_amdkfd_free_gtt_mem(mm->dev->adev, mqd_mem_obj->gtt_mem);
+   kfree(mqd_mem_obj);
+   } else {
+   kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
+   }
+}
+
+bool is_occupied(struct mqd_manager *mm, void *mqd,
+uint64_t queue_address, uint32_t pipe_id,
+uint32_t queue_id)
+{
+   return mm->dev->kfd2kgd->hqd_is_occupied(mm->dev->adev, queue_address,
+   pipe_id, queue_id);
+}
+
+int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
+ uint32_t pipe_id, uint32_t queue_id,
+ struct queue_properties *p, struct mm_struct *mms)
+{
+   return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->adev, mqd,
+   (uint32_t __user *)p->write_ptr,
+   mms);
+}
+
+/*
+ * preempt type here is ignored because there is only one way
+ * to preempt sdma queue
+ */
+int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
+enum kfd_preempt_type type,
+unsigned int timeout, uint32_t pipe_id,
+uint32_t queue_id)
+{
+   return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->adev, mqd, timeout);
+}
+
+bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
+ uint64_t queue_address, uint32_t pipe_id,
+ uint32_t queue_id)
+{
+   return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->adev, mqd);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index 23486a23df84..76f20637b938 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -136,4 +136,31 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
const uint32_t *cu_mask, uint32_t cu_mask_count,
uint32_t *se_mask);
 
+int hiq_load_mqd_kiq(struct mqd_manager *mm, void *mqd,
+   uint32_t pipe_id, uint32_t queue_id,
+   struct queue_properties *p, struct mm_struct *mms);
+
+int destroy_mqd(struct mqd_manager *mm, void *mqd,
+   enum kfd_preempt_type type, unsigned int timeout,
uint32_t pipe_id, uint32_t queue_id);
+
+void free_mqd(struct mqd_manager *mm, void *mqd,
+   struct kfd_mem_obj *mqd_mem_obj);
+
+bool is_occupied(struct mqd_manager *mm, void *mqd,
+uint64_t queue_address, uint32_t pipe_id,
+uint32_t queue_id);
+
+int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
+   uint32_t pipe_id, uint32_t queue_id,
+   struct queue_properties *p, struct mm_struct *mms);
+
+int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
enum kfd_preempt_type type, unsigned int timeout,
+   uint32_t pipe_id, uint32_t queue_id);
+
+bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
+   uint64_t queue_address, uint32_t pipe_id,
+   uint32_t queue_id);
+
 #endif /* KFD_MQD_MANAGER_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c 

[PATCH 1/3] drm/amdkfd: Fix TLB flushing in KFD SVM with no HWS

2022-02-04 Thread Mukul Joshi
With no HWS, TLB flushing will not work in SVM code.
Fix this by calling kfd_flush_tlb(), which works for both
the HWS and no-HWS cases.
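
For context, a simplified sketch of why kfd_flush_tlb() covers both cases
(paraphrased from kfd_process.c; details may differ from the exact source):
without HWS the VMID is statically assigned, so the flush must go through
the VMID path rather than the PASID path.

void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
{
	struct kfd_dev *dev = pdd->dev;

	if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
		/* No HWS: flush the statically assigned VMID directly */
		if (pdd->qpd.vmid)
			amdgpu_amdkfd_flush_gpu_tlb_vmid(dev->adev,
							 pdd->qpd.vmid);
	} else {
		/* HWS: flush by PASID */
		amdgpu_amdkfd_flush_gpu_tlb_pasid(dev->adev,
						  pdd->process->pasid, type);
	}
}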

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 41f03d165bad..b1315c97b952 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1229,15 +1229,14 @@ svm_range_unmap_from_gpus(struct svm_range *prange, 
unsigned long start,
if (r)
break;
}
-   amdgpu_amdkfd_flush_gpu_tlb_pasid(pdd->dev->adev,
-   p->pasid, TLB_FLUSH_HEAVYWEIGHT);
+   kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
}
 
return r;
 }
 
 static int
-svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+svm_range_map_to_gpu(struct amdgpu_device *adev, struct kfd_process_device 
*pdd,
 struct svm_range *prange, unsigned long offset,
 unsigned long npages, bool readonly, dma_addr_t *dma_addr,
 struct amdgpu_device *bo_adev, struct dma_fence **fence)
@@ -1248,6 +1247,7 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
int last_domain;
int r = 0;
int64_t i, j;
+   struct amdgpu_vm *vm = drm_priv_to_vm(pdd->drm_priv);
 
last_start = prange->start + offset;
 
@@ -1305,12 +1305,8 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
if (fence)
*fence = dma_fence_get(vm->last_update);
 
-   if (table_freed) {
-   struct kfd_process *p;
-
-   p = container_of(prange->svms, struct kfd_process, svms);
-   amdgpu_amdkfd_flush_gpu_tlb_pasid(adev, p->pasid, 
TLB_FLUSH_LEGACY);
-   }
+   if (table_freed)
+   kfd_flush_tlb(pdd, TLB_FLUSH_LEGACY);
 out:
return r;
 }
@@ -1351,7 +1347,7 @@ svm_range_map_to_gpus(struct svm_range *prange, unsigned 
long offset,
continue;
}
 
-   r = svm_range_map_to_gpu(pdd->dev->adev, 
drm_priv_to_vm(pdd->drm_priv),
+   r = svm_range_map_to_gpu(pdd->dev->adev, pdd,
 prange, offset, npages, readonly,
 prange->dma_addr[gpuidx],
bo_adev, wait ? &fence : NULL);
-- 
2.33.1



Re: [PATCH] Revert "i2c: core: support bus regulator controlling in adapter"

2022-02-04 Thread Hsin-Yi Wang
On Sun, Jan 16, 2022 at 2:44 AM Tareque Md.Hanif
 wrote:
>
> Hi Hsin-Yi,
>
> The issue still exists. I reverted a19f75de73c220b4496d2aefb7a605dd032f7c01 
> (the commit that reverted 5a7b95fb993ec399c8a685552aa6a8fc995c40bd) and 
> manually applied the patch (tags/v5.16). journalctl attached.

hi Tareque,

Can you apply the same setting [1] again together with this patch to see if
the issue is still there?
https://github.com/torvalds/linux/commit/6dc8265f9803ccb7e5da804e01601f0c14f270e0

[1] reverted a19f75de73c220b4496d2aefb7a605dd032f7c01 (the commit that
reverted 5a7b95fb993ec399c8a685552aa6a8fc995c40bd) and manually
applied the patch (tags/v5.16)

Thanks
>
> Regards,
>
> Tareque
>
> On Saturday, January 15, 2022, 11:27:07 PM GMT+6, Hsin-Yi Wang 
>  wrote:
>
>
> hi Tareque,
>
>
> On Fri, Jan 14, 2022 at 6:09 PM Tareque Md Hanif
>  wrote:
> >
> > Hi Hsin-Yi,
> >
> > On 1/12/22 16:58, Hsin-Yi Wang wrote:
> >
> > Can you help provide logs if we apply
> > 5a7b95fb993ec399c8a685552aa6a8fc995c40bd but revert
> > 8d35a2596164c1c9d34d4656fd42b445cd1e247f?
> >
> > Issue still exists. journalctl log attached in revert_8d.txt
> >
> >
> > > after apply 5a7b95fb993ec399c8a685552aa6a8fc995c40bd
> > > 1. delete SET_LATE_SYSTEM_SLEEP_PM_OPS(i2c_suspend_late,
> > > i2c_resume_early) and function i2c_suspend_late() and
> > > i2c_resume_early().
> >
> > No issues. journalctl log attached in test1.txt
> >
> >
> > > 2. delete SET_RUNTIME_PM_OPS(i2c_runtime_suspend, i2c_runtime_resume,
> > > NULL) and function i2c_runtime_suspend() and i2c_runtime_resume().
> >
> > Issue exists. journalctl log attached in test2.txt
>
>
> Thanks for the testing.
> Can you help us test if applying the following patch on top of
> 5a7b95fb993ec399c8a685552aa6a8fc995c40bd works? Thanks
>
> diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
> index 9eb4009cb250..6b046012aa08 100644
> --- a/drivers/i2c/i2c-core-base.c
> +++ b/drivers/i2c/i2c-core-base.c
> @@ -484,7 +484,7 @@ static int i2c_resume_early(struct device *dev)
> struct i2c_client *client = i2c_verify_client(dev);
> int err;
>
> -  if (!client)
> +  if (!client || dev_pm_skip_resume(dev))
> return 0;
>
> if (pm_runtime_status_suspended(&client->dev) &&
> @@ -502,7 +502,7 @@ static int i2c_suspend_late(struct device *dev)
> struct i2c_client *client = i2c_verify_client(dev);
> int err;
>
> -  if (!client)
> +  if (!client || dev_pm_skip_suspend(dev))
> return 0;
>
> err = pm_generic_suspend_late(&client->dev);
>


Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Bhardwaj, Rajneesh



On 2/4/2022 1:50 PM, Christian König wrote:

On 04.02.22 at 19:47, Bhardwaj, Rajneesh wrote:


On 2/4/2022 1:32 PM, Christian König wrote:

On 04.02.22 at 19:12, Bhardwaj, Rajneesh wrote:

[Sorry for top posting]

Hi Christian

I think you forgot the below hunk, without which the issue is not 
fixed completely on a multi GPU system.


No, that is perfectly intentional. While removing a bo_va structure 
it can happen that there are still mappings attached to it (for 
example because the application crashed).



OK, but for me, at boot time, I see a flood of warning messages coming
from amdgpu_vm_bo_del, so the previous patch doesn't help.


Do you have a backtrace? That should not happen.

Could be that we have this buggy at a couple of different places.



This is on Ubuntu 18.04.


[    8.392405] WARNING: CPU: 13 PID: 2732 at /home/rajneesh/git/compute/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2673 amdgpu_vm_bo_del+0x386/0x3b0 [amdgpu]
[    8.392586] Modules linked in: amdgpu ast iommu_v2 gpu_sched drm_vram_helper drm_ttm_helper ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks k10temp nf_tables nfnetlink ip_tables x_tables i2c_piix4
[    8.392628] CPU: 13 PID: 2732 Comm: plymouthd Not tainted 5.13.0-kfd-rajneesh #1055
[    8.392632] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018
[    8.392635] RIP: 0010:amdgpu_vm_bo_del+0x386/0x3b0 [amdgpu]
[    8.392749] Code: 0f 85 56 fe ff ff 48 c7 c2 28 6b b3 c0 be 21 01 00 00 48 c7 c7 58 6b b3 c0 c6 05 64 64 62 00 01 e8 5f be a4 d4 e9 32 fe ff ff <0f> 0b eb 8a 49 8b be 88 01 00 00 e9 af fc ff ff be 03 00 00 00 e8
[    8.392752] RSP: 0018:af33471d7d98 EFLAGS: 00010246
[    8.392756] RAX:  RBX: a0877160 RCX: 0001
[    8.392758] RDX: 0001 RSI: a08772ae01f8 RDI: a0800a426f68
[    8.392761] RBP: a087691fd980 R08: c0a4e2e0 R09: 000a
[    8.392763] R10: af33471d7d68 R11: 0001 R12: a0801d540010
[    8.392765] R13: a08771615c00 R14: a08024166618 R15: a08771615e60
[    8.392768] FS:  7f7b80179740() GS:a08f3dec() knlGS:
[    8.392771] CS:  0010 DS:  ES:  CR0: 80050033
[    8.392773] CR2: 55839db51eb8 CR3: 00010f49c000 CR4: 003506e0
[    8.392775] Call Trace:
[    8.392779]  ? _raw_spin_unlock_irqrestore+0x30/0x40
[    8.392787]  amdgpu_driver_postclose_kms+0x94/0x270 [amdgpu]
[    8.392897]  drm_file_free.part.14+0x1e3/0x230 [drm]
[    8.392918]  drm_release+0x6e/0xf0 [drm]
[    8.392937]  __fput+0xa3/0x250
[    8.392942]  task_work_run+0x6d/0xb0
[    8.392949]  exit_to_user_mode_prepare+0x1c9/0x1d0
[    8.392953]  syscall_exit_to_user_mode+0x19/0x50
[    8.392957]  do_syscall_64+0x42/0x70
[    8.392960]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    8.392964] RIP: 0033:0x7f7b7f8679e4
[    8.392967] Code: eb 89 e8 6f 44 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 21 ff 2d 00 8b 00 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c f3 c3 66 90 53 89 fb 48 83 ec 10 e8 94 fe
[    8.392970] RSP: 002b:7ffe6bb0cdb8 EFLAGS: 0246 ORIG_RAX: 0003
[    8.392973] RAX:  RBX: 55839db4b760 RCX: 7f7b7f8679e4
[    8.392974] RDX: 55839db4aed0 RSI:  RDI: 000b
[    8.392976] RBP: 000b R08: 55839db4aee0 R09: 7f7b7fb42c40
[    8.392978] R10: 0007 R11: 0246 R12: e280
[    8.392979] R13: 000d R14: 7f7b7fb5b1e0 R15: 7f7b7fb5b130
[    8.392988] irq event stamp: 1264799
[    8.392990] hardirqs last  enabled at (1264805): [] console_unlock+0x339/0x530
[    8.392994] hardirqs last disabled at (1264810): [] console_unlock+0x3a6/0x530





Regards,
Christian.






Because of this, locking the VM before the remove is mandatory. Only
while adding a bo_va structure can we avoid that.


Regards,
Christian.



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device 
*adev,

    struct amdgpu_vm *vm = bo_va->base.vm;
    struct amdgpu_vm_bo_base **base;

- dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
    if (bo) {
    dma_resv_assert_held(bo->tbo.base.resv);
    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)


If you chose to include the above hunk, please feel free to add

Reviewed-and-tested-by: Rajneesh Bhardwaj 

On 2/4/2022 11:27 AM, Felix Kuehling wrote:


On 2022-02-04 at 03:52, Christian König wrote:

Since newly added BOs don't have any mappings it's ok to add them
without holding the VM lock. Only when we add per VM BOs the lock is mandatory.

[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Unbounded requesting is unsupported on pipe split modes
and this change prevents us running into such a situation
with wide modes.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
if (is_dual_plane(pipe->plane_state->format)
&& pipe->plane_state->src_rect.width <= 1920 && 
pipe->plane_state->src_rect.height <= 1080) {
dc->config.enable_4to1MPC = true;
-   } else if (!is_dual_plane(pipe->plane_state->format)) {
+   } else if (!is_dual_plane(pipe->plane_state->format) && 
pipe->plane_state->src_rect.width <= 5120) {
+   /* Limit to 5k max to avoid forced pipe split when 
there is not enough detile for swath */
context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
pipes[0].pipe.src.unbounded_req_mode = true;
}
-- 
2.25.1



[PATCH 04/13] drm/amd/display: Basic support with device ID

2022-02-04 Thread Jasdeep Dhillon
From: Oliver Logush 

[why]
To get the cyan_skillfish check working

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon 
Signed-off-by: Oliver Logush 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 ++-
 5 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static  void amdgpu_dm_audio_eld_notify(struct 
amdgpu_device *adev, int pin)
}
 }
 
+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+   if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 
0x143F) {
+   return true;
+   }
+   return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
-   if (!has_hw_support) {
+   if (is_skillfish_series(adev)) {
DRM_INFO("DMUB unsupported on ASIC\n");
return 0;
}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   init_data.flags.disable_dmcu = true;
+   break;
+   }
break;
}
 
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_VEGA20:
-   return 0;
case CHIP_NAVI12:
fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   return 0;
+   }
DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
return -EINVAL;
}
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
adev->mode_info.num_dig = 6;
break;
default:
+   if (is_skillfish_series(adev)) {
+   adev->mode_info.num_crtc = 2;
+   adev->mode_info.num_hpd = 2;
+   adev->mode_info.num_dig = 2;
+   break;
+   }
 #if defined(CONFIG_DRM_AMD_DC_DCN)
switch (adev->ip_versions[DCE_HWIP][0]) {
case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
enum dc_irq_source irq_src;
atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
index 9200c8ce02ba..54ef83fe5a9b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
@@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, 
struct pp_smu_funcs *p
dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
return &clk_mgr->base;
}
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) {
dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
return &clk_mgr->base;
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..8a199d661a66 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)
 
case FAMILY_NV:
dc_version = DCN_VERSION_2_0;
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
 

[PATCH 05/13] drm/amd/display: remove static from optc31_set_drr

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

remove static from optc31_set_drr

Reviewed-by: Nevenko Stupar 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct 
timing_generator *optc)
return true;
 }
 
-static void optc31_set_drr(
+void optc31_set_drr(
struct timing_generator *optc,
const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 
 void dcn31_timing_generator_init(struct optc *optc1);
 
+void optc31_set_drr(struct timing_generator *optc, const struct drr_params 
*params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1



[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal

2022-02-04 Thread Jasdeep Dhillon
From: Nicholas Kazlauskas 

[Why]
Found when running igt@kms_atomic.

Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign
which ends up modifying stream->link = NULL directly, causing the
global link_enc to be removed preventing further link activity
and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.

Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
dc->res_pool,
del_pipe->stream_res.stream_enc,
false);
-   /* Release link encoder from stream in new dc_state. */
-   if (dc->res_pool->funcs->link_enc_unassign)
-   dc->res_pool->funcs->link_enc_unassign(new_ctx, 
del_pipe->stream);
-
if (is_dp_128b_132b_signal(del_pipe)) {
update_hpo_dp_stream_engine_usage(
&new_ctx->res_ctx, dc->res_pool,
-- 
2.25.1



[PATCH 00/13] DC Patches February 7, 2022

2022-02-04 Thread Jasdeep Dhillon
This DC patchset brings improvements in multiple areas. In summary, we have:
-fix for build failure caused by an uninitialized variable
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

-- 
2.25.1



[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

[Why]
Build failure due to ‘status’ may be used uninitialized

[How]
Initialize status to LINK_TRAINING_SUCCESS

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
bool success = false;
bool skip_video_pattern;
enum clock_source_id dp_cs_id = get_clock_source_id(link);
-   enum link_training_result status;
+   enum link_training_result status = LINK_TRAINING_SUCCESS;
union hpd_irq_data irq_data;
struct link_resource link_res;
 
-- 
2.25.1



Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20

2022-02-04 Thread Alex Deucher
On Fri, Feb 4, 2022 at 1:56 PM Luben Tuikov  wrote:
>
> On 2022-02-04 13:53, Alex Deucher wrote:
> > On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov  wrote:
> >>
> >> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
> >> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
> >> is sought through that bus "pointer" and not through the RAS bus "pointer".
> >>
> >> Cc: Roy Sun 
> >> Cc: Alex Deucher 
> >> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
> >> Signed-off-by: Luben Tuikov 
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++-
> >>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
> >>  2 files changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c 
> >> b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> >> index 87acb089cfccd5..dd2d66090d2374 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> >> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device 
> >> *adev)
> >> i2c_set_adapdata(control, smu_i2c);
> >>
> >> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> >> -   adev->pm.fru_eeprom_i2c_bus = NULL;
> >> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> >>
> >> res = i2c_add_adapter(control);
> >> if (res)
> >> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device 
> >> *adev)
> >>
> >> i2c_del_adapter(control);
> >> adev->pm.ras_eeprom_i2c_bus = NULL;
> >> +   adev->pm.fru_eeprom_i2c_bus = NULL;
> >>  }
> >>
> >>  /*
> >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
> >> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> >> index ba4c9771ffc56b..3d96d4af92d97f 100644
> >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> >> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct 
> >> smu_context *smu)
> >> }
> >>
> >> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> >> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> >
> > Does this need to be set to NULL in control_fini() for consistency?
> > Other than that, looks good to me.
>
> It already is--I didn't need to add it as I did above for Vega20.

In that case:
Reviewed-by: Alex Deucher 

>
> Regards,
> Luben
>
> >
> > Alex
> >
> >
> >>
> >> return 0;
> >>  Out_err:
> >>
> >> base-commit: 24b54e10043609a55194badff58bf91189c49793
> >> prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404
> >> prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a
> >> prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883
> >> --
> >> 2.35.0.3.gb23dac905b
> >>
>
> Regards,
> --
> Luben


Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20

2022-02-04 Thread Luben Tuikov
On 2022-02-04 13:53, Alex Deucher wrote:
> On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov  wrote:
>>
>> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
>> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
>> is sought through that bus "pointer" and not through the RAS bus "pointer".
>>
>> Cc: Roy Sun 
>> Cc: Alex Deucher 
>> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
>> Signed-off-by: Luben Tuikov 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++-
>>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c 
>> b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
>> index 87acb089cfccd5..dd2d66090d2374 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
>> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device 
>> *adev)
>> i2c_set_adapdata(control, smu_i2c);
>>
>> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
>> -   adev->pm.fru_eeprom_i2c_bus = NULL;
>> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
>>
>> res = i2c_add_adapter(control);
>> if (res)
>> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device 
>> *adev)
>>
>> i2c_del_adapter(control);
>> adev->pm.ras_eeprom_i2c_bus = NULL;
>> +   adev->pm.fru_eeprom_i2c_bus = NULL;
>>  }
>>
>>  /*
>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
>> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
>> index ba4c9771ffc56b..3d96d4af92d97f 100644
>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
>> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct 
>> smu_context *smu)
>> }
>>
>> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
>> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> 
> Does this need to be set to NULL in control_fini() for consistency?
> Other than that, looks good to me.

It already is--I didn't need to add it as I did above for Vega20.

Regards,
Luben

> 
> Alex
> 
> 
>>
>> return 0;
>>  Out_err:
>>
>> base-commit: 24b54e10043609a55194badff58bf91189c49793
>> prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404
>> prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a
>> prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883
>> --
>> 2.35.0.3.gb23dac905b
>>

Regards,
-- 
Luben


Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20

2022-02-04 Thread Alex Deucher
On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov  wrote:
>
> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
> is sought through that bus "pointer" and not through the RAS bus "pointer".
>
> Cc: Roy Sun 
> Cc: Alex Deucher 
> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
> Signed-off-by: Luben Tuikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++-
>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c 
> b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> index 87acb089cfccd5..dd2d66090d2374 100644
> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device *adev)
> i2c_set_adapdata(control, smu_i2c);
>
> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> -   adev->pm.fru_eeprom_i2c_bus = NULL;
> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
>
> res = i2c_add_adapter(control);
> if (res)
> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device 
> *adev)
>
> i2c_del_adapter(control);
> adev->pm.ras_eeprom_i2c_bus = NULL;
> +   adev->pm.fru_eeprom_i2c_bus = NULL;
>  }
>
>  /*
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> index ba4c9771ffc56b..3d96d4af92d97f 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct 
> smu_context *smu)
> }
>
> adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> +   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;

Does this need to be set to NULL in control_fini() for consistency?
Other than that, looks good to me.

Alex


>
> return 0;
>  Out_err:
>
> base-commit: 24b54e10043609a55194badff58bf91189c49793
> prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404
> prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a
> prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883
> --
> 2.35.0.3.gb23dac905b
>


Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Christian König

On 04.02.22 at 19:47, Bhardwaj, Rajneesh wrote:


On 2/4/2022 1:32 PM, Christian König wrote:

On 04.02.22 at 19:12, Bhardwaj, Rajneesh wrote:

[Sorry for top posting]

Hi Christian

I think you forgot the below hunk, without which the issue is not 
fixed completely on a multi GPU system.


No, that is perfectly intentional. While removing a bo_va structure 
it can happen that there are still mappings attached to it (for 
example because the application crashed).



OK, but for me, at boot time, I see a flood of warning messages coming
from amdgpu_vm_bo_del, so the previous patch doesn't help.


Do you have a backtrace? That should not happen.

Could be that we have this buggy at a couple of different places.

Regards,
Christian.






Because of this, locking the VM before the remove is mandatory. Only
while adding a bo_va structure can we avoid that.


Regards,
Christian.



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
    struct amdgpu_vm *vm = bo_va->base.vm;
    struct amdgpu_vm_bo_base **base;

- dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
    if (bo) {
    dma_resv_assert_held(bo->tbo.base.resv);
    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)


If you chose to include the above hunk, please feel free to add

Reviewed-and-tested-by: Rajneesh Bhardwaj 

On 2/4/2022 11:27 AM, Felix Kuehling wrote:


On 2022-02-04 at 03:52, Christian König wrote:

Since newly added BOs don't have any mappings it's ok to add them
without holding the VM lock. Only when we add per VM BOs the lock is
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
  return;
  + dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
  vm->bulk_moveable = false;
  if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
  amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,

  {
  struct amdgpu_bo_va *bo_va;
  - dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
  bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
  if (bo_va == NULL) {
  return NULL;






[PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20

2022-02-04 Thread Luben Tuikov
The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
is sought through that bus "pointer" and not through the RAS bus "pointer".
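
For context, the consumer side is expected to look roughly like this (a
sketch of the FRU read path; 'addrptr', 'buf' and the exact
amdgpu_eeprom_read() signature are assumptions, not verbatim code):

	/* FRU reads go through the fru_eeprom_i2c_bus pointer, so leaving
	 * it NULL makes product-info lookup bail out. */
	if (!adev->pm.fru_eeprom_i2c_bus)
		return -ENODEV;

	len = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr,
				 buf, sizeof(buf));
	if (len < 1) {
		DRM_ERROR("FRU: Failed to read EEPROM\n");
		return -EIO;
	}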

Cc: Roy Sun 
Cc: Alex Deucher 
Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c 
b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
index 87acb089cfccd5..dd2d66090d2374 100644
--- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
@@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device *adev)
i2c_set_adapdata(control, smu_i2c);
 
adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
-   adev->pm.fru_eeprom_i2c_bus = NULL;
+   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
 
res = i2c_add_adapter(control);
if (res)
@@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device *adev)
 
i2c_del_adapter(control);
adev->pm.ras_eeprom_i2c_bus = NULL;
+   adev->pm.fru_eeprom_i2c_bus = NULL;
 }
 
 /*
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index ba4c9771ffc56b..3d96d4af92d97f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct smu_context 
*smu)
}
 
adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
+   adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
 
return 0;
 Out_err:

base-commit: 24b54e10043609a55194badff58bf91189c49793
prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404
prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a
prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883
-- 
2.35.0.3.gb23dac905b



Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Bhardwaj, Rajneesh



On 2/4/2022 1:32 PM, Christian König wrote:

On 04.02.22 at 19:12, Bhardwaj, Rajneesh wrote:

[Sorry for top posting]

Hi Christian

I think you forgot the below hunk, without which the issue is not 
fixed completely on a multi GPU system.


No, that is perfectly intentional. While removing a bo_va structure it 
can happen that there are still mappings attached to it (for example 
because the application crashed).



OK, but for me, at boot time, I see a flood of warning messages coming
from amdgpu_vm_bo_del, so the previous patch doesn't help.





Because of this, locking the VM before the remove is mandatory. Only
while adding a bo_va structure can we avoid that.


Regards,
Christian.



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
    struct amdgpu_vm *vm = bo_va->base.vm;
    struct amdgpu_vm_bo_base **base;

-   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
    if (bo) {
    dma_resv_assert_held(bo->tbo.base.resv);
    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)


If you chose to include the above hunk, please feel free to add

Reviewed-and-tested-by: Rajneesh Bhardwaj 

On 2/4/2022 11:27 AM, Felix Kuehling wrote:


On 2022-02-04 at 03:52, Christian König wrote:

Since newly added BOs don't have any mappings it's ok to add them
without holding the VM lock. Only when we add per VM BOs the lock is
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
  return;
  + dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
  vm->bulk_moveable = false;
  if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
  amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,

  {
  struct amdgpu_bo_va *bo_va;
  - dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
  bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
  if (bo_va == NULL) {
  return NULL;




Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Deucher, Alexander
[Public]

Seems like this functionality should be moved up into the callers.  Maybe add
new IP callbacks (dump_reset_registers()) so that each IP can specify what
registers are relevant for reset debugging, and then we can walk the IP list
and call the callback before we call the asic_reset callbacks.
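
A rough sketch of that flow (dump_reset_registers() is hypothetical here,
the callback does not exist yet; the walk mirrors how other amd_ip_funcs
callbacks are dispatched):

static void amdgpu_dump_reset_registers(struct amdgpu_device *adev)
{
	int i;

	for (i = 0; i < adev->num_ip_blocks; i++) {
		const struct amd_ip_funcs *funcs =
			adev->ip_blocks[i].version->funcs;

		if (!adev->ip_blocks[i].status.valid)
			continue;
		/* Hypothetical new callback: each IP dumps the registers
		 * it considers relevant for reset debugging. */
		if (funcs->dump_reset_registers)
			funcs->dump_reset_registers(adev);
	}
}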

Alex


From: amd-gfx  on behalf of Deucher, 
Alexander 
Sent: Friday, February 4, 2022 1:41 PM
To: Sharma, Shashank ; Lazar, Lijo 
; amd-gfx@lists.freedesktop.org 

Cc: Somalapuram, Amaranath ; Koenig, Christian 

Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler


[Public]


[Public]

In the suspend and hibernate cases, we don't care.  In most cases the power 
rail will be cut once the system enters suspend so it doesn't really matter.  
That's why we call the asic reset callback directly rather than going through 
the whole recovery process.  The device is already quiescent at this point we 
just want to make sure the device is in a known state when we come out of 
suspend (in case suspend overall fails).

Alex



From: Sharma, Shashank 
Sent: Friday, February 4, 2022 12:22 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing
> In the suspend-reset case, won't this schedule a work item on
> suspend? I don't know if that is a good idea; ideally we would like to clean
> up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error
during suspend-reset, which is a very probable case?

- Shashank

>
> -Original Message-
> From: Sharma, Shashank 
> Sent: Friday, February 4, 2022 10:47 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, Amaranath 
> ; Koenig, Christian 
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
>
>
> On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
>> BTW, since this is already providing a set of values it would be useful to 
>> provide one more field as the reset reason - RAS error recovery, GPU hung 
>> recovery or something else.
>
> Adding this additional parameter instead of blocking something in the kernel
> seems like a better idea. The app can filter out and read what it is
> interested in.
>
> - Shashank


[PATCH 05/13] drm/amd/display: remove static from optc31_set_drr

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

remove static from optc31_set_drr

Reviewed-by: Nevenko Stupar 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct 
timing_generator *optc)
return true;
 }
 
-static void optc31_set_drr(
+void optc31_set_drr(
struct timing_generator *optc,
const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 
 void dcn31_timing_generator_init(struct optc *optc1);
 
+void optc31_set_drr(struct timing_generator *optc, const struct drr_params 
*params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1



[PATCH 04/13] drm/amd/display: Basic support with device ID

2022-02-04 Thread Jasdeep Dhillon
From: Oliver Logush 

[why]
To get the the cyan_skillfish check working

Reviewed-by: Charlene Liu 
Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Oliver Logush 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 ++-
 5 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static  void amdgpu_dm_audio_eld_notify(struct 
amdgpu_device *adev, int pin)
}
 }
 
+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+   if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 
0x143F) {
+   return true;
+   }
+   return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
-   if (!has_hw_support) {
+   if (is_skillfish_series(adev)) {
DRM_INFO("DMUB unsupported on ASIC\n");
return 0;
}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   init_data.flags.disable_dmcu = true;
+   break;
+   }
break;
}
 
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
case CHIP_VEGA10:
case CHIP_VEGA12:
case CHIP_VEGA20:
-   return 0;
case CHIP_NAVI12:
fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
default:
break;
}
+   if (is_skillfish_series(adev)) {
+   return 0;
+   }
DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
return -EINVAL;
}
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
adev->mode_info.num_dig = 6;
break;
default:
+   if (is_skillfish_series(adev)) {
+   adev->mode_info.num_crtc = 2;
+   adev->mode_info.num_hpd = 2;
+   adev->mode_info.num_dig = 2;
+   break;
+   }
 #if defined(CONFIG_DRM_AMD_DC_DCN)
switch (adev->ip_versions[DCE_HWIP][0]) {
case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
enum dc_irq_source irq_src;
atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
index 9200c8ce02ba..54ef83fe5a9b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
@@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, 
struct pp_smu_funcs *p
dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
	return &clk_mgr->base;
}
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) {
dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
	return &clk_mgr->base;
}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..8a199d661a66 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id 
asic_id)
 
case FAMILY_NV:
dc_version = DCN_VERSION_2_0;
-   if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+   if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
 

[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k

2022-02-04 Thread Jasdeep Dhillon
From: Dmytro Laktyushkin 

Unbounded requesting is unsupported in pipe-split modes,
and this change prevents us from running into such a situation
with wide modes.

Reviewed-by: Charlene Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Dmytro Laktyushkin 
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
if (is_dual_plane(pipe->plane_state->format)
&& pipe->plane_state->src_rect.width <= 1920 && 
pipe->plane_state->src_rect.height <= 1080) {
dc->config.enable_4to1MPC = true;
-   } else if (!is_dual_plane(pipe->plane_state->format)) {
+   } else if (!is_dual_plane(pipe->plane_state->format) && 
pipe->plane_state->src_rect.width <= 5120) {
+   /* Limit to 5k max to avoid forced pipe split when 
there is not enough detile for swath */
context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
pipes[0].pipe.src.unbounded_req_mode = true;
}
-- 
2.25.1



[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal

2022-02-04 Thread Jasdeep Dhillon
From: Nicholas Kazlauskas 

[Why]
Found when running igt@kms_atomic.

Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign,
which ends up modifying stream->link = NULL directly, causing the
global link_enc to be removed, preventing further link activity
and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.

Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
dc->res_pool,
del_pipe->stream_res.stream_enc,
false);
-   /* Release link encoder from stream in new dc_state. */
-   if (dc->res_pool->funcs->link_enc_unassign)
-   dc->res_pool->funcs->link_enc_unassign(new_ctx, 
del_pipe->stream);
-
if (is_dp_128b_132b_signal(del_pipe)) {
update_hpo_dp_stream_engine_usage(
_ctx->res_ctx, dc->res_pool,
-- 
2.25.1



[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error

2022-02-04 Thread Jasdeep Dhillon
From: Eric Bernstein 

[Why]
Build failure due to ‘status’ may be used uninitialized

[How]
Initialize status to LINK_TRAINING_SUCCESS

Reviewed-by: Wenjing Liu 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Eric Bernstein 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
bool success = false;
bool skip_video_pattern;
enum clock_source_id dp_cs_id = get_clock_source_id(link);
-   enum link_training_result status;
+   enum link_training_result status = LINK_TRAINING_SUCCESS;
union hpd_irq_data irq_data;
struct link_resource link_res;
 
-- 
2.25.1



[PATCH 00/13] DC Patches Feb 7, 2022

2022-02-04 Thread Jasdeep Dhillon
This DC patchset brings improvements in multiple areas. In summary, we have:
-fix for build failure due to an uninitialized variable error
-Bug fix for DP2 using uncertified cable
-limit unbounded request to 5k
-fix DP LT sequence on EQ fail
-Bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h   |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h  |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

-- 
2.25.1



Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Deucher, Alexander
[Public]

In the suspend and hibernate cases, we don't care.  In most cases the power 
rail will be cut once the system enters suspend so it doesn't really matter.  
That's why we call the asic reset callback directly rather than going through 
the whole recovery process.  The device is already quiescent at this point; we 
just want to make sure the device is in a known state when we come out of 
suspend (in case suspend overall fails).

Alex



From: Sharma, Shashank 
Sent: Friday, February 4, 2022 12:22 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing
> In the suspend-reset case, won't this cause a work item to be scheduled on 
> suspend? I don't know if that is a good idea; ideally we would like to clean 
> up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error
during suspend-reset, which is a very probable case?

- Shashank

>
> -Original Message-
> From: Sharma, Shashank 
> Sent: Friday, February 4, 2022 10:47 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, Amaranath 
> ; Koenig, Christian 
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
>
>
> On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
>> BTW, since this is already providing a set of values, it would be useful to 
>> provide one more field for the reset reason - RAS error recovery, GPU hang 
>> recovery, or something else.
>
> Adding this additional parameter instead of blocking something in the kernel 
> seems like a better idea. The app can filter out and read what it is 
> interested in.
>
> - Shashank


Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Christian König

Am 04.02.22 um 19:12 schrieb Bhardwaj, Rajneesh:

[Sorry for top posting]

Hi Christian

I think you forgot the below hunk, without which the issue is not 
fixed completely on a multi-GPU system.


No, that is perfectly intentional. While removing a bo_va structure it 
can happen that there are still mappings attached to it (for example 
because the application crashed).


Because of this, locking the VM before the removal is mandatory. Only 
while adding a bo_va structure can we avoid that.
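
A minimal sketch of that rule (hypothetical function, not the actual code):

/* Deleting a bo_va may still have mappings to unlink, so the VM
 * reservation must be held; adding one touches no mappings yet. */
static void example_bo_va_del(struct amdgpu_vm *vm)
{
	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
	/* ... unlink any mappings still attached to the bo_va ... */
}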


Regards,
Christian.



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
    struct amdgpu_vm *vm = bo_va->base.vm;
    struct amdgpu_vm_bo_base **base;

-   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
    if (bo) {
    dma_resv_assert_held(bo->tbo.base.resv);
    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)


If you choose to include the above hunk, please feel free to add

Reviewed-and-tested-by: Rajneesh Bhardwaj 

On 2/4/2022 11:27 AM, Felix Kuehling wrote:


Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. Only when we add per-VM BOs is the lock
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
  return;
  +    dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
  vm->bulk_moveable = false;
  if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
  amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,

  {
  struct amdgpu_bo_va *bo_va;
  -    dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
  bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
  if (bo_va == NULL) {
  return NULL;




Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Bhardwaj, Rajneesh

[Sorry for top posting]

Hi Christian

I think you forgot the below hunk, without which the issue is not fixed 
completely on a multi-GPU system.


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
    struct amdgpu_vm *vm = bo_va->base.vm;
    struct amdgpu_vm_bo_base **base;

-   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
    if (bo) {
    dma_resv_assert_held(bo->tbo.base.resv);
    if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)


If you choose to include the above hunk, please feel free to add

Reviewed-and-tested-by: Rajneesh Bhardwaj 

On 2/4/2022 11:27 AM, Felix Kuehling wrote:


Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. Only when we add per-VM BOs is the lock
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

  if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
  return;
  +    dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
  vm->bulk_moveable = false;
  if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
  amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,

  {
  struct amdgpu_bo_va *bo_va;
  -    dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
  bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
  if (bo_va == NULL) {
  return NULL;


[Important!] 2022 X.Org Foundation Membership deadline for voting in the election

2022-02-04 Thread Lyude Paul
The 2022 X.Org Foundation elections are rapidly approaching. We will be
forwarding instructions on the nomination process to membership in the
near future.

Please note that only current members can vote in the upcoming election,
and that the deadline for new memberships or renewals to vote in the
upcoming election is March 17th 2022 at 23:59 UTC.

If you are interested in joining the X.Org Foundation or in renewing
your membership, please visit the membership system site at:

https://members.x.org/

You can find the current election schedule here:

https://www.x.org/wiki/BoardOfDirectors/Elections/2022/

    Lyude Paul,
    On behalf of the X.Org elections committee




Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank




On 2/4/2022 6:20 PM, Lazar, Lijo wrote:

[AMD Official Use Only]

One more thing
In the suspend-reset case, won't this cause a work item to be scheduled on 
suspend? I don't know if that is a good idea; ideally we would like to clean up 
all work items before going to suspend.

Thanks,
Lijo


Again, this opens scope for discussion. What if there is a GPU error 
during suspend-reset, which is a very probable case?


- Shashank



-Original Message-
From: Sharma, Shashank 
Sent: Friday, February 4, 2022 10:47 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:11 PM, Lazar, Lijo wrote:

BTW, since this is already providing a set of values, it would be useful to 
provide one more field for the reset reason - RAS error recovery, GPU hang 
recovery, or something else.


Adding this additional parameter instead of blocking something in the kernel seems 
like a better idea. The app can filter out and read what it is interested in.

- Shashank


RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Lazar, Lijo
[AMD Official Use Only]

One more thing 
In the suspend-reset case, won't this cause a work item to be scheduled on 
suspend? I don't know if that is a good idea; ideally we would like to clean up 
all work items before going to suspend.
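
For illustration, a hedged sketch of the kind of cleanup I mean (the field
name follows the patch under discussion; the hook itself is hypothetical):

static int example_prepare_suspend(struct amdgpu_device *adev)
{
	/* wait for any pending reset-dump uevent work before suspending */
	flush_work(&adev->gpu_reset_work);
	return 0;
}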

Thanks,
Lijo

-Original Message-
From: Sharma, Shashank  
Sent: Friday, February 4, 2022 10:47 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
> BTW, since this is already providing a set of values, it would be useful to 
> provide one more field for the reset reason - RAS error recovery, GPU hang 
> recovery, or something else.

Adding this additional parameter instead of blocking something in the kernel seems 
like a better idea. The app can filter out and read what it is interested in.

- Shashank


Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank




On 2/4/2022 6:11 PM, Lazar, Lijo wrote:

BTW, since this is already providing a set of values, it would be useful to 
provide one more field for the reset reason - RAS error recovery, GPU hang 
recovery, or something else.


Adding this additional parameter instead of blocking something in the 
kernel seems like a better idea. The app can filter out and read what 
it is interested in.


- Shashank


RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Lazar, Lijo
[AMD Official Use Only]

No, otherwise the driver resets only for GPU recovery purposes. S3/S4 is not 
meant for recovery; it's just a precautionary reset to make sure that 
everything works fine on resume.

BTW, since this is already providing a set of values, it would be useful to 
provide one more field for the reset reason - RAS error recovery, GPU hang 
recovery, or something else.
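
For illustration only, a hedged sketch of such a field (hypothetical names,
not an actual amdgpu tracepoint; the usual trace-header boilerplate is
assumed):

/* possible reset reasons carried alongside the register dump */
enum example_reset_reason {
	EXAMPLE_RESET_RAS_RECOVERY,
	EXAMPLE_RESET_GPU_HANG,
	EXAMPLE_RESET_SUSPEND,		/* precautionary S3/S4 reset */
};

TRACE_EVENT(example_gpu_reset,
	TP_PROTO(int reason),
	TP_ARGS(reason),
	TP_STRUCT__entry(__field(int, reason)),
	TP_fast_assign(__entry->reason = reason;),
	TP_printk("reason=%d", __entry->reason)
);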

Thanks,
Lijo

-Original Message-
From: Sharma, Shashank  
Sent: Friday, February 4, 2022 10:37 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:02 PM, Lazar, Lijo wrote:
> [Public]
> 
> The problem is the app doesn't know why the reset happened. It just receives 
> a bunch of registers to be read. On what basis can an app filter this out?
>

Again, that is a contextual analysis capability, which needs to be embedded in 
the reader app. Even if we filter out the S3/S4 resets in the kernel, the 
situation remains the same, doesn't it?

- Shashank

> Thanks,
> Lijo
> 
> -Original Message-
> From: Sharma, Shashank 
> Sent: Friday, February 4, 2022 10:29 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, 
> Amaranath ; Koenig, Christian 
> 
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
> 
> 
> 
> On 2/4/2022 5:50 PM, Lazar, Lijo wrote:
>> [AMD Official Use Only]
>>
>> To explain more -
>> It's an unconditional reset done by the kernel on every suspend 
>> (S3/S4). In such a case, which process is going to receive the trace events?
>>
>> The most likely use case would be related to GPU recovery. Triggering a trace 
>> on every reset doesn't look like a good idea.
>>
> 
> If you observe carefully, we are just providing an infrastructure; the 
> application's intention is unknown to us. In my opinion it's not a good idea 
> to apply a filter in the kernel based on our interpretation of that intention.
> 
> For example, if an app just wants to count how many resets happen due to 
> S3/S4 transitions, this infra might become useless. It would be better for 
> the app to learn about and ignore the scenarios it is not interested in.
> 
> This could eventually just be a difference in design philosophy, maybe :)
> 
> - Shashank
> 
>> Thanks,
>> Lijo
>>
>> -Original Message-
>> From: Sharma, Shashank 
>> Sent: Friday, February 4, 2022 10:09 PM
>> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander ; Somalapuram, 
>> Amaranath ; Koenig, Christian 
>> 
>> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>
>> Hey Lijo,
>> I somehow missed responding to this comment; please find my reply inline:
>>
>> Regards
>> Shashank
>>
>> On 1/22/2022 7:42 AM, Lazar, Lijo wrote:
>>>
>>>
>>> On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
    From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 
 00:00:00
 2001
 From: Somalapuram Amaranath 
 Date: Fri, 21 Jan 2022 14:19:42 +0530
 Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

 This patch adds a GPU reset handler for the Navi ASIC family, which 
 typically dumps some of the registers and sends a trace event.

 V2: Accommodated call to work function to send uevent

 Signed-off-by: Somalapuram Amaranath 
 
 Signed-off-by: Shashank Sharma 
 ---
     drivers/gpu/drm/amd/amdgpu/nv.c | 28 
     1 file changed, 28 insertions(+)

 diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
 b/drivers/gpu/drm/amd/amdgpu/nv.c index 01efda4398e5..ada35d4c5245
 100644
 --- a/drivers/gpu/drm/amd/amdgpu/nv.c
 +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
 @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device
 *adev)
     }
     }

 +static void amdgpu_reset_dumps(struct amdgpu_device *adev) {
 +    int r = 0, i;
 +
 +    /* original raven doesn't have full asic reset */
 +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
 +    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
 +    return;
 +    for (i = 0; i < adev->num_ip_blocks; i++) {
 +    if (!adev->ip_blocks[i].status.valid)
 +    continue;
 +    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
 +    continue;
 +    r =
 +adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
 +
 +    if (r)
 +    DRM_ERROR("reset_reg_dumps of IP block <%s> failed 
 +%d\n",
 +    adev->ip_blocks[i].version->funcs->name, r);
 +    }
 +
 +    /* Schedule work to send uevent */
 +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
 +    DRM_ERROR("failed to add GPU reset work\n");
 +
 +    dump_stack();
 +}
 +
     static int nv_asic_reset(struct amdgpu_device *adev)
  

Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank




On 2/4/2022 6:02 PM, Lazar, Lijo wrote:

[Public]

The problem is the app doesn't know why the reset happened. It just receives a 
bunch of registers to be read. On what basis can an app filter this out?



Again, that is a contextual analysis capability, which needs to be 
embedded in the reader app. Even if we filter out the S3/S4 resets in 
the kernel, the situation remains the same, doesn't it?


- Shashank


Thanks,
Lijo

-Original Message-
From: Sharma, Shashank 
Sent: Friday, February 4, 2022 10:29 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 5:50 PM, Lazar, Lijo wrote:

[AMD Official Use Only]

To explain more -
It's an unconditional reset done by the kernel on every suspend 
(S3/S4). In such a case, which process is going to receive the trace events?

The most likely use case would be related to GPU recovery. Triggering a trace 
on every reset doesn't look like a good idea.



If you observe carefully, we are just providing an infrastructure; the 
application's intention is unknown to us. In my opinion it's not a good idea 
to apply a filter in the kernel based on our interpretation of that intention.

For example, if an app just wants to count how many resets happen due to 
S3/S4 transitions, this infra might become useless. It would be better for 
the app to learn about and ignore the scenarios it is not interested 
in.

This could eventually just be a difference in design philosophy, maybe :)

- Shashank


Thanks,
Lijo

-Original Message-
From: Sharma, Shashank 
Sent: Friday, February 4, 2022 10:09 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram,
Amaranath ; Koenig, Christian

Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Hey Lijo,
I somehow missed responding to this comment; please find my reply inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote:



On 1/22/2022 2:04 AM, Sharma, Shashank wrote:

   From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00
2001
From: Somalapuram Amaranath 
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for the Navi ASIC family, which
typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath 
Signed-off-by: Shashank Sharma 
---
    drivers/gpu/drm/amd/amdgpu/nv.c | 28 
    1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
b/drivers/gpu/drm/amd/amdgpu/nv.c index 01efda4398e5..ada35d4c5245
100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device
*adev)
    }
    }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev) {
+    int r = 0, i;
+
+    /* original raven doesn't have full asic reset */
+    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+    return;
+    for (i = 0; i < adev->num_ip_blocks; i++) {
+    if (!adev->ip_blocks[i].status.valid)
+    continue;
+    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+    continue;
+    r =
+adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+    if (r)
+    DRM_ERROR("reset_reg_dumps of IP block <%s> failed
+%d\n",
+    adev->ip_blocks[i].version->funcs->name, r);
+    }
+
+    /* Schedule work to send uevent */
+    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+    DRM_ERROR("failed to add GPU reset work\n");
+
+    dump_stack();
+}
+
    static int nv_asic_reset(struct amdgpu_device *adev)
    {
    int ret = 0;

+    amdgpu_reset_dumps(adev);


Had a comment on this before. Now there are different reasons (or
even no reason, like a precautionary reset) to perform a reset. A user
would be interested in a trace only if the reason is valid.

To clarify on why a work shouldn't be scheduled on every reset, check
here -

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188

In the example you pointed to, they have criteria to decide what is a valid 
reset in their context, on the kernel side itself. So they can decide whether 
they want to do something about it or not.

But, in our case, we want to send the trace_event to the user with some register 
values on every reset, and it is actually up to the profiling app to interpret 
it (along with what it wants to call a GPU reset). So I don't think this is 
causing a considerable overhead.

- Shashank




Thanks,
Lijo


    switch (nv_asic_reset_method(adev)) {
    case AMD_RESET_METHOD_PCI:
    dev_info(adev->dev, "PCI reset\n");


RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Lazar, Lijo
[Public]

The problem is the app doesn't know why the reset happened. It just receives a 
bunch of registers to be read. On what basis can an app filter this out?

Thanks,
Lijo

-Original Message-
From: Sharma, Shashank  
Sent: Friday, February 4, 2022 10:29 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 5:50 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
> 
> To explain more -
> It's an unconditional reset done by the kernel on every suspend 
> (S3/S4). In such a case, which process is going to receive the trace events?
> 
> The most likely use case would be related to GPU recovery. Triggering a trace 
> on every reset doesn't look like a good idea.
>

If you observe carefully, we are just providing an infrastructure; the 
application's intention is unknown to us. In my opinion it's not a good idea 
to apply a filter in the kernel based on our interpretation of that intention.

For example, if an app just wants to count how many resets happen due to 
S3/S4 transitions, this infra might become useless. It would be better for 
the app to learn about and ignore the scenarios it is not interested 
in.

This could eventually just be a difference in design philosophy, maybe :)

- Shashank

> Thanks,
> Lijo
> 
> -Original Message-
> From: Sharma, Shashank 
> Sent: Friday, February 4, 2022 10:09 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, 
> Amaranath ; Koenig, Christian 
> 
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
> 
> Hey Lijo,
> I somehow missed responding to this comment; please find my reply inline:
> 
> Regards
> Shashank
> 
> On 1/22/2022 7:42 AM, Lazar, Lijo wrote:
>>
>>
>> On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
>>>   From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00
>>> 2001
>>> From: Somalapuram Amaranath 
>>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>>
>>> This patch adds a GPU reset handler for the Navi ASIC family, which 
>>> typically dumps some of the registers and sends a trace event.
>>>
>>> V2: Accommodated call to work function to send uevent
>>>
>>> Signed-off-by: Somalapuram Amaranath 
>>> Signed-off-by: Shashank Sharma 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/nv.c | 28 
>>>    1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 01efda4398e5..ada35d4c5245
>>> 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device
>>> *adev)
>>>    }
>>>    }
>>>
>>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>> +    int r = 0, i;
>>> +
>>> +    /* original raven doesn't have full asic reset */
>>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> +    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> +    return;
>>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>>> +    if (!adev->ip_blocks[i].status.valid)
>>> +    continue;
>>> +    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> +    continue;
>>> +    r =
>>> +adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> +    if (r)
>>> +    DRM_ERROR("reset_reg_dumps of IP block <%s> failed 
>>> +%d\n",
>>> +    adev->ip_blocks[i].version->funcs->name, r);
>>> +    }
>>> +
>>> +    /* Schedule work to send uevent */
>>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>>> +    DRM_ERROR("failed to add GPU reset work\n");
>>> +
>>> +    dump_stack();
>>> +}
>>> +
>>>    static int nv_asic_reset(struct amdgpu_device *adev)
>>>    {
>>>    int ret = 0;
>>>
>>> +    amdgpu_reset_dumps(adev);
>>
>> Had a comment on this before. Now there are different reasons (or 
>> even no reason, like a precautionary reset) to perform a reset. A user 
>> would be interested in a trace only if the reason is valid.
>>
>> To clarify on why a work shouldn't be scheduled on every reset, check 
>> here -
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188
> In the example you pointed to, they have criteria to decide what is a valid 
> reset in their context, on the kernel side itself. So they can decide whether 
> they want to do something about it or not.
> 
> But, in our case, we want to send the trace_event to the user with some register 
> values on every reset, and it is actually up to the profiling app to 
> interpret it (along with what it wants to call a GPU reset). So I don't think 
> this is causing a considerable overhead.
> 
> - Shashank
>>
>>
>>
>> Thanks,
>> Lijo
>>
>>>    switch (nv_asic_reset_method(adev)) {
>>>    case AMD_RESET_METHOD_PCI:
>>>   

Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank




On 2/4/2022 5:50 PM, Lazar, Lijo wrote:

[AMD Official Use Only]

To explain more -
It's an unconditional reset done by the kernel on every suspend 
(S3/S4). In such a case, which process is going to receive the trace events?

The most likely use case would be related to GPU recovery. Triggering a trace 
on every reset doesn't look like a good idea.



If you observe carefully, we are just providing an infrastructure; the 
application's intention is unknown to us. In my opinion it's not 
a good idea to apply a filter in the kernel based on our interpretation of 
that intention.


For example, if an app just wants to count how many resets happen 
due to S3/S4 transitions, this infra might become useless. It would 
be better for the app to learn about and ignore the scenarios 
it is not interested in.


This could eventually just be a difference in design philosophy, maybe :)

- Shashank


Thanks,
Lijo

-Original Message-
From: Sharma, Shashank 
Sent: Friday, February 4, 2022 10:09 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Hey Lijo,
I somehow missed responding to this comment; please find my reply inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote:



On 1/22/2022 2:04 AM, Sharma, Shashank wrote:

  From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00
2001
From: Somalapuram Amaranath 
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for the Navi ASIC family, which
typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath 
Signed-off-by: Shashank Sharma 
---
   drivers/gpu/drm/amd/amdgpu/nv.c | 28 
   1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
b/drivers/gpu/drm/amd/amdgpu/nv.c index 01efda4398e5..ada35d4c5245
100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device
*adev)
   }
   }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev) {
+    int r = 0, i;
+
+    /* original raven doesn't have full asic reset */
+    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+    return;
+    for (i = 0; i < adev->num_ip_blocks; i++) {
+    if (!adev->ip_blocks[i].status.valid)
+    continue;
+    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+    continue;
+    r =
+adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+    if (r)
+    DRM_ERROR("reset_reg_dumps of IP block <%s> failed
+%d\n",
+    adev->ip_blocks[i].version->funcs->name, r);
+    }
+
+    /* Schedule work to send uevent */
+    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+    DRM_ERROR("failed to add GPU reset work\n");
+
+    dump_stack();
+}
+
   static int nv_asic_reset(struct amdgpu_device *adev)
   {
   int ret = 0;

+    amdgpu_reset_dumps(adev);


Had a comment on this before. Now there are different reasons (or even
no reason, like a precautionary reset) to perform a reset. A user would
be interested in a trace only if the reason is valid.

To clarify on why a work shouldn't be scheduled on every reset, check
here -

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188

In the example you pointed to, they have criteria to decide what is a valid 
reset in their context, on the kernel side itself. So they can decide whether 
they want to do something about it or not.

But, in our case, we want to send the trace_event to the user with some register 
values on every reset, and it is actually up to the profiling app to interpret 
it (along with what it wants to call a GPU reset). So I don't think this is 
causing a considerable overhead.

- Shashank




Thanks,
Lijo


   switch (nv_asic_reset_method(adev)) {
   case AMD_RESET_METHOD_PCI:
   dev_info(adev->dev, "PCI reset\n");


RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Lazar, Lijo
[AMD Official Use Only]

To explain more -
It's an unconditional reset done by the kernel on every suspend 
(S3/S4). In such a case, which process is going to receive the trace events?

The most likely use case would be related to GPU recovery. Triggering a trace 
on every reset doesn't look like a good idea.

Thanks,
Lijo

-Original Message-
From: Sharma, Shashank  
Sent: Friday, February 4, 2022 10:09 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath 
; Koenig, Christian 
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Hey Lijo,
I somehow missed responding to this comment; please find my reply inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote:
> 
> 
> On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
>>  From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 
>> 2001
>> From: Somalapuram Amaranath 
>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>
>> This patch adds a GPU reset handler for the Navi ASIC family, which 
>> typically dumps some of the registers and sends a trace event.
>>
>> V2: Accommodated call to work function to send uevent
>>
>> Signed-off-by: Somalapuram Amaranath 
>> Signed-off-by: Shashank Sharma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/nv.c | 28 
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 01efda4398e5..ada35d4c5245 
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device 
>> *adev)
>>   }
>>   }
>>
>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> +    int r = 0, i;
>> +
>> +    /* original raven doesn't have full asic reset */
>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +    return;
>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>> +    if (!adev->ip_blocks[i].status.valid)
>> +    continue;
>> +    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +    continue;
>> +    r = 
>> +adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +    if (r)
>> +    DRM_ERROR("reset_reg_dumps of IP block <%s> failed 
>> +%d\n",
>> +    adev->ip_blocks[i].version->funcs->name, r);
>> +    }
>> +
>> +    /* Schedule work to send uevent */
>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>> +    DRM_ERROR("failed to add GPU reset work\n");
>> +
>> +    dump_stack();
>> +}
>> +
>>   static int nv_asic_reset(struct amdgpu_device *adev)
>>   {
>>   int ret = 0;
>>
>> +    amdgpu_reset_dumps(adev);
> 
> Had a comment on this before. Now there are different reasons (or even 
> no reason, like a precautionary reset) to perform a reset. A user would 
> be interested in a trace only if the reason is valid.
> 
> To clarify on why a work shouldn't be scheduled on every reset, check 
> here -
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188
In the example you pointed to, they have criteria to decide what is a valid 
reset in their context, on the kernel side itself. So they can decide whether 
they want to do something about it or not.

But, in our case, we want to send the trace_event to the user with some register 
values on every reset, and it is actually up to the profiling app to interpret 
it (along with what it wants to call a GPU reset). So I don't think this is 
causing a considerable overhead.

- Shashank
> 
> 
> 
> Thanks,
> Lijo
> 
>>   switch (nv_asic_reset_method(adev)) {
>>   case AMD_RESET_METHOD_PCI:
>>   dev_info(adev->dev, "PCI reset\n");


Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

2022-02-04 Thread Sharma, Shashank

Hey Lijo,
I somehow missed responding to this comment; please find my reply inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote:



On 1/22/2022 2:04 AM, Sharma, Shashank wrote:

 From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
From: Somalapuram Amaranath 
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for the Navi ASIC family, which
typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath 
Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/amd/amdgpu/nv.c | 28 
  1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
b/drivers/gpu/drm/amd/amdgpu/nv.c

index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
  }
  }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+    int r = 0, i;
+
+    /* original raven doesn't have full asic reset */
+    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+    return;
+    for (i = 0; i < adev->num_ip_blocks; i++) {
+    if (!adev->ip_blocks[i].status.valid)
+    continue;
+    if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+    continue;
+    r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+    if (r)
+    DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+    adev->ip_blocks[i].version->funcs->name, r);
+    }
+
+    /* Schedule work to send uevent */
+    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+    DRM_ERROR("failed to add GPU reset work\n");
+
+    dump_stack();
+}
+
  static int nv_asic_reset(struct amdgpu_device *adev)
  {
  int ret = 0;

+    amdgpu_reset_dumps(adev);


Had a comment on this before. Now there are different reasons (or even 
no reason, like a precautionary reset) to perform a reset. A user would be 
interested in a trace only if the reason is valid.


To clarify on why a work shouldn't be scheduled on every reset, check 
here -


https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188 
In the example you pointed to, they have criteria to decide what is a 
valid reset in their context, on the kernel side itself. So they can 
decide whether they want to do something about it or not.


But, in our case, we want to send the trace_event to the user with some 
register values on every reset, and it is actually up to the profiling 
app to interpret it (along with what it wants to call a GPU reset). So I 
don't think this is causing a considerable overhead.
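
As a hedged sketch of that userspace side (the event name and reason field
are hypothetical), a profiling app could filter like this:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* follow the tracing pipe and count only the resets we care about */
	FILE *f = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
	char line[1024];
	int suspend_resets = 0;

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		if (!strstr(line, "gpu_reset"))
			continue;		/* not our event */
		if (strstr(line, "reason=suspend"))
			suspend_resets++;	/* count or ignore, the app's choice */
	}
	fclose(f);
	printf("suspend resets seen: %d\n", suspend_resets);
	return 0;
}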


- Shashank




Thanks,
Lijo


  switch (nv_asic_reset_method(adev)) {
  case AMD_RESET_METHOD_PCI:
  dev_info(adev->dev, "PCI reset\n");


Re: [PATCH] drm/amdgpu/display: change pipe policy for DCN 2.0

2022-02-04 Thread Harry Wentland
On 2022-02-03 13:58, Alex Deucher wrote:
> Fixes hangs on driver load on DCN 2.0 parts.
> 
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215511
> Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for 
> multi-display pipe split")
> Signed-off-by: Alex Deucher 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
> b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> index fcf388b509db..d9b3f449bf9b 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> @@ -1069,7 +1069,7 @@ static const struct dc_debug_options debug_defaults_drv 
> = {
>   .timing_trace = false,
>   .clock_trace = true,
>   .disable_pplib_clock_request = true,
> - .pipe_split_policy = MPC_SPLIT_DYNAMIC,
> + .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
>   .force_single_disp_pipe_split = false,
>   .disable_dcc = DCC_ENABLE,
>   .vsr_support = true,



Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Felix Kuehling



Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. Only when we add per-VM BOs is the lock
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
return;
  
+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
	vm->bulk_moveable = false;
if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,
  {
struct amdgpu_bo_va *bo_va;
  
-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
if (bo_va == NULL) {
return NULL;


Re: [PATCH 3/3] drm/amdgpu: move dpcs_3_0_3 headers from dcn to dpcs

2022-02-04 Thread Harry Wentland
Series is
Reviewed-by: Harry Wentland 

Harry

On 2022-02-03 13:58, Alex Deucher wrote:
> To align with other headers.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c   | 4 ++--
>  .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_offset.h| 0
>  .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_sh_mask.h   | 0
>  3 files changed, 2 insertions(+), 2 deletions(-)
>  rename drivers/gpu/drm/amd/include/asic_reg/{dcn => 
> dpcs}/dpcs_3_0_3_offset.h (100%)
>  rename drivers/gpu/drm/amd/include/asic_reg/{dcn => 
> dpcs}/dpcs_3_0_3_sh_mask.h (100%)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c 
> b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> index 2de687f64cf6..36649716e991 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> @@ -48,8 +48,8 @@
>  #include "sienna_cichlid_ip_offset.h"
>  #include "dcn/dcn_3_0_3_offset.h"
>  #include "dcn/dcn_3_0_3_sh_mask.h"
> -#include "dcn/dpcs_3_0_3_offset.h"
> -#include "dcn/dpcs_3_0_3_sh_mask.h"
> +#include "dpcs/dpcs_3_0_3_offset.h"
> +#include "dpcs/dpcs_3_0_3_sh_mask.h"
>  #include "nbio/nbio_2_3_offset.h"
>  
>  #define DC_LOGGER_INIT(logger)
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_offset.h 
> b/drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_offset.h
> similarity index 100%
> rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_offset.h
> rename to drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_offset.h
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_sh_mask.h 
> b/drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_sh_mask.h
> similarity index 100%
> rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_sh_mask.h
> rename to drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_sh_mask.h



Re: [PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-04 Thread Christian König

Am 04.02.22 um 17:23 schrieb Felix Kuehling:


Am 2022-02-04 um 02:13 schrieb Christian König:

Am 04.02.22 um 04:11 schrieb Rajneesh Bhardwaj:

Noticed the below warning while running a PyTorch workload on Vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
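
In essence the change is (hedged sketch; the wipe helper name here is
hypothetical, not the actual function):

/* in the release path the resv may already be held by this context,
 * e.g. during page-table teardown, so don't block on it again */
if (!dma_resv_trylock(bo->tbo.base.resv))
	return;
example_wipe_vram(bo);		/* stand-in for the RAS VRAM wipe */
dma_resv_unlock(bo->tbo.base.resv);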

[  +0.03] WARNING: possible recursive locking detected
[  +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.04] 
[  +0.02] python/4822 is trying to acquire lock:
[  +0.04] 932cd9a259f8 
(reservation_ww_class_mutex){+.+.}-{3:3},

at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
   but task is already holding lock:
[  +0.03] 932cbb7181f8 
(reservation_ww_class_mutex){+.+.}-{3:3},

at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.17]
   other info that might help us debug this:
[  +0.02]  Possible unsafe locking scenario:

[  +0.03]    CPU0
[  +0.02]    
[  +0.02]   lock(reservation_ww_class_mutex);
[  +0.04]   lock(reservation_ww_class_mutex);
[  +0.03]
    *** DEADLOCK ***

[  +0.02]  May be due to missing lock nesting notation

[  +0.03] 7 locks held by python/4822:
[  +0.03]  #0: 932c4ac028d0 (>mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: 932c55e830a8 (>lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: 932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: b2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: 932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.15]  #5: c045f700 (*(sspp++)){}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.38]  #6: 932c52da7078 (>eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
   stack backtrace:
[  +0.03] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.05] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.03] Call Trace:
[  +0.03]  dump_stack+0x6d/0x89
[  +0.10]  __lock_acquire+0xb93/0x1a90
[  +0.09]  lock_acquire+0x25d/0x2d0
[  +0.05]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.07]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.05]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.10]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.06]  ? __fget_files+0x107/0x1e0
[  +0.07]  __x64_sys_ioctl+0x8b/0xd0
[  +0.07]  do_syscall_64+0x36/0x70
[  +0.04]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.07] RIP: 0033:0x7fbff90a7317
[  +0.04] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.05] RSP: 002b:7fbe301fe648 EFLAGS: 0246 ORIG_RAX:
0010
[  +0.06] RAX: ffda RBX: 7fbcc402d820 RCX:
7fbff90a7317
[  +0.03] RDX: 7fbe301fe690 RSI: c0184b18 RDI:
0004
[  +0.03] RBP: 7fbe301fe690 R08:  R09:
7fbcc402d880
[  +0.03] R10: 02001000 R11: 0246 R12:
c0184b18
[  +0.03] R13: 0004 R14: 7fbf689593a0 R15:
7fbcc402d820

Cc: Christian König 
Cc: Felix Kuehling 
Cc: Alex Deucher 

Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is
enabled")
Signed-off-by: Rajneesh Bhardwaj 


The Fixes tag is not necessarily correct; I would remove it.

But apart from that the patch is Reviewed-by: Christian König 
.


I suggested the Fixes tag since it was my patch 

Re: [PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-04 Thread Felix Kuehling



Am 2022-02-04 um 02:13 schrieb Christian König:

Am 04.02.22 um 04:11 schrieb Rajneesh Bhardwaj:

Noticed the below warning while running a PyTorch workload on Vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.

[  +0.03] WARNING: possible recursive locking detected
[  +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.04] 
[  +0.02] python/4822 is trying to acquire lock:
[  +0.04] 932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
   but task is already holding lock:
[  +0.03] 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.17]
   other info that might help us debug this:
[  +0.02]  Possible unsafe locking scenario:

[  +0.03]    CPU0
[  +0.02]    
[  +0.02]   lock(reservation_ww_class_mutex);
[  +0.04]   lock(reservation_ww_class_mutex);
[  +0.03]
    *** DEADLOCK ***

[  +0.02]  May be due to missing lock nesting notation

[  +0.03] 7 locks held by python/4822:
[  +0.03]  #0: 932c4ac028d0 (>mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: 932c55e830a8 (>lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: 932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: b2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: 932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.15]  #5: c045f700 (*(sspp++)){}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.38]  #6: 932c52da7078 (>eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
   stack backtrace:
[  +0.03] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.05] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.03] Call Trace:
[  +0.03]  dump_stack+0x6d/0x89
[  +0.10]  __lock_acquire+0xb93/0x1a90
[  +0.09]  lock_acquire+0x25d/0x2d0
[  +0.05]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.07]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.05]  ? lock_is_held_type+0xa2/0x110
[  +0.06]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.10]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.06]  ? __fget_files+0x107/0x1e0
[  +0.07]  __x64_sys_ioctl+0x8b/0xd0
[  +0.07]  do_syscall_64+0x36/0x70
[  +0.04]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.07] RIP: 0033:0x7fbff90a7317
[  +0.04] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.05] RSP: 002b:7fbe301fe648 EFLAGS: 0246 ORIG_RAX:
0010
[  +0.06] RAX: ffda RBX: 7fbcc402d820 RCX:
7fbff90a7317
[  +0.03] RDX: 7fbe301fe690 RSI: c0184b18 RDI:
0004
[  +0.03] RBP: 7fbe301fe690 R08:  R09:
7fbcc402d880
[  +0.03] R10: 02001000 R11: 0246 R12:
c0184b18
[  +0.03] R13: 0004 R14: 7fbf689593a0 R15:
7fbcc402d820

Cc: Christian König 
Cc: Felix Kuehling 
Cc: Alex Deucher 

Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is
enabled")
Signed-off-by: Rajneesh Bhardwaj 


The Fixes tag is not necessarily correct; I would remove that.

But apart from that the patch is Reviewed-by: Christian König .


I suggested the Fixes tag since it was my patch that introduced the 
problem. Without my patch, 
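
The shape of the fix, for readers skimming the thread: a minimal sketch of
the trylock pattern in the release-notify path. example_wipe_vram() is
illustrative, not a real helper; dma_resv_trylock()/dma_resv_unlock() are
the real dma-resv primitives.

#include <linux/dma-resv.h>
#include <drm/ttm/ttm_bo_api.h>

static void example_wipe_vram(struct ttm_buffer_object *bo);

static void example_release_notify(struct ttm_buffer_object *bo)
{
	/* The CS/eviction path may already hold this reservation lock
	 * (see the ttm_eu_reserve_buffers frame in the trace above), so
	 * taking it unconditionally here is what lockdep flags as
	 * recursive locking. A trylock sidesteps that: if the lock is
	 * contended, skip the wipe rather than deadlock. */
	if (!dma_resv_trylock(&bo->base._resv))
		return;

	example_wipe_vram(bo);
	dma_resv_unlock(&bo->base._resv);
}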

RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code

2022-02-04 Thread Russell, Kent
One tiny thing to fix before merging

> -Original Message-
> From: Kasiviswanathan, Harish 
> Sent: Friday, February 4, 2022 10:01 AM
> To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Tuikov, Luben
> ; Russell, Kent 
> Subject: RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU 
> code
> 
> [AMD Official Use Only]
> 
> This series acked-by: Harish Kasiviswanathan 
> 
> -Original Message-
> From: amd-gfx  On Behalf Of Luben 
> Tuikov
> Sent: Friday, February 4, 2022 12:27 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Tuikov, Luben
> ; Russell, Kent 
> Subject: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code
> 
> Prevent random memory access in the FRU EEPROM code by passing the size of
> the destination buffer to the reading routine, and reading no more than the
> size of the buffer.
> 
> Cc: Kent Russell 
> Cc: Alex Deucher 
> Signed-off-by: Luben Tuikov 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 +++
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> index 61c4e71e399855..07e045fae83a9a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> @@ -77,9 +77,10 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
> *adev)
>  }
> 
>  static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t 
> addrptr,
> -   unsigned char *buf)
> +   unsigned char *buf, size_t buf_size)
>  {
> - int ret, size;
> + int ret;
> + u8 size;
> 
>   ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr, buf, 1);
>   if (ret < 1) {
> @@ -90,9 +91,11 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device 
> *adev,
> uint32_t addrptr,
>   /* The size returned by the i2c requires subtraction of 0xC0 since the
>* size apparently always reports as 0xC0+actual size.
>*/
> - size = buf[0] - I2C_PRODUCT_INFO_OFFSET;


You can also remove this definition from the eeprom.h file, since it's no 
longer needed.

 Kent
> + size = buf[0] & 0x3F;
> + size = min_t(size_t, size, buf_size);
> 
> - ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1, buf, 
> size);
> + ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1,
> +  buf, size);
>   if (ret < 1) {
>   DRM_WARN("FRU: Failed to get data field");
>   return ret;
> @@ -129,7 +132,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
> *adev)
>* and the language field, so just start from 0xb, manufacturer size
>*/
>   addrptr = FRU_EEPROM_MADDR + 0xb;
> - size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> + size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>   if (size < 1) {
>   DRM_ERROR("Failed to read FRU Manufacturer, ret:%d", size);
>   return -EINVAL;
> @@ -139,7 +142,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
> *adev)
>* size field being 1 byte. This pattern continues below.
>*/
>   addrptr += size + 1;
> - size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> + size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>   if (size < 1) {
>   DRM_ERROR("Failed to read FRU product name, ret:%d", size);
>   return -EINVAL;
> @@ -155,7 +158,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
> *adev)
>   adev->product_name[len] = '\0';
> 
>   addrptr += size + 1;
> - size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> + size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>   if (size < 1) {
>   DRM_ERROR("Failed to read FRU product number, ret:%d", size);
>   return -EINVAL;
> @@ -173,7 +176,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
> *adev)
>   adev->product_number[len] = '\0';
> 
>   addrptr += size + 1;
> - size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> + size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
> 
>   if (size < 1) {
>   DRM_ERROR("Failed to read FRU product version, ret:%d", size);
> @@ -181,7 +184,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
> *adev)
>   }
> 
>   addrptr += size + 1;
> - size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> + size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
> 
>   if (size < 1) {
>   DRM_ERROR("Failed to read FRU serial number, ret:%d", size);
> --
> 2.35.0.3.gb23dac905b


RE: [PATCH v1 0/3] AMDGPU FRU fixes

2022-02-04 Thread Russell, Kent
[AMD Official Use Only]

Looks good, even if I am a proponent of "buff, don't nerf"

Series is  Reviewed-by: Kent Russell 



> -Original Message-
> From: Tuikov, Luben 
> Sent: Friday, February 4, 2022 12:27 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben ; Deucher, Alexander
> ; Russell, Kent ; Grodzovsky,
> Andrey 
> Subject: [PATCH v1 0/3] AMDGPU FRU fixes
>
> Reordered the patches; fixed some bugs.
>
> Luben Tuikov (3):
>   drm/amdgpu: Nerf "buff" to "buf"
>   drm/amdgpu: Don't offset by 2 in FRU EEPROM
>   drm/amdgpu: Prevent random memory access in FRU code
>
> Cc: Alex Deucher 
> Cc: Kent Russell 
> Cc: Andrey Grodzovsky 
>
>  .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 36 +--
>  1 file changed, 17 insertions(+), 19 deletions(-)
>
> base-commit: af42455918c42274f6f317a88c878d59c4564168
> --
> 2.35.0.3.gb23dac905b



RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code

2022-02-04 Thread Kasiviswanathan, Harish
[AMD Official Use Only]

This series acked-by: Harish Kasiviswanathan 

-Original Message-
From: amd-gfx  On Behalf Of Luben Tuikov
Sent: Friday, February 4, 2022 12:27 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Tuikov, Luben 
; Russell, Kent 
Subject: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code

Prevent random memory access in the FRU EEPROM code by passing the size of
the destination buffer to the reading routine, and reading no more than the
size of the buffer.

Cc: Kent Russell 
Cc: Alex Deucher 
Signed-off-by: Luben Tuikov 
---
 .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 +++
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 61c4e71e399855..07e045fae83a9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -77,9 +77,10 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
 }
 
 static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t addrptr,
- unsigned char *buf)
+ unsigned char *buf, size_t buf_size)
 {
-   int ret, size;
+   int ret;
+   u8 size;
 
ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr, buf, 1);
if (ret < 1) {
@@ -90,9 +91,11 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device 
*adev, uint32_t addrptr,
/* The size returned by the i2c requires subtraction of 0xC0 since the
 * size apparently always reports as 0xC0+actual size.
 */
-   size = buf[0] - I2C_PRODUCT_INFO_OFFSET;
+   size = buf[0] & 0x3F;
+   size = min_t(size_t, size, buf_size);
 
-   ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1, buf, 
size);
+   ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1,
+buf, size);
if (ret < 1) {
DRM_WARN("FRU: Failed to get data field");
return ret;
@@ -129,7 +132,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 * and the language field, so just start from 0xb, manufacturer size
 */
addrptr = FRU_EEPROM_MADDR + 0xb;
-   size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+   size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
if (size < 1) {
DRM_ERROR("Failed to read FRU Manufacturer, ret:%d", size);
return -EINVAL;
@@ -139,7 +142,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
 * size field being 1 byte. This pattern continues below.
 */
addrptr += size + 1;
-   size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+   size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
if (size < 1) {
DRM_ERROR("Failed to read FRU product name, ret:%d", size);
return -EINVAL;
@@ -155,7 +158,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
adev->product_name[len] = '\0';
 
addrptr += size + 1;
-   size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+   size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
if (size < 1) {
DRM_ERROR("Failed to read FRU product number, ret:%d", size);
return -EINVAL;
@@ -173,7 +176,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
adev->product_number[len] = '\0';
 
addrptr += size + 1;
-   size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+   size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
 
if (size < 1) {
DRM_ERROR("Failed to read FRU product version, ret:%d", size);
@@ -181,7 +184,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
}
 
addrptr += size + 1;
-   size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+   size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
 
if (size < 1) {
DRM_ERROR("Failed to read FRU serial number, ret:%d", size);
-- 
2.35.0.3.gb23dac905b
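
To make the bug class concrete: the first byte of each FRU field is a
type/length byte, and the old code trusted it as a raw length into a
fixed-size stack buffer. Below is a standalone sketch of the bounded
pattern the patch adopts; eeprom_read() is a stand-in for
amdgpu_eeprom_read(). Per the IPMI FRU convention, bits 7:6 of the
type/length byte are the type code (0b11 = ASCII, which is where the old
0xC0 subtraction came from) and bits 5:0 are the length, so masking with
0x3F recovers the length for any type.

#include <stddef.h>
#include <stdint.h>

/* Stand-in for amdgpu_eeprom_read(): returns bytes read, < 0 on error. */
int eeprom_read(uint32_t addr, uint8_t *buf, size_t count);

/* Read one FRU field at addr; never writes more than buf_size bytes. */
int fru_read_field(uint32_t addr, uint8_t *buf, size_t buf_size)
{
	uint8_t size;
	int ret;

	ret = eeprom_read(addr, buf, 1);	/* type/length byte */
	if (ret < 1)
		return -1;

	size = buf[0] & 0x3F;			/* length bits only */
	if (size > buf_size)
		size = (uint8_t)buf_size;	/* the actual fix: clamp */

	return eeprom_read(addr + 1, buf, size);
}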


Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Deucher, Alexander
[Public]

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, February 4, 2022 3:52 AM
To: Bhardwaj, Rajneesh ; 
amd-gfx@lists.freedesktop.org 
Subject: [PATCH] drm/amdgpu: move lockdep assert to the right place.

Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. Only when we add per-VM BOs is the lock
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
 if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 return;

+   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 vm->bulk_moveable = false;
 if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,
 {
 struct amdgpu_bo_va *bo_va;

-   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 if (bo_va == NULL) {
 return NULL;
--
2.25.1



Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu

2022-02-04 Thread Christian König

On 04.02.22 at 12:22, Arunpravin wrote:

On 28/01/22 7:48 pm, Matthew Auld wrote:

On Thu, 27 Jan 2022 at 14:11, Arunpravin
 wrote:

- Remove drm_mm references and replace with drm buddy functionalities
- Add res cursor support for drm buddy

v2(Matthew Auld):
   - replace spinlock with mutex as we call kmem_cache_zalloc
 (..., GFP_KERNEL) in drm_buddy_alloc() function

   - lock drm_buddy_block_trim() function as it calls
 mark_free/mark_split are all globally visible

v3(Matthew Auld):
   - remove trim method error handling as we address the failure case
 at drm_buddy_block_trim() function

v4:
   - fix warnings reported by kernel test robot 

v5:
   - fix merge conflict issue

v6:
   - fix warnings reported by kernel test robot 

Signed-off-by: Arunpravin 
---
  drivers/gpu/drm/Kconfig   |   1 +
  .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h|  97 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   7 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 259 ++
  4 files changed, 231 insertions(+), 133 deletions(-)




-/**
- * amdgpu_vram_mgr_virt_start - update virtual start address
- *
- * @mem: ttm_resource to update
- * @node: just allocated node
- *
- * Calculate a virtual BO start address to easily check if everything is CPU
- * accessible.
- */
-static void amdgpu_vram_mgr_virt_start(struct ttm_resource *mem,
-  struct drm_mm_node *node)
-{
-   unsigned long start;
-
-   start = node->start + node->size;
-   if (start > mem->num_pages)
-   start -= mem->num_pages;
-   else
-   start = 0;
-   mem->start = max(mem->start, start);
-}
-
  /**
   * amdgpu_vram_mgr_new - allocate new ranges
   *
@@ -366,13 +357,13 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
const struct ttm_place *place,
struct ttm_resource **res)
  {
-   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
+   unsigned long lpfn, pages_per_node, pages_left, pages, n_pages;
+   u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
 struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
 struct amdgpu_device *adev = to_amdgpu_device(mgr);
-   uint64_t vis_usage = 0, mem_bytes, max_bytes;
-   struct ttm_range_mgr_node *node;
-   struct drm_mm *mm = &mgr->mm;
-   enum drm_mm_insert_mode mode;
+   struct amdgpu_vram_mgr_node *node;
+   struct drm_buddy *mm = &mgr->mm;
+   struct drm_buddy_block *block;
 unsigned i;
 int r;

@@ -391,10 +382,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
 goto error_sub;
 }

-   if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
 pages_per_node = ~0ul;
-   num_nodes = 1;
-   } else {
+   else {
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 pages_per_node = HPAGE_PMD_NR;
  #else
@@ -403,11 +393,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
  #endif
 pages_per_node = max_t(uint32_t, pages_per_node,
tbo->page_alignment);
-   num_nodes = DIV_ROUND_UP_ULL(PFN_UP(mem_bytes), pages_per_node);
 }

-   node = kvmalloc(struct_size(node, mm_nodes, num_nodes),
-   GFP_KERNEL | __GFP_ZERO);
+   node = kzalloc(sizeof(*node), GFP_KERNEL);
 if (!node) {
 r = -ENOMEM;
 goto error_sub;
@@ -415,9 +403,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,

 ttm_resource_init(tbo, place, &node->base);

-   mode = DRM_MM_INSERT_BEST;
+   INIT_LIST_HEAD(&node->blocks);
+
 if (place->flags & TTM_PL_FLAG_TOPDOWN)
-   mode = DRM_MM_INSERT_HIGH;
+   node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+
+   if (place->fpfn || lpfn != man->size)
+   /* Allocate blocks in desired range */
+   node->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+
+   min_page_size = mgr->default_page_size;
+   BUG_ON(min_page_size < mm->chunk_size);

 pages_left = node->base.num_pages;

@@ -425,36 +421,61 @@ static int amdgpu_vram_mgr_new(struct 
ttm_resource_manager *man,
 pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));

 i = 0;
-   spin_lock(&mgr->lock);
 while (pages_left) {
-   uint32_t alignment = tbo->page_alignment;
-
 if (pages >= pages_per_node)
-   alignment = pages_per_node;
-
-   r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
-   alignment, 0, place->fpfn,
-   lpfn, mode);
-   if (unlikely(r)) {
-   if (pages > pages_per_node) {
-  

Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu

2022-02-04 Thread Arunpravin



On 28/01/22 7:48 pm, Matthew Auld wrote:
> On Thu, 27 Jan 2022 at 14:11, Arunpravin
>  wrote:
>>
>> - Remove drm_mm references and replace with drm buddy functionalities
>> - Add res cursor support for drm buddy
>>
>> v2(Matthew Auld):
>>   - replace spinlock with mutex as we call kmem_cache_zalloc
>> (..., GFP_KERNEL) in drm_buddy_alloc() function
>>
>>   - lock drm_buddy_block_trim() function as it calls
>> mark_free/mark_split are all globally visible
>>
>> v3(Matthew Auld):
>>   - remove trim method error handling as we address the failure case
>> at drm_buddy_block_trim() function
>>
>> v4:
>>   - fix warnings reported by kernel test robot 
>>
>> v5:
>>   - fix merge conflict issue
>>
>> v6:
>>   - fix warnings reported by kernel test robot 
>>
>> Signed-off-by: Arunpravin 
>> ---
>>  drivers/gpu/drm/Kconfig   |   1 +
>>  .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h|  97 +--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   7 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 259 ++
>>  4 files changed, 231 insertions(+), 133 deletions(-)
> 
> 
> 
>>
>> -/**
>> - * amdgpu_vram_mgr_virt_start - update virtual start address
>> - *
>> - * @mem: ttm_resource to update
>> - * @node: just allocated node
>> - *
>> - * Calculate a virtual BO start address to easily check if everything is CPU
>> - * accessible.
>> - */
>> -static void amdgpu_vram_mgr_virt_start(struct ttm_resource *mem,
>> -  struct drm_mm_node *node)
>> -{
>> -   unsigned long start;
>> -
>> -   start = node->start + node->size;
>> -   if (start > mem->num_pages)
>> -   start -= mem->num_pages;
>> -   else
>> -   start = 0;
>> -   mem->start = max(mem->start, start);
>> -}
>> -
>>  /**
>>   * amdgpu_vram_mgr_new - allocate new ranges
>>   *
>> @@ -366,13 +357,13 @@ static int amdgpu_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>const struct ttm_place *place,
>>struct ttm_resource **res)
>>  {
>> -   unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
>> +   unsigned long lpfn, pages_per_node, pages_left, pages, n_pages;
>> +   u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
>> struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>> struct amdgpu_device *adev = to_amdgpu_device(mgr);
>> -   uint64_t vis_usage = 0, mem_bytes, max_bytes;
>> -   struct ttm_range_mgr_node *node;
>> -   struct drm_mm *mm = &mgr->mm;
>> -   enum drm_mm_insert_mode mode;
>> +   struct amdgpu_vram_mgr_node *node;
>> +   struct drm_buddy *mm = &mgr->mm;
>> +   struct drm_buddy_block *block;
>> unsigned i;
>> int r;
>>
>> @@ -391,10 +382,9 @@ static int amdgpu_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>> goto error_sub;
>> }
>>
>> -   if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> +   if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
>> pages_per_node = ~0ul;
>> -   num_nodes = 1;
>> -   } else {
>> +   else {
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> pages_per_node = HPAGE_PMD_NR;
>>  #else
>> @@ -403,11 +393,9 @@ static int amdgpu_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>  #endif
>> pages_per_node = max_t(uint32_t, pages_per_node,
>>tbo->page_alignment);
>> -   num_nodes = DIV_ROUND_UP_ULL(PFN_UP(mem_bytes), 
>> pages_per_node);
>> }
>>
>> -   node = kvmalloc(struct_size(node, mm_nodes, num_nodes),
>> -   GFP_KERNEL | __GFP_ZERO);
>> +   node = kzalloc(sizeof(*node), GFP_KERNEL);
>> if (!node) {
>> r = -ENOMEM;
>> goto error_sub;
>> @@ -415,9 +403,17 @@ static int amdgpu_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>
>> ttm_resource_init(tbo, place, &node->base);
>>
>> -   mode = DRM_MM_INSERT_BEST;
>> +   INIT_LIST_HEAD(&node->blocks);
>> +
>> if (place->flags & TTM_PL_FLAG_TOPDOWN)
>> -   mode = DRM_MM_INSERT_HIGH;
>> +   node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>> +
>> +   if (place->fpfn || lpfn != man->size)
>> +   /* Allocate blocks in desired range */
>> +   node->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>> +
>> +   min_page_size = mgr->default_page_size;
>> +   BUG_ON(min_page_size < mm->chunk_size);
>>
>> pages_left = node->base.num_pages;
>>
>> @@ -425,36 +421,61 @@ static int amdgpu_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>> pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>
>> i = 0;
>> -   spin_lock(&mgr->lock);
>> while (pages_left) {
>> -   uint32_t alignment = tbo->page_alignment;
>> -
>> if (pages >= pages_per_node)
>> -   alignment = pages_per_node;
>> 
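
Putting the quoted hunks together, the allocation flow that
amdgpu_vram_mgr_new() moves to looks roughly like the fragment below: a
sketch against the drm_buddy API proposed in this series, with the
per-2GiB loop and error handling trimmed; the error_free_blocks label is
illustrative.

	struct list_head blocks;
	unsigned long flags = 0;
	int r;

	INIT_LIST_HEAD(&blocks);
	if (place->flags & TTM_PL_FLAG_TOPDOWN)
		flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
	if (place->fpfn || lpfn != man->size)
		flags |= DRM_BUDDY_RANGE_ALLOCATION;

	/* A mutex rather than a spinlock: per the v2 note above,
	 * drm_buddy_alloc_blocks() can sleep because it allocates block
	 * metadata with kmem_cache_zalloc(..., GFP_KERNEL). */
	mutex_lock(&mgr->lock);
	r = drm_buddy_alloc_blocks(&mgr->mm,
				   (u64)place->fpfn << PAGE_SHIFT,
				   (u64)lpfn << PAGE_SHIFT,
				   (u64)pages << PAGE_SHIFT,
				   min_page_size, &blocks, flags);
	mutex_unlock(&mgr->lock);
	if (r)
		goto error_free_blocks;	/* drm_buddy_free_list(&mgr->mm, &blocks) */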

[PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-04 Thread Christian König
Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. Only when we add per-VM BOs is the lock
mandatory.

Signed-off-by: Christian König 
Reported-by: Bhardwaj, Rajneesh 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
return;
 
+   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
vm->bulk_moveable = false;
if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct 
amdgpu_device *adev,
 {
struct amdgpu_bo_va *bo_va;
 
-   dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
if (bo_va == NULL) {
return NULL;
-- 
2.25.1
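
For context on what the assert buys: dma_resv_assert_held() is a
lockdep-only check (it expands to lockdep_assert_held() on the reservation
object's ww_mutex), so the move is purely about asserting the invariant in
the one place it actually holds. A rough illustration of the resulting
placement, with the surrounding code elided:

static void example_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
				    struct amdgpu_vm *vm,
				    struct amdgpu_bo *bo)
{
	/* ... link base into the vm/bo lists; valid for any BO,
	 * including freshly created ones without mappings ... */

	/* Only per-VM BOs share the root PD's reservation object; for
	 * all other BOs there is no shared VM state to protect yet. */
	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
		return;

	/* From here on shared VM state is touched (bulk_moveable, the
	 * relocated list), so the caller must hold the root resv. */
	dma_resv_assert_held(vm->root.bo->tbo.base.resv);

	/* ... vm->bulk_moveable = false; etc. ... */
}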