Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
Hey Alex,

Agree, we are moving it above; Christian also had the same feedback.

- Shashank

On 2/4/2022 7:44 PM, Deucher, Alexander wrote:
> [Public]
>
> Seems like this functionality should be moved up into the callers. Maybe add new IP callbacks (dump_reset_registers()) so that each IP can specify what registers are relevant for reset debugging, and then we can walk the IP list and call the callback before we call the asic_reset callbacks.
>
> Alex

*From:* amd-gfx on behalf of Deucher, Alexander
*Sent:* Friday, February 4, 2022 1:41 PM
*To:* Sharma, Shashank ; Lazar, Lijo ; amd-gfx@lists.freedesktop.org
*Cc:* Somalapuram, Amaranath ; Koenig, Christian
*Subject:* Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

[Public]

In the suspend and hibernate cases, we don't care. In most cases the power rail will be cut once the system enters suspend, so it doesn't really matter. That's why we call the asic reset callback directly rather than going through the whole recovery process. The device is already quiescent at this point; we just want to make sure the device is in a known state when we come out of suspend (in case suspend overall fails).

Alex

*From:* Sharma, Shashank
*Sent:* Friday, February 4, 2022 12:22 PM
*To:* Lazar, Lijo ; amd-gfx@lists.freedesktop.org
*Cc:* Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
*Subject:* Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing: in the suspend-reset case, won't this schedule a work item on suspend? I don't know if that is a good idea; ideally we would like to clean up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error during suspend-reset, which is a very probable case?
- Shashank

-----Original Message-----
From: Sharma, Shashank
Sent: Friday, February 4, 2022 10:47 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
> BTW, since this is already providing a set of values, it would be useful to provide one more field as the reset reason - RAS error recovery, GPU hang recovery, or something else.

Adding this additional parameter, instead of blocking something in kernel, seems like a better idea. The app can filter out and read what it is interested in.

- Shashank
[PATCH 13/13] drm/amd/display: Basic support with device ID
From: Oliver Logush

[Why]
To get the cyan_skillfish check working

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Oliver Logush
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 ++-
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static void amdgpu_dm_audio_eld_notify(struct amdgpu_device *adev, int pin)
 	}
 }

+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+	if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 0x143F) {
+		return true;
+	}
+	return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
 	const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
 		return -EINVAL;
 	}

-	if (!has_hw_support) {
+	if (is_skillfish_series(adev)) {
 		DRM_INFO("DMUB unsupported on ASIC\n");
 		return 0;
 	}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 	default:
 		break;
 	}
+	if (is_skillfish_series(adev)) {
+		init_data.flags.disable_dmcu = true;
+		break;
+	}
 	break;
 }
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
 	case CHIP_VEGA10:
 	case CHIP_VEGA12:
 	case CHIP_VEGA20:
-		return 0;
 	case CHIP_NAVI12:
 		fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
 		break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
 	default:
 		break;
 	}
+	if (is_skillfish_series(adev)) {
+		return 0;
+	}
 	DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
 	return -EINVAL;
 }
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
 		adev->mode_info.num_dig = 6;
 		break;
 	default:
+		if (is_skillfish_series(adev)) {
+			adev->mode_info.num_crtc = 2;
+			adev->mode_info.num_hpd = 2;
+			adev->mode_info.num_dig = 2;
+			break;
+		}
 #if defined(CONFIG_DRM_AMD_DC_DCN)
 		switch (adev->ip_versions[DCE_HWIP][0]) {
 		case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
 	enum dc_irq_source irq_src;
 	atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index b36bae4b5bc9..318d381e2910 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id asic_id)
 	case FAMILY_NV:
 		dc_version = DCN_VERSION_2_0;
-		if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+		if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
 			dc_version = DCN_VERSION_2_01;
 			break;
 		}
diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index e4a2dfacab4c..37ec6343dbd6 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -211,7 +211,8 @@ enum {
 #ifndef ASICREV_IS_GREEN_SARDINE
 #define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && (eChipRev < 0xFF))
 #endif
-#define DEVICE_ID_NV_13FE 0x13FE // CYAN_SKILLFISH
+#define DEVICE_ID_NV_NAVI10_LITE_P_13FE 0x13FE // CYAN_SKILLFISH
+#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F
 #define FAMILY_VGH 144
 #define DEVICE_ID_VGH_163F 0x163F
 #define VANGOGH_A0 0x01
--
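The predicate this patch introduces keys off either the ASIC type or the 0x143F ID. A minimal standalone sketch of that check, using mock types since struct amdgpu_device is kernel-internal; note the field here is widened on purpose, because the real `pdev->revision` (the PCI revision ID) is a single byte, so a literal `== 0x143F` comparison against it deserves a second look in review:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock stand-ins for the kernel types; illustrative only. */
enum mock_asic_type { CHIP_NAVI10, CHIP_CYAN_SKILLFISH };

struct mock_adev {
	enum mock_asic_type asic_type;
	unsigned int pdev_revision; /* widened; real PCI revision ID is 8-bit */
};

/* Mirrors the predicate from the patch: match on the Cyan Skillfish
 * ASIC type, or on the Navi10 Lite P 0x143F ID. */
bool is_skillfish_series(const struct mock_adev *adev)
{
	return adev->asic_type == CHIP_CYAN_SKILLFISH ||
	       adev->pdev_revision == 0x143F;
}
```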
[PATCH 12/13] drm/amd/display: handle null link encoder
From: Martin Tsai

[Why]
The link encoder mapping could return a null one and cause a system crash.

[How]
Let the mapping get an available link encoder without the endpoint identification check.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Martin Tsai
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 25 +++
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index ec4b300ec067..b1718600fa02 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3530,11 +3530,7 @@ enum dc_status dc_link_allocate_mst_payload(struct pipe_ctx *pipe_ctx)
 	const struct link_hwss *link_hwss = get_link_hwss(link, &pipe_ctx->link_res);
 	DC_LOGGER_INIT(link->ctx->logger);
-	/* Link encoder may have been dynamically assigned to non-physical display endpoint. */
-	if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_encoder = link->link_enc;
-	else if (link->dc->res_pool->funcs->link_encs_assign)
-		link_encoder = link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+	link_encoder = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_encoder);
 	/* enable_link_dp_mst already check link->enabled_stream_count
@@ -3823,11 +3819,7 @@ static enum dc_status deallocate_mst_payload(struct pipe_ctx *pipe_ctx)
 	const struct dc_link_settings empty_link_settings = {0};
 	DC_LOGGER_INIT(link->ctx->logger);
-	/* Link encoder may have been dynamically assigned to non-physical display endpoint. */
-	if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_encoder = link->link_enc;
-	else if (link->dc->res_pool->funcs->link_encs_assign)
-		link_encoder = link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+	link_encoder = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_encoder);
 	/* deallocate_mst_payload is called before disable link.
 	   When mode or
@@ -3944,13 +3936,7 @@ static void update_psp_stream_config(struct pipe_ctx *pipe_ctx, bool dpms_off)
 	if (cp_psp == NULL || cp_psp->funcs.update_stream_config == NULL)
 		return;
-	if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_enc = pipe_ctx->stream->link->link_enc;
-	else if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA &&
-			pipe_ctx->stream->link->dc->res_pool->funcs->link_encs_assign)
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(
-			pipe_ctx->stream->ctx->dc,
-			pipe_ctx->stream);
+	link_enc = link_enc_cfg_get_link_enc(pipe_ctx->stream->link);
 	ASSERT(link_enc);
 	if (link_enc == NULL)
 		return;
@@ -4100,10 +4086,7 @@ void core_link_enable_stream(
 			dc_is_virtual_signal(pipe_ctx->stream->signal))
 		return;
-	if (dc->res_pool->funcs->link_encs_assign && stream->link->ep_type != DISPLAY_ENDPOINT_PHY)
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(dc, stream);
-	else
-		link_enc = stream->link->link_enc;
+	link_enc = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_enc);
 	if (!dc_is_virtual_signal(pipe_ctx->stream->signal)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index fcf388b509db..b55868a0e0df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1605,16 +1605,7 @@ static void get_pixel_clock_parameters(
 	pixel_clk_params->requested_pix_clk_100hz = stream->timing.pix_clk_100hz;
-	/* Links supporting dynamically assigned link encoder will be assigned next
-	 * available encoder if one not already assigned.
-	 */
-	if (link->is_dig_mapping_flexible &&
-			link->dc->res_pool->funcs->link_encs_assign) {
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(stream->ctx->dc, stream);
-		if (link_enc == NULL)
-			link_enc = link_enc_cfg_get_next_avail_link_enc(stream->ctx->dc);
-	} else
-		link_enc = stream->link->link_enc;
+	link_enc = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_enc);
 	if (link_enc)
-- 
2.25.1
[PATCH 07/13] drm/amd/display: change fastboot timing validation
From: Paul Hsieh

[Why]
VBIOS lights up eDP at 6bpc but the driver uses 8bpc, without disabling and then re-enabling the valid stream. Some panels can't change color depth at runtime.

[How]
Change the fastboot timing validation function to check more than just LANE_COUNT, LINK_RATE, etc.

Reviewed-by: Anthony Koo
Acked-by: Jasdeep Dhillon
Signed-off-by: Paul Hsieh
---
 drivers/gpu/drm/amd/display/dc/core/dc.c                    | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c           | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h                         | 2 +-
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 1d9404ff29ed..997eb7e2d2b3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1467,7 +1467,7 @@ static bool context_changed(
 	return false;
 }
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
 		const struct dc_sink *sink,
 		struct dc_crtc_timing *crtc_timing)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..b36bae4b5bc9 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2168,7 +2168,7 @@ static void mark_seamless_boot_stream(
 	if (dc->config.allow_seamless_boot_optimization &&
 			!dcb->funcs->is_accelerated_mode(dcb)) {
-		if (dc_validate_seamless_boot_timing(dc, stream->sink, &stream->timing))
+		if (dc_validate_boot_timing(dc, stream->sink, &stream->timing))
 			stream->apply_seamless_boot_optimization = true;
 	}
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 69d264dd69a7..8248d4b75066 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -1125,7 +1125,7 @@ struct dc_validation_set {
 	uint8_t plane_count;
 };
-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
 		const struct dc_sink *sink,
 		struct dc_crtc_timing *crtc_timing);
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 8c32b9cb3b49..52b22a944f94 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1761,7 +1761,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 				edp_link->link_status.link_active) {
 			struct dc_stream_state *edp_stream = edp_streams[0];
-			can_apply_edp_fast_boot = !is_edp_ilr_optimization_required(edp_stream->link, &edp_stream->timing);
+			can_apply_edp_fast_boot = dc_validate_boot_timing(dc,
+				edp_stream->sink, &edp_stream->timing);
 			edp_stream->apply_edp_fast_boot_optimization = can_apply_edp_fast_boot;
 			if (can_apply_edp_fast_boot)
 				DC_LOG_EVENT_LINK_TRAINING("eDP fast boot disabled to optimize link rate\n");
-- 
2.25.1
[PATCH 10/13] drm/amd/display: [FW Promotion] Release 0.0.103.0
From: Anthony Koo

Reviewed-by: Aric Cyr
Acked-by: Jasdeep Dhillon
Signed-off-by: Anthony Koo
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index a01814631911..c5cbbb0470ee 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -47,10 +47,10 @@
 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xab0ae3c8
+#define DMUB_FW_VERSION_GIT_HASH 0x5189adbf
 #define DMUB_FW_VERSION_MAJOR 0
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 102
+#define DMUB_FW_VERSION_REVISION 103
 #define DMUB_FW_VERSION_TEST 0
 #define DMUB_FW_VERSION_VBIOS 0
 #define DMUB_FW_VERSION_HOTFIX 0
-- 
2.25.1
[PATCH 11/13] drm/amd/display: 3.2.172
From: Aric Cyr

This version brings along the following fixes:
- fix for build failure uninitialized error
- Bug fix for DP2 using uncertified cable
- limit unbounded request to 5k
- fix DP LT sequence on EQ fail
- Bug fixes for S3/S4

Acked-by: Jasdeep Dhillon
Signed-off-by: Aric Cyr
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 8248d4b75066..89ca672866f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
-#define DC_VER "3.2.171"
+#define DC_VER "3.2.172"
 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1
[PATCH 09/13] drm/amd/display: Fix DP LT sequence on EQ fail
From: Ilya

[Why]
The number of lanes wasn't being reset to maximum when reducing link rate due to an EQ failure. This could result in having fewer lanes in the verified link capabilities, a lower maximum link bandwidth, and fewer modes being supported.

[How]
Reset the number of lanes to max when dropping link rate due to EQ failure during link training.

Reviewed-by: Aric Cyr
Acked-by: Jasdeep Dhillon
Signed-off-by: Ilya
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index d0cb40df60a4..cd9c31b5e55d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3504,6 +3504,7 @@ static bool decide_fallback_link_setting(
 			current_link_setting->link_rate =
 				reduce_link_rate(current_link_setting->link_rate);
+			current_link_setting->lane_count = initial_link_settings.lane_count;
 		} else {
 			return false;
 		}
@@ -3516,6 +3517,7 @@ static bool decide_fallback_link_setting(
 			current_link_setting->link_rate =
 				reduce_link_rate(current_link_setting->link_rate);
+			current_link_setting->lane_count = initial_link_settings.lane_count;
 		} else {
 			return false;
 		}
-- 
2.25.1
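The fallback behaviour this patch changes can be condensed into a small user-space model; the struct, function names, and the toy rate ladder below are illustrative, not the actual DC types:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified link-settings model (not the DC structs). */
struct link_settings { int lane_count; int link_rate; };

/* Toy fallback ladder: HBR3(30) -> HBR2(20) -> HBR(10) -> RBR(6). */
int reduce_link_rate(int rate)
{
	switch (rate) {
	case 30: return 20;
	case 20: return 10;
	case 10: return 6;
	default: return 0; /* no lower rate available */
	}
}

/* On EQ failure, drop one link-rate step but retry with the full
 * (initial) lane count, which is the behaviour the patch restores. */
bool fallback_on_eq_fail(struct link_settings *cur,
			 const struct link_settings *initial)
{
	int lower = reduce_link_rate(cur->link_rate);

	if (lower == 0)
		return false; /* nothing left to try */
	cur->link_rate = lower;
	cur->lane_count = initial->lane_count; /* reset lanes to max */
	return true;
}
```

Without the lane-count reset, a training attempt that had already narrowed to one lane would stay at one lane forever, which is exactly the capped-bandwidth symptom described above.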
[PATCH 06/13] drm/amd/display: fix yellow carp wm clamping
From: Dmytro Laktyushkin

Fix clamping to match register field size.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Dmytro Laktyushkin
---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c | 61 ++-
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
index 90c73a1cb986..5e3bcaf12cac 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
@@ -138,8 +138,11 @@ static uint32_t convert_and_clamp(
 	ret_val = wm_ns * refclk_mhz;
 	ret_val /= 1000;
-	if (ret_val > clamp_value)
+	if (ret_val > clamp_value) {
+		/* clamping WMs is abnormal, unexpected and may lead to underflow */
+		ASSERT(0);
 		ret_val = clamp_value;
+	}
 	return ret_val;
 }
@@ -159,7 +162,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->a.urgent_ns > hubbub2->watermarks.a.urgent_ns) {
 		hubbub2->watermarks.a.urgent_ns = watermarks->a.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->a.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, prog_wm_value);
@@ -193,7 +196,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->a.urgent_latency_ns > hubbub2->watermarks.a.urgent_latency_ns) {
 		hubbub2->watermarks.a.urgent_latency_ns = watermarks->a.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->a.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, prog_wm_value);
 	} else if (watermarks->a.urgent_latency_ns < hubbub2->watermarks.a.urgent_latency_ns)
@@ -203,7 +206,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->b.urgent_ns > hubbub2->watermarks.b.urgent_ns) {
 		hubbub2->watermarks.b.urgent_ns = watermarks->b.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->b.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, prog_wm_value);
@@ -237,7 +240,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->b.urgent_latency_ns > hubbub2->watermarks.b.urgent_latency_ns) {
 		hubbub2->watermarks.b.urgent_latency_ns = watermarks->b.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->b.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, prog_wm_value);
 	} else if (watermarks->b.urgent_latency_ns < hubbub2->watermarks.b.urgent_latency_ns)
@@ -247,7 +250,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->c.urgent_ns > hubbub2->watermarks.c.urgent_ns) {
 		hubbub2->watermarks.c.urgent_ns = watermarks->c.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->c.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, prog_wm_value);
@@ -281,7 +284,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->c.urgent_latency_ns > hubbub2->watermarks.c.urgent_latency_ns) {
 		hubbub2->watermarks.c.urgent_latency_ns = watermarks->c.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->c.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, prog_wm_value);
 	} else if (watermarks->c.urgent_latency_ns < hubbub2->watermarks.c.urgent_latency_ns)
@@ -291,7 +294,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->d.urgent_ns > hubbub2->watermarks.d.urgent_ns) {
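The conversion being fixed in this patch is easy to model outside the kernel. The sketch below (with the kernel ASSERT dropped) shows why the old 0x1f clamp truncated realistic watermark values while the widened 0x3fff clamp, matching a 14-bit register field, passes them through:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the watermark conversion from the patch: nanoseconds are
 * converted to refclk cycles, then clamped to the register field width.
 * The patch widens the clamp from 0x1f (5 bits) to 0x3fff (14 bits). */
uint32_t convert_and_clamp(uint32_t wm_ns, uint32_t refclk_mhz,
			   uint32_t clamp_value)
{
	uint32_t ret_val = wm_ns * refclk_mhz / 1000;

	/* Clamping is abnormal and may lead to underflow; the kernel
	 * version ASSERTs here. */
	if (ret_val > clamp_value)
		ret_val = clamp_value;
	return ret_val;
}
```

For example, a 1000 ns watermark at a 100 MHz refclk is 100 cycles: well within 14 bits, but silently crushed to 31 by the old 5-bit clamp.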
[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k
From: Dmytro Laktyushkin

Unbounded requesting is unsupported on pipe split modes and this change prevents us running into such a situation with wide modes.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Dmytro Laktyushkin
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
 		if (is_dual_plane(pipe->plane_state->format)
 				&& pipe->plane_state->src_rect.width <= 1920 && pipe->plane_state->src_rect.height <= 1080) {
 			dc->config.enable_4to1MPC = true;
-		} else if (!is_dual_plane(pipe->plane_state->format)) {
+		} else if (!is_dual_plane(pipe->plane_state->format) && pipe->plane_state->src_rect.width <= 5120) {
+			/* Limit to 5k max to avoid forced pipe split when there is not enough detile for swath */
 			context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
 			pipes[0].pipe.src.unbounded_req_mode = true;
 		}
-- 
2.25.1
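The gating this patch adds reduces to a small predicate; a sketch under illustrative names (the real check reads the plane state out of the DML pipe context):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the new condition: unbounded requesting (and the larger
 * 192 KB DET allocation that goes with it) is only enabled for
 * single-plane surfaces up to 5120 pixels wide, since wider modes
 * force a pipe split, which unbounded requesting does not support. */
bool use_unbounded_requesting(bool is_dual_plane, int src_width)
{
	return !is_dual_plane && src_width <= 5120;
}
```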
[PATCH 08/13] drm/amd/display: keep eDP Vdd on when eDP stream is already enabled
From: Zhan Liu

[Why]
Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will be cleared to 0 at S3 resume.

[How]
Keep eDP Vdd on when the eDP stream is already enabled.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Zhan Liu
---
 .../display/dc/dce110/dce110_hw_sequencer.c | 24 +-
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 52b22a944f94..ace04e2ed34e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1770,9 +1770,29 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 				break;
 			}
 		}
-		// We are trying to enable eDP, don't power down VDD
-		if (can_apply_edp_fast_boot)
+
+		/*
+		 * TO-DO: So far the code logic below only addresses single eDP case.
+		 * For dual eDP case, there are a few things that need to be
+		 * implemented first:
+		 *
+		 * 1. Change the fastboot logic above, so eDP link[0 or 1]'s
+		 * stream[0 or 1] will all be checked.
+		 *
+		 * 2. Change keep_edp_vdd_on to an array, and maintain keep_edp_vdd_on
+		 * for each eDP.
+		 *
+		 * Once above 2 things are completed, we can then change the logic below
+		 * correspondingly, so dual eDP case will be fully covered.
+		 */
+
+		// We are trying to enable eDP, don't power down VDD if eDP stream is existing
+		if ((edp_stream_num == 1 && edp_streams[0] != NULL) || can_apply_edp_fast_boot) {
 			keep_edp_vdd_on = true;
+			DC_LOG_EVENT_LINK_TRAINING("Keep eDP Vdd on\n");
+		} else {
+			DC_LOG_EVENT_LINK_TRAINING("No eDP stream enabled, turn eDP Vdd off\n");
+		}
 	}

 	// Check seamless boot support
-- 
2.25.1
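The new keep-Vdd decision (single-eDP case only, per the TO-DO in the diff) condenses to one predicate; a minimal sketch with the inputs passed as plain parameters:

```c
#include <assert.h>
#include <stdbool.h>

/* Condensed form of the decision the patch adds: keep the panel rail
 * up when an eDP stream is already enabled, not only when the fast-boot
 * optimization applies (the fast-boot flag is cleared on S3 resume). */
bool keep_edp_vdd_on(int edp_stream_num, bool have_edp_stream,
		     bool can_apply_edp_fast_boot)
{
	return (edp_stream_num == 1 && have_edp_stream) ||
	       can_apply_edp_fast_boot;
}
```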
[PATCH 05/13] dc: do blocked MST topology discovery at resume from S3/S4
From: "Guo, Bing"

Why:
When resuming from sleep or hibernation, blocked MST topology discovery might need to be used.

How:
Added "DETECT_REASON_RESUMEFROMS3S4" to enum dc_detect_reason; use it to require blocked MST topology discovery.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: "Bing Guo"
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index cb6c91cd6e83..ec4b300ec067 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -831,7 +831,7 @@ static bool discover_dp_mst_topology(struct dc_link *link, enum dc_detect_reason
 		link->type = dc_connection_mst_branch;
 		dm_helpers_dp_update_branch_info(link->ctx, link);
 		if (dm_helpers_dp_mst_start_top_mgr(link->ctx,
-				link, reason == DETECT_REASON_BOOT)) {
+				link, (reason == DETECT_REASON_BOOT || reason == DETECT_REASON_RESUMEFROMS3S4))) {
 			link_disconnect_sink(link);
 		} else {
 			link->type = dc_connection_sst_branch;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h b/drivers/gpu/drm/amd/display/dc/dc_link.h
index e7ead9e25318..9ad3ee4079c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -312,6 +312,7 @@ void dc_link_blank_dp_stream(struct dc_link *link, bool hw_init);
 */
 enum dc_detect_reason {
 	DETECT_REASON_BOOT,
+	DETECT_REASON_RESUMEFROMS3S4,
 	DETECT_REASON_HPD,
 	DETECT_REASON_HPDRX,
 	DETECT_REASON_FALLBACK,
-- 
2.25.1
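The boot-or-resume condition passed to dm_helpers_dp_mst_start_top_mgr() can be factored as a predicate; a sketch using the enum from the diff (the helper name below is illustrative, not a DC symbol):

```c
#include <assert.h>
#include <stdbool.h>

/* The detection reasons from dc_link.h, with the new resume value. */
enum dc_detect_reason {
	DETECT_REASON_BOOT,
	DETECT_REASON_RESUMEFROMS3S4,
	DETECT_REASON_HPD,
	DETECT_REASON_HPDRX,
	DETECT_REASON_FALLBACK,
};

/* After this patch, blocked (synchronous) MST topology discovery is
 * requested on resume from S3/S4 as well as at boot; hotplug paths
 * still use non-blocked discovery. */
bool needs_blocked_mst_discovery(enum dc_detect_reason reason)
{
	return reason == DETECT_REASON_BOOT ||
	       reason == DETECT_REASON_RESUMEFROMS3S4;
}
```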
[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error
From: Eric Bernstein

[Why]
Build failure because 'status' may be used uninitialized.

[How]
Initialize status to LINK_TRAINING_SUCCESS.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
 	bool success = false;
 	bool skip_video_pattern;
 	enum clock_source_id dp_cs_id = get_clock_source_id(link);
-	enum link_training_result status;
+	enum link_training_result status = LINK_TRAINING_SUCCESS;
 	union hpd_irq_data irq_data;
 	struct link_resource link_res;
-- 
2.25.1
[PATCH 04/13] drm/amd/display: remove static from optc31_set_drr
From: Eric Bernstein

Remove static from optc31_set_drr.

Reviewed-by: Nevenko Stupar
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct timing_generator *optc)
 	return true;
 }
-static void optc31_set_drr(
+void optc31_set_drr(
 	struct timing_generator *optc,
 	const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 void dcn31_timing_generator_init(struct optc *optc1);
+void optc31_set_drr(struct timing_generator *optc, const struct drr_params *params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1
[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal
From: Nicholas Kazlauskas

[Why]
Found when running igt@kms_atomic. Userspace attempts to do a TEST_COMMIT with 0 streams, which calls dc_remove_stream_from_ctx. This in turn calls link_enc_unassign, which ends up modifying stream->link = NULL directly, causing the global link_enc to be removed, preventing further link activity and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of link_enc_cfg_link_encs_assign, so this call is no longer necessary. Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito
Acked-by: Jasdeep Dhillon
Signed-off-by: Nicholas Kazlauskas
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
 			dc->res_pool,
 			del_pipe->stream_res.stream_enc,
 			false);
-	/* Release link encoder from stream in new dc_state. */
-	if (dc->res_pool->funcs->link_enc_unassign)
-		dc->res_pool->funcs->link_enc_unassign(new_ctx, del_pipe->stream);
-
 	if (is_dp_128b_132b_signal(del_pipe)) {
 		update_hpo_dp_stream_engine_usage(
 			&new_ctx->res_ctx, dc->res_pool,
-- 
2.25.1
[PATCH 00/13] DC Patchset, Feb 7 2022 v3
This DC patchset brings improvements in multiple areas. In summary, we have:

- fix for build failure uninitialized error
- Bug fix for DP2 using uncertified cable
- limit unbounded request to 5k
- fix DP LT sequence on EQ fail
- Bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c      |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h           |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h      |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 16 files changed, 103 insertions(+), 82 deletions(-)
-- 
2.25.1
[PATCH 14/14] SWDEV-321758 - dc: Code clean
From: Oliver Logush

Signed-off-by: Oliver Logush
---
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index c25bc4d9cd4b..16c5c929e8d5 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -212,7 +212,7 @@ enum {
 #define ASICREV_IS_GREEN_SARDINE(eChipRev) ((eChipRev >= GREEN_SARDINE_A0) && (eChipRev < 0xFF))
 #endif
 #define DEVICE_ID_NV_NAVI10_LITE_P_13FE 0x13FE // CYAN_SKILLFISH
-#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F // ROBIN+
+#define DEVICE_ID_NV_NAVI10_LITE_P_143F 0x143F
 #define FAMILY_VGH 144
 #define DEVICE_ID_VGH_163F 0x163F
 #define VANGOGH_A0 0x01
-- 
2.25.1
[PATCH 11/14] drm/amd/display: [FW Promotion] Release 0.0.103.0
From: Anthony Koo

Reviewed-by: Aric Cyr
Acked-by: Jasdeep Dhillon
Signed-off-by: Anthony Koo
---
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
index a01814631911..c5cbbb0470ee 100644
--- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
+++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
@@ -47,10 +47,10 @@

 /* Firmware versioning. */
 #ifdef DMUB_EXPOSE_VERSION
-#define DMUB_FW_VERSION_GIT_HASH 0xab0ae3c8
+#define DMUB_FW_VERSION_GIT_HASH 0x5189adbf
 #define DMUB_FW_VERSION_MAJOR 0
 #define DMUB_FW_VERSION_MINOR 0
-#define DMUB_FW_VERSION_REVISION 102
+#define DMUB_FW_VERSION_REVISION 103
 #define DMUB_FW_VERSION_TEST 0
 #define DMUB_FW_VERSION_VBIOS 0
 #define DMUB_FW_VERSION_HOTFIX 0
-- 
2.25.1
[PATCH 13/14] drm/amd/display: handle null link encoder
From: Martin Tsai

[Why]
The link encoder mapping could return a null encoder and cause a system
crash.

[How]
Let the mapping return an available link encoder without the endpoint
identification check.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Martin Tsai
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 25 +++
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index ec4b300ec067..b1718600fa02 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -3530,11 +3530,7 @@ enum dc_status dc_link_allocate_mst_payload(struct pipe_ctx *pipe_ctx)
 	const struct link_hwss *link_hwss = get_link_hwss(link, &pipe_ctx->link_res);
 	DC_LOGGER_INIT(link->ctx->logger);

-	/* Link encoder may have been dynamically assigned to non-physical display endpoint. */
-	if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_encoder = link->link_enc;
-	else if (link->dc->res_pool->funcs->link_encs_assign)
-		link_encoder = link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+	link_encoder = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_encoder);

 	/* enable_link_dp_mst already check link->enabled_stream_count
@@ -3823,11 +3819,7 @@ static enum dc_status deallocate_mst_payload(struct pipe_ctx *pipe_ctx)
 	const struct dc_link_settings empty_link_settings = {0};
 	DC_LOGGER_INIT(link->ctx->logger);

-	/* Link encoder may have been dynamically assigned to non-physical display endpoint. */
-	if (link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_encoder = link->link_enc;
-	else if (link->dc->res_pool->funcs->link_encs_assign)
-		link_encoder = link_enc_cfg_get_link_enc_used_by_stream(pipe_ctx->stream->ctx->dc, stream);
+	link_encoder = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_encoder);

 	/* deallocate_mst_payload is called before disable link. When mode or
@@ -3944,13 +3936,7 @@ static void update_psp_stream_config(struct pipe_ctx *pipe_ctx, bool dpms_off)
 	if (cp_psp == NULL || cp_psp->funcs.update_stream_config == NULL)
 		return;

-	if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_PHY)
-		link_enc = pipe_ctx->stream->link->link_enc;
-	else if (pipe_ctx->stream->link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA &&
-			pipe_ctx->stream->link->dc->res_pool->funcs->link_encs_assign)
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(
-				pipe_ctx->stream->ctx->dc,
-				pipe_ctx->stream);
+	link_enc = link_enc_cfg_get_link_enc(pipe_ctx->stream->link);
 	ASSERT(link_enc);
 	if (link_enc == NULL)
 		return;
@@ -4100,10 +4086,7 @@ void core_link_enable_stream(
 			dc_is_virtual_signal(pipe_ctx->stream->signal))
 		return;

-	if (dc->res_pool->funcs->link_encs_assign && stream->link->ep_type != DISPLAY_ENDPOINT_PHY)
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(dc, stream);
-	else
-		link_enc = stream->link->link_enc;
+	link_enc = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_enc);

 	if (!dc_is_virtual_signal(pipe_ctx->stream->signal)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index fcf388b509db..b55868a0e0df 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -1605,16 +1605,7 @@ static void get_pixel_clock_parameters(

 	pixel_clk_params->requested_pix_clk_100hz = stream->timing.pix_clk_100hz;

-	/* Links supporting dynamically assigned link encoder will be assigned next
-	 * available encoder if one not already assigned.
-	 */
-	if (link->is_dig_mapping_flexible &&
-			link->dc->res_pool->funcs->link_encs_assign) {
-		link_enc = link_enc_cfg_get_link_enc_used_by_stream(stream->ctx->dc, stream);
-		if (link_enc == NULL)
-			link_enc = link_enc_cfg_get_next_avail_link_enc(stream->ctx->dc);
-	} else
-		link_enc = stream->link->link_enc;
+	link_enc = link_enc_cfg_get_link_enc(link);
 	ASSERT(link_enc);

 	if (link_enc)
-- 
2.25.1
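The patch above collapses four open-coded endpoint-type checks into one lookup. A minimal sketch of that pattern, using illustrative types and names rather than the real DC structures (the actual helper lives in DC's link_enc_cfg code):

```c
#include <stddef.h>

/* Simplified, hypothetical model of the centralized lookup: physical
 * endpoints carry a fixed encoder, while other endpoints (e.g. USB4
 * DPIA) use a dynamically assigned one. Callers no longer branch on
 * the endpoint type themselves. */
enum ep_type { ENDPOINT_PHY, ENDPOINT_DPIA };

struct link_encoder { int id; };

struct link {
	enum ep_type ep_type;
	struct link_encoder *fixed_enc;    /* set for physical endpoints */
	struct link_encoder *assigned_enc; /* set by dynamic assignment */
};

/* One place to resolve the encoder, whatever the endpoint type. */
static struct link_encoder *get_link_enc(const struct link *link)
{
	if (link->ep_type == ENDPOINT_PHY)
		return link->fixed_enc;
	return link->assigned_enc; /* may be NULL; callers must ASSERT/check */
}
```

The point of the consolidation is that a NULL result is now handled in one function instead of four slightly different call sites.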
[PATCH 12/14] drm/amd/display: 3.2.172
From: Aric Cyr

This version brings along the following fixes:
- fix for build failure uninitialized error
- bug fix for DP2 using uncertified cable
- limit unbounded request to 5k
- fix DP LT sequence on EQ fail
- bug fixes for S3/S4

Acked-by: Jasdeep Dhillon
Signed-off-by: Aric Cyr
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 8248d4b75066..89ca672866f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -47,7 +47,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;

-#define DC_VER "3.2.171"
+#define DC_VER "3.2.172"

 #define MAX_SURFACES 3
 #define MAX_PLANES 6
-- 
2.25.1
[PATCH 09/14] drm/amd/display: keep eDP Vdd on when eDP stream is already enabled
From: Zhan Liu

[Why]
Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will be
cleared to 0 at S3 resume.

[How]
Keep eDP Vdd on when eDP stream is already enabled.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Zhan Liu
---
 .../display/dc/dce110/dce110_hw_sequencer.c | 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 52b22a944f94..ace04e2ed34e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1770,9 +1770,29 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 				break;
 			}
 		}
-		// We are trying to enable eDP, don't power down VDD
-		if (can_apply_edp_fast_boot)
+
+		/*
+		 * TO-DO: So far the code logic below only addresses single eDP case.
+		 * For dual eDP case, there are a few things that need to be
+		 * implemented first:
+		 *
+		 * 1. Change the fastboot logic above, so eDP link[0 or 1]'s
+		 * stream[0 or 1] will all be checked.
+		 *
+		 * 2. Change keep_edp_vdd_on to an array, and maintain keep_edp_vdd_on
+		 * for each eDP.
+		 *
+		 * Once above 2 things are completed, we can then change the logic below
+		 * correspondingly, so dual eDP case will be fully covered.
+		 */
+
+		// We are trying to enable eDP, don't power down VDD if eDP stream is existing
+		if ((edp_stream_num == 1 && edp_streams[0] != NULL) || can_apply_edp_fast_boot) {
 			keep_edp_vdd_on = true;
+			DC_LOG_EVENT_LINK_TRAINING("Keep eDP Vdd on\n");
+		} else {
+			DC_LOG_EVENT_LINK_TRAINING("No eDP stream enabled, turn eDP Vdd off\n");
+		}
 	}

 	// Check seamless boot support
-- 
2.25.1
[PATCH 10/14] drm/amd/display: Fix DP LT sequence on EQ fail
From: Ilya

[Why]
The number of lanes wasn't being reset to maximum when reducing link
rate due to an EQ failure. This could result in having fewer lanes in
the verified link capabilities, a lower maximum link bandwidth, and
fewer modes being supported.

[How]
Reset the number of lanes to max when dropping link rate due to EQ
failure during link training.

Reviewed-by: Aric Cyr
Acked-by: Jasdeep Dhillon
Signed-off-by: Ilya
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index d0cb40df60a4..cd9c31b5e55d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3504,6 +3504,7 @@ static bool decide_fallback_link_setting(
 			current_link_setting->link_rate =
 				reduce_link_rate(
 					current_link_setting->link_rate);
+			current_link_setting->lane_count = initial_link_settings.lane_count;
 		} else {
 			return false;
 		}
@@ -3516,6 +3517,7 @@ static bool decide_fallback_link_setting(
 			current_link_setting->link_rate =
 				reduce_link_rate(
 					current_link_setting->link_rate);
+			current_link_setting->lane_count = initial_link_settings.lane_count;
 		} else {
 			return false;
 		}
-- 
2.25.1
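The effect of the added lines can be sketched outside the driver as follows. This is an illustrative model only (made-up names; the constants are DP per-lane link rates in units of 0.27 Gbps, e.g. 0x1E = HBR3, 0x14 = HBR2), not the real decide_fallback_link_setting():

```c
/* Model of the fix: when link training fails equalization and the link
 * rate is reduced, the lane count is restored to the initial maximum.
 * Without the restore, a lane reduction from an earlier fallback step
 * would leak into the lower-rate retry and shrink the verified caps. */
struct link_setting { int link_rate; int lane_count; };

static int reduce_link_rate(int rate)
{
	if (rate > 0x14) return 0x14; /* HBR3 -> HBR2 */
	if (rate > 0x0A) return 0x0A; /* HBR2 -> HBR  */
	return 0x06;                  /* HBR  -> RBR  */
}

/* Fallback on EQ failure: drop the rate, but retry with all lanes. */
static void fallback_on_eq_fail(struct link_setting *cur,
				const struct link_setting *initial)
{
	cur->link_rate = reduce_link_rate(cur->link_rate);
	cur->lane_count = initial->lane_count;
}
```

The second assignment is exactly what the two `+` lines add to both EQ-failure branches of the real function.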
[PATCH 08/14] drm/amd/display: change fastboot timing validation
From: Paul Hsieh

[Why]
VBIOS lights up eDP with 6bpc but the driver uses 8bpc, without
disabling and re-enabling the valid stream. Some panels can't change
color depth at runtime.

[How]
Change the fastboot timing validation function to check more than just
LANE_COUNT, LINK_RATE, etc.

Reviewed-by: Anthony Koo
Acked-by: Jasdeep Dhillon
Signed-off-by: Paul Hsieh
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 1d9404ff29ed..997eb7e2d2b3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1467,7 +1467,7 @@ static bool context_changed(
 	return false;
 }

-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
 				const struct dc_sink *sink,
 				struct dc_crtc_timing *crtc_timing)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 8a199d661a66..318d381e2910 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2168,7 +2168,7 @@ static void mark_seamless_boot_stream(
 	if (dc->config.allow_seamless_boot_optimization &&
 			!dcb->funcs->is_accelerated_mode(dcb)) {
-		if (dc_validate_seamless_boot_timing(dc, stream->sink, &stream->timing))
+		if (dc_validate_boot_timing(dc, stream->sink, &stream->timing))
 			stream->apply_seamless_boot_optimization = true;
 	}
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 69d264dd69a7..8248d4b75066 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -1125,7 +1125,7 @@ struct dc_validation_set {
 	uint8_t plane_count;
 };

-bool dc_validate_seamless_boot_timing(const struct dc *dc,
+bool dc_validate_boot_timing(const struct dc *dc,
 				const struct dc_sink *sink,
 				struct dc_crtc_timing *crtc_timing);

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 8c32b9cb3b49..52b22a944f94 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1761,7 +1761,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 				edp_link->link_status.link_active) {
 			struct dc_stream_state *edp_stream = edp_streams[0];

-			can_apply_edp_fast_boot = !is_edp_ilr_optimization_required(edp_stream->link, &edp_stream->timing);
+			can_apply_edp_fast_boot = dc_validate_boot_timing(dc,
+				edp_stream->sink, &edp_stream->timing);
 			edp_stream->apply_edp_fast_boot_optimization = can_apply_edp_fast_boot;
 			if (can_apply_edp_fast_boot)
 				DC_LOG_EVENT_LINK_TRAINING("eDP fast boot disabled to optimize link rate\n");
-- 
2.25.1
[PATCH 06/14] dc: do blocked MST topology discovery at resume from S3/S4
From: "Guo, Bing"

Why:
When resuming from sleep or hibernation, blocked MST topology discovery
might need to be used.

How:
Added "DETECT_REASON_RESUMEFROMS3S4" to enum dc_detect_reason; use it
to require blocked MST topology discovery.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: "Bing Guo"
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index cb6c91cd6e83..ec4b300ec067 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -831,7 +831,7 @@ static bool discover_dp_mst_topology(struct dc_link *link, enum dc_detect_reason
 		link->type = dc_connection_mst_branch;
 		dm_helpers_dp_update_branch_info(link->ctx, link);
 		if (dm_helpers_dp_mst_start_top_mgr(link->ctx,
-				link, reason == DETECT_REASON_BOOT)) {
+				link, (reason == DETECT_REASON_BOOT || reason == DETECT_REASON_RESUMEFROMS3S4))) {
 			link_disconnect_sink(link);
 		} else {
 			link->type = dc_connection_sst_branch;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h b/drivers/gpu/drm/amd/display/dc/dc_link.h
index e7ead9e25318..9ad3ee4079c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -312,6 +312,7 @@ void dc_link_blank_dp_stream(struct dc_link *link, bool hw_init);
 */
 enum dc_detect_reason {
 	DETECT_REASON_BOOT,
+	DETECT_REASON_RESUMEFROMS3S4,
 	DETECT_REASON_HPD,
 	DETECT_REASON_HPDRX,
 	DETECT_REASON_FALLBACK,
-- 
2.25.1
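The condition the patch widens is small but easy to get wrong when more reasons are added later; a sketch of the new gating, with illustrative enum names (the real ones are DETECT_REASON_*):

```c
#include <stdbool.h>

/* Mirrors the detect reasons involved; only a subset is shown. */
enum detect_reason {
	REASON_BOOT,
	REASON_RESUME_S3S4,
	REASON_HPD,
	REASON_HPDRX,
};

/* Blocked (synchronous) MST topology discovery is requested both at
 * boot and when resuming from S3/S4 — the condition this patch adds.
 * Hotplug paths keep using non-blocked discovery. */
static bool use_blocked_mst_discovery(enum detect_reason r)
{
	return r == REASON_BOOT || r == REASON_RESUME_S3S4;
}
```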
[PATCH 07/14] drm/amd/display: fix yellow carp wm clamping
From: Dmytro Laktyushkin

Fix clamping to match register field size.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Dmytro Laktyushkin
---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c | 61 ++-
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
index 90c73a1cb986..5e3bcaf12cac 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hubbub.c
@@ -138,8 +138,11 @@ static uint32_t convert_and_clamp(
 	ret_val = wm_ns * refclk_mhz;
 	ret_val /= 1000;

-	if (ret_val > clamp_value)
+	if (ret_val > clamp_value) {
+		/* clamping WMs is abnormal, unexpected and may lead to underflow*/
+		ASSERT(0);
 		ret_val = clamp_value;
+	}

 	return ret_val;
 }
@@ -159,7 +162,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->a.urgent_ns > hubbub2->watermarks.a.urgent_ns) {
 		hubbub2->watermarks.a.urgent_ns = watermarks->a.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->a.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_A, prog_wm_value);

@@ -193,7 +196,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->a.urgent_latency_ns > hubbub2->watermarks.a.urgent_latency_ns) {
 		hubbub2->watermarks.a.urgent_latency_ns = watermarks->a.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->a.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_A, prog_wm_value);
 	} else if (watermarks->a.urgent_latency_ns < hubbub2->watermarks.a.urgent_latency_ns)
@@ -203,7 +206,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->b.urgent_ns > hubbub2->watermarks.b.urgent_ns) {
 		hubbub2->watermarks.b.urgent_ns = watermarks->b.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->b.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_B, prog_wm_value);

@@ -237,7 +240,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->b.urgent_latency_ns > hubbub2->watermarks.b.urgent_latency_ns) {
 		hubbub2->watermarks.b.urgent_latency_ns = watermarks->b.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->b.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_B, prog_wm_value);
 	} else if (watermarks->b.urgent_latency_ns < hubbub2->watermarks.b.urgent_latency_ns)
@@ -247,7 +250,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->c.urgent_ns > hubbub2->watermarks.c.urgent_ns) {
 		hubbub2->watermarks.c.urgent_ns = watermarks->c.urgent_ns;
 		prog_wm_value = convert_and_clamp(watermarks->c.urgent_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, 0,
 				DCHUBBUB_ARB_DATA_URGENCY_WATERMARK_C, prog_wm_value);

@@ -281,7 +284,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->c.urgent_latency_ns > hubbub2->watermarks.c.urgent_latency_ns) {
 		hubbub2->watermarks.c.urgent_latency_ns = watermarks->c.urgent_latency_ns;
 		prog_wm_value = convert_and_clamp(watermarks->c.urgent_latency_ns,
-				refclk_mhz, 0x1f);
+				refclk_mhz, 0x3fff);
 		REG_SET(DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, 0,
 				DCHUBBUB_ARB_REFCYC_PER_TRIP_TO_MEMORY_C, prog_wm_value);
 	} else if (watermarks->c.urgent_latency_ns < hubbub2->watermarks.c.urgent_latency_ns)
@@ -291,7 +294,7 @@ static bool hubbub31_program_urgent_watermarks(
 	if (safe_to_lower || watermarks->d.urgent_ns > hubbub2->watermarks.d.urgent_ns) {
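Why the clamp constant matters can be shown in isolation. The sketch below mirrors the shape of convert_and_clamp (nanoseconds to refclk ticks, then clamp to the register field width) without the driver's ASSERT machinery; it is illustrative, not the driver function:

```c
#include <stdint.h>

/* Watermark in ns -> refclk ticks, clamped to the register field.
 * The patch changes the clamp from 0x1f (5 bits) to 0x3fff (14 bits):
 * with the 5-bit clamp, any watermark above 31 ticks was silently
 * truncated, programming a far-too-small value into the register. */
static uint32_t convert_and_clamp(uint32_t wm_ns, uint32_t refclk_mhz,
				  uint32_t clamp_value)
{
	uint32_t ret_val = wm_ns * refclk_mhz / 1000;

	if (ret_val > clamp_value)
		ret_val = clamp_value; /* abnormal; the real code also ASSERTs */
	return ret_val;
}
```

For example, a 1000 ns watermark at a 400 MHz refclk is 400 ticks; the old 0x1f clamp would cut that to 31, while the 14-bit clamp passes it through unchanged.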
[PATCH 03/14] drm/amd/display: limit unbounded requesting to 5k
From: Dmytro Laktyushkin

Unbounded requesting is unsupported on pipe split modes and this change
prevents us running into such a situation with wide modes.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Dmytro Laktyushkin
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
 		if (is_dual_plane(pipe->plane_state->format)
 				&& pipe->plane_state->src_rect.width <= 1920 && pipe->plane_state->src_rect.height <= 1080) {
 			dc->config.enable_4to1MPC = true;
-		} else if (!is_dual_plane(pipe->plane_state->format)) {
+		} else if (!is_dual_plane(pipe->plane_state->format) && pipe->plane_state->src_rect.width <= 5120) {
+			/* Limit to 5k max to avoid forced pipe split when there is not enough detile for swath */
 			context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
 			pipes[0].pipe.src.unbounded_req_mode = true;
 		}
-- 
2.25.1
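The new gate reduces to a simple predicate: unbounded requesting is only taken for single-plane sources no wider than 5120 pixels, because wider modes can force a pipe split, which unbounded requesting does not support. An illustrative-only distillation (names made up for the sketch):

```c
#include <stdbool.h>

/* Distilled form of the dcn31 condition this patch tightens:
 * single plane AND source width within the 5k limit. */
static bool allow_unbounded_req(bool dual_plane, int src_width)
{
	return !dual_plane && src_width <= 5120;
}
```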
[PATCH 05/14] drm/amd/display: remove static from optc31_set_drr
From: Eric Bernstein

Remove static from optc31_set_drr.

Reviewed-by: Nevenko Stupar
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct timing_generator *optc)
 	return true;
 }

-static void optc31_set_drr(
+void optc31_set_drr(
 	struct timing_generator *optc,
 	const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@

 void dcn31_timing_generator_init(struct optc *optc1);

+void optc31_set_drr(struct timing_generator *optc, const struct drr_params *params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
-- 
2.25.1
[PATCH 04/14] drm/amd/display: Basic support with device ID
From: Oliver Logush [why] To get the the cyan_skillfish check working Reviewed-by: Charlene Liu Reviewed-by: Charlene Liu Acked-by: Jasdeep Dhillon Signed-off-by: Oliver Logush --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +-- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 +- .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c | 2 +- .../gpu/drm/amd/display/dc/core/dc_resource.c | 2 +- .../gpu/drm/amd/display/include/dal_asic_id.h | 3 ++- 5 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 8f53c9f6b267..f5941e59e5ad 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1014,6 +1014,14 @@ static void amdgpu_dm_audio_eld_notify(struct amdgpu_device *adev, int pin) } } +bool is_skillfish_series(struct amdgpu_device *adev) +{ + if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 0x143F) { + return true; + } + return false; +} + static int dm_dmub_hw_init(struct amdgpu_device *adev) { const struct dmcub_firmware_header_v1_0 *hdr; @@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev) return -EINVAL; } - if (!has_hw_support) { + if (is_skillfish_series(adev)) { DRM_INFO("DMUB unsupported on ASIC\n"); return 0; } @@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev) default: break; } + if (is_skillfish_series(adev)) { + init_data.flags.disable_dmcu = true; + break; + } break; } @@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev) case CHIP_VEGA10: case CHIP_VEGA12: case CHIP_VEGA20: - return 0; case CHIP_NAVI12: fw_name_dmcu = FIRMWARE_NAVI12_DMCU; break; @@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev) default: break; } + if (is_skillfish_series(adev)) { + return 0; + } DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type); return -EINVAL; } @@ -4515,6 +4529,12 @@ 
static int dm_early_init(void *handle) adev->mode_info.num_dig = 6; break; default: + if (is_skillfish_series(adev)) { + adev->mode_info.num_crtc = 2; + adev->mode_info.num_hpd = 2; + adev->mode_info.num_dig = 2; + break; + } #if defined(CONFIG_DRM_AMD_DC_DCN) switch (adev->ip_versions[DCE_HWIP][0]) { case IP_VERSION(2, 0, 2): diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h index e35977fda5c1..13875d669acd 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -82,7 +82,7 @@ struct common_irq_params { enum dc_irq_source irq_src; atomic64_t previous_timestamp; }; - +bool is_skillfish_series(struct amdgpu_device *adev); /** * struct dm_compressor_info - Buffer info used by frame buffer compression * @cpu_addr: MMIO cpu addr diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c index 9200c8ce02ba..54ef83fe5a9b 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c @@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return _mgr->base; } - if (asic_id.chip_id == DEVICE_ID_NV_13FE) { + if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) { dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return _mgr->base; } diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 9df66501a453..8a199d661a66 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id asic_id) case FAMILY_NV: dc_version = DCN_VERSION_2_0; - if (asic_id.chip_id == DEVICE_ID_NV_13FE) { + if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
[PATCH 02/14] drm/amd/display: Fix stream->link_enc unassigned during stream removal
From: Nicholas Kazlauskas

[Why]
Found when running igt@kms_atomic. Userspace attempts to do a
TEST_COMMIT with 0 streams which calls dc_remove_stream_from_ctx. This
in turn calls link_enc_unassign which ends up modifying stream->link =
NULL directly, causing the global link_enc to be removed, preventing
further link activity and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.

Fixes global state from being modified while unlocked.

Reviewed-by: Jimmy Kizito
Acked-by: Jasdeep Dhillon
Signed-off-by: Nicholas Kazlauskas
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
 			dc->res_pool,
 			del_pipe->stream_res.stream_enc,
 			false);

-	/* Release link encoder from stream in new dc_state. */
-	if (dc->res_pool->funcs->link_enc_unassign)
-		dc->res_pool->funcs->link_enc_unassign(new_ctx, del_pipe->stream);
-
 	if (is_dp_128b_132b_signal(del_pipe)) {
 		update_hpo_dp_stream_engine_usage(
 			&new_ctx->res_ctx, dc->res_pool,
-- 
2.25.1
[PATCH 01/14] drm/amd/display: Fix for variable may be used uninitialized error
From: Eric Bernstein

[Why]
Build failure due to 'status' may be used uninitialized.

[How]
Initialize status to LINK_TRAINING_SUCCESS.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
 	bool success = false;
 	bool skip_video_pattern;
 	enum clock_source_id dp_cs_id = get_clock_source_id(link);
-	enum link_training_result status;
+	enum link_training_result status = LINK_TRAINING_SUCCESS;
 	union hpd_irq_data irq_data;
 	struct link_resource link_res;
-- 
2.25.1
[PATCH 00/14] DC Patchset, Feb 7 2022 v2
This DC patchset brings improvements in multiple areas. In summary, we have:

- fix for build failure uninitialized error
- bug fix for DP2 using uncertified cable
- limit unbounded request to 5k
- fix DP LT sequence on EQ fail
- bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (2):
  drm/amd/display: Basic support with device ID
  SWDEV-321758 - dc: Code clean

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c      | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 8 +--
 drivers/gpu/drm/amd/display/dc/dc.h           | 4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h      | 1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c | 3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   | 4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h | 3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

-- 
2.25.1
[PATCH 2/3] drm/amdkfd: Remove unused old debugger implementation
Cleanup the kfd code by removing the unused old debugger implementation. Only a small piece of resetting wavefronts is kept and is moved to kfd_device_queue_manager.c Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/Makefile | 2 - drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 282 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 845 -- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.h | 230 - drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c | 158 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h | 293 -- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 - .../drm/amd/amdkfd/kfd_device_queue_manager.c | 59 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 35 + drivers/gpu/drm/amd/amdkfd/kfd_iommu.c| 12 - drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 - drivers/gpu/drm/amd/amdkfd/kfd_process.c | 19 - 12 files changed, 98 insertions(+), 1844 deletions(-) delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.h delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c delete mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index c4f3aff11072..19cfbf9577b4 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -51,8 +51,6 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_events.o \ $(AMDKFD_PATH)/cik_event_interrupt.o \ $(AMDKFD_PATH)/kfd_int_process_v9.o \ - $(AMDKFD_PATH)/kfd_dbgdev.o \ - $(AMDKFD_PATH)/kfd_dbgmgr.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 64e3b4e3a712..cfe12525165f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -39,7 +39,6 @@ #include #include "kfd_priv.h" #include "kfd_device_queue_manager.h" -#include "kfd_dbgmgr.h" #include "kfd_svm.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" @@ -580,299 
+579,26 @@ static int kfd_ioctl_set_trap_handler(struct file *filep, static int kfd_ioctl_dbg_register(struct file *filep, struct kfd_process *p, void *data) { - struct kfd_ioctl_dbg_register_args *args = data; - struct kfd_dev *dev; - struct kfd_dbgmgr *dbgmgr_ptr; - struct kfd_process_device *pdd; - bool create_ok; - long status = 0; - - mutex_lock(>mutex); - pdd = kfd_process_device_data_by_id(p, args->gpu_id); - if (!pdd) { - status = -EINVAL; - goto err_pdd; - } - dev = pdd->dev; - - if (dev->adev->asic_type == CHIP_CARRIZO) { - pr_debug("kfd_ioctl_dbg_register not supported on CZ\n"); - status = -EINVAL; - goto err_chip_unsupp; - } - - mutex_lock(kfd_get_dbgmgr_mutex()); - - /* -* make sure that we have pdd, if this the first queue created for -* this process -*/ - pdd = kfd_bind_process_to_device(dev, p); - if (IS_ERR(pdd)) { - status = PTR_ERR(pdd); - goto out; - } - - if (!dev->dbgmgr) { - /* In case of a legal call, we have no dbgmgr yet */ - create_ok = kfd_dbgmgr_create(_ptr, dev); - if (create_ok) { - status = kfd_dbgmgr_register(dbgmgr_ptr, p); - if (status != 0) - kfd_dbgmgr_destroy(dbgmgr_ptr); - else - dev->dbgmgr = dbgmgr_ptr; - } - } else { - pr_debug("debugger already registered\n"); - status = -EINVAL; - } - -out: - mutex_unlock(kfd_get_dbgmgr_mutex()); -err_pdd: -err_chip_unsupp: - mutex_unlock(>mutex); - - return status; + return -EPERM; } static int kfd_ioctl_dbg_unregister(struct file *filep, struct kfd_process *p, void *data) { - struct kfd_ioctl_dbg_unregister_args *args = data; - struct kfd_process_device *pdd; - struct kfd_dev *dev; - long status; - - mutex_lock(>mutex); - pdd = kfd_process_device_data_by_id(p, args->gpu_id); - mutex_unlock(>mutex); - if (!pdd || !pdd->dev->dbgmgr) - return -EINVAL; - - dev = pdd->dev; - - if (dev->adev->asic_type == CHIP_CARRIZO) { - pr_debug("kfd_ioctl_dbg_unregister not supported on CZ\n"); - return -EINVAL; - } - - mutex_lock(kfd_get_dbgmgr_mutex()); - - status = kfd_dbgmgr_unregister(dev->dbgmgr, 
p); - if (!status) { - kfd_dbgmgr_destroy(dev->dbgmgr); -
[PATCH 3/3] drm/amdkfd: Consolidate MQD manager functions
A few MQD manager functions are duplicated for all versions of MQD manager. Remove this duplication by moving the common functions into kfd_mqd_manager.c file. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 63 + drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 27 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 54 --- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 61 - .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 68 --- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c | 53 --- 6 files changed, 90 insertions(+), 236 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c index e2825ad4d699..f4a6af98db2d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c @@ -173,3 +173,66 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm, } } } + +int hiq_load_mqd_kiq(struct mqd_manager *mm, void *mqd, +uint32_t pipe_id, uint32_t queue_id, +struct queue_properties *p, struct mm_struct *mms) +{ + return mm->dev->kfd2kgd->hiq_mqd_load(mm->dev->adev, mqd, pipe_id, + queue_id, p->doorbell_off); +} + +int destroy_mqd(struct mqd_manager *mm, void *mqd, + enum kfd_preempt_type type, unsigned int timeout, + uint32_t pipe_id,uint32_t queue_id) +{ + return mm->dev->kfd2kgd->hqd_destroy(mm->dev->adev, mqd, type, timeout, + pipe_id, queue_id); +} + +void free_mqd(struct mqd_manager *mm, void *mqd, + struct kfd_mem_obj *mqd_mem_obj) +{ + if (mqd_mem_obj->gtt_mem) { + amdgpu_amdkfd_free_gtt_mem(mm->dev->adev, mqd_mem_obj->gtt_mem); + kfree(mqd_mem_obj); + } else { + kfd_gtt_sa_free(mm->dev, mqd_mem_obj); + } +} + +bool is_occupied(struct mqd_manager *mm, void *mqd, +uint64_t queue_address, uint32_t pipe_id, +uint32_t queue_id) +{ + return mm->dev->kfd2kgd->hqd_is_occupied(mm->dev->adev, queue_address, + pipe_id, queue_id); +} + +int load_mqd_sdma(struct mqd_manager *mm, void *mqd, + uint32_t pipe_id, uint32_t queue_id, + struct queue_properties 
*p, struct mm_struct *mms) +{ + return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->adev, mqd, + (uint32_t __user *)p->write_ptr, + mms); +} + +/* + * preempt type here is ignored because there is only one way + * to preempt sdma queue + */ +int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd, +enum kfd_preempt_type type, +unsigned int timeout, uint32_t pipe_id, +uint32_t queue_id) +{ + return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->adev, mqd, timeout); +} + +bool is_occupied_sdma(struct mqd_manager *mm, void *mqd, + uint64_t queue_address, uint32_t pipe_id, + uint32_t queue_id) +{ + return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->adev, mqd); +} diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h index 23486a23df84..76f20637b938 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h @@ -136,4 +136,31 @@ void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm, const uint32_t *cu_mask, uint32_t cu_mask_count, uint32_t *se_mask); +int hiq_load_mqd_kiq(struct mqd_manager *mm, void *mqd, + uint32_t pipe_id, uint32_t queue_id, + struct queue_properties *p, struct mm_struct *mms); + +int destroy_mqd(struct mqd_manager *mm, void *mqd, + enum kfd_preempt_type type, unsigned int timeout, + uint32_t pipe_id,uint32_t queue_id); + +void free_mqd(struct mqd_manager *mm, void *mqd, + struct kfd_mem_obj *mqd_mem_obj); + +bool is_occupied(struct mqd_manager *mm, void *mqd, +uint64_t queue_address, uint32_t pipe_id, +uint32_t queue_id); + +int load_mqd_sdma(struct mqd_manager *mm, void *mqd, + uint32_t pipe_id, uint32_t queue_id, + struct queue_properties *p, struct mm_struct *mms); + +int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd, + enum kfd_preempt_type type,unsigned int timeout, + uint32_t pipe_id, uint32_t queue_id); + +bool is_occupied_sdma(struct mqd_manager *mm, void *mqd, + uint64_t queue_address, uint32_t pipe_id, + uint32_t queue_id); + #endif /* 
KFD_MQD_MANAGER_H_ */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
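The consolidation pattern above — several per-ASIC files each defining identical wrapper functions, replaced by one shared definition plus header declarations — can be shown in miniature. This is an illustrative userspace sketch, not the KFD code itself; the struct and function names only mirror the real ones:

```c
/* Miniature of the consolidation: one shared wrapper definition replaces
 * per-ASIC duplicates, and each MQD manager reaches the hardware through
 * its kfd2kgd function table. */
struct kfd2kgd_funcs {
    int (*hqd_destroy)(int pipe_id, int queue_id);
};

struct mqd_manager {
    const struct kfd2kgd_funcs *kfd2kgd;
};

/* Previously duplicated in kfd_mqd_manager_{cik,vi,v9,v10}.c; now a
 * single definition that every MQD manager version shares. */
static int destroy_mqd(struct mqd_manager *mm, int pipe_id, int queue_id)
{
    return mm->kfd2kgd->hqd_destroy(pipe_id, queue_id);
}

/* Stub backend standing in for a real kfd2kgd implementation. */
static int stub_hqd_destroy(int pipe_id, int queue_id)
{
    return pipe_id * 100 + queue_id;
}

static const struct kfd2kgd_funcs stub_funcs = { stub_hqd_destroy };
```

The per-version `mqd_manager` structs keep their function pointers; they simply all point at the one shared definition now.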
[PATCH 1/3] drm/amdkfd: Fix TLB flushing in KFD SVM with no HWS
With no HWS, TLB flushing will not work in SVM code. Fix this by calling kfd_flush_tlb() which works for both HWS and no HWS case. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 41f03d165bad..b1315c97b952 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1229,15 +1229,14 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start, if (r) break; } - amdgpu_amdkfd_flush_gpu_tlb_pasid(pdd->dev->adev, - p->pasid, TLB_FLUSH_HEAVYWEIGHT); + kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT); } return r; } static int -svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, +svm_range_map_to_gpu(struct amdgpu_device *adev, struct kfd_process_device *pdd, struct svm_range *prange, unsigned long offset, unsigned long npages, bool readonly, dma_addr_t *dma_addr, struct amdgpu_device *bo_adev, struct dma_fence **fence) @@ -1248,6 +1247,7 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, int last_domain; int r = 0; int64_t i, j; + struct amdgpu_vm *vm = drm_priv_to_vm(pdd->drm_priv); last_start = prange->start + offset; @@ -1305,12 +1305,8 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (fence) *fence = dma_fence_get(vm->last_update); - if (table_freed) { - struct kfd_process *p; - - p = container_of(prange->svms, struct kfd_process, svms); - amdgpu_amdkfd_flush_gpu_tlb_pasid(adev, p->pasid, TLB_FLUSH_LEGACY); - } + if (table_freed) + kfd_flush_tlb(pdd, TLB_FLUSH_LEGACY); out: return r; } @@ -1351,7 +1347,7 @@ svm_range_map_to_gpus(struct svm_range *prange, unsigned long offset, continue; } - r = svm_range_map_to_gpu(pdd->dev->adev, drm_priv_to_vm(pdd->drm_priv), + r = svm_range_map_to_gpu(pdd->dev->adev, pdd, prange, offset, npages, readonly, prange->dma_addr[gpuidx], bo_adev, wait ? 
: NULL); -- 2.33.1
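The point of the fix is that the SVM code should not pick a flush mechanism itself; routing every flush through `kfd_flush_tlb()` lets one helper choose between the HWS and no-HWS paths. A minimal userspace sketch of that dispatch idea follows — the field and function names are illustrative, not the actual KFD internals:

```c
#include <stdbool.h>

enum tlb_flush_type { TLB_FLUSH_LEGACY, TLB_FLUSH_HEAVYWEIGHT };

/* Hypothetical device state: whether hardware scheduling (HWS) is active,
 * plus counters so the sketch's behavior is observable. */
struct dev_state {
    bool hws_enabled;
    int pasid_flushes;   /* PASID-based flush (HWS path) */
    int vmid_flushes;    /* direct per-VMID flush (no-HWS path) */
};

/* Single entry point: callers such as the SVM code no longer need to
 * know which scheduling mode is in effect. */
static void flush_tlb(struct dev_state *dev, enum tlb_flush_type type)
{
    (void)type;  /* both paths would honor the requested flush type */
    if (dev->hws_enabled)
        dev->pasid_flushes++;
    else
        dev->vmid_flushes++;
}
```

With this shape, a caller that previously invoked the PASID-based flush directly (which silently does nothing without HWS) always ends up on a working path.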
Re: [PATCH] Revert "i2c: core: support bus regulator controlling in adapter"
On Sun, Jan 16, 2022 at 2:44 AM Tareque Md.Hanif wrote: > > Hi Hsin-Yi, > > The issue still exists. I reverted a19f75de73c220b4496d2aefb7a605dd032f7c01 > (the commit that reverted 5a7b95fb993ec399c8a685552aa6a8fc995c40bd) and > manually applied the patch (tags/v5.16). journalctl attached. hi Tareque, Can you apply the same setting[1] again and with this patch to see if the issue is still there? https://github.com/torvalds/linux/commit/6dc8265f9803ccb7e5da804e01601f0c14f270e0 [1] reverted a19f75de73c220b4496d2aefb7a605dd032f7c01 (the commit that reverted 5a7b95fb993ec399c8a685552aa6a8fc995c40bd) and manually applied the patch (tags/v5.16) Thanks > > Regards, > > Tareque > > On Saturday, January 15, 2022, 11:27:07 PM GMT+6, Hsin-Yi Wang > wrote: > > > hi Tareque, > > > On Fri, Jan 14, 2022 at 6:09 PM Tareque Md Hanif > wrote: > > > > Hi Hsin-Yi, > > > > On 1/12/22 16:58, Hsin-Yi Wang wrote: > > > > Can you help provide logs if we apply > > 5a7b95fb993ec399c8a685552aa6a8fc995c40bd but revert > > 8d35a2596164c1c9d34d4656fd42b445cd1e247f? > > > > Issue still exists. journalctl log attached in revert_8d.txt > > > > > > > after apply 5a7b95fb993ec399c8a685552aa6a8fc995c40bd > > > 1. delete SET_LATE_SYSTEM_SLEEP_PM_OPS(i2c_suspend_late, > > > i2c_resume_early) and function i2c_suspend_late() and > > > i2c_resume_early(). > > > > No issues. journalctl log attached in test1.txt > > > > > > > 2. delete SET_RUNTIME_PM_OPS(i2c_runtime_suspend, i2c_runtime_resume, > > > NULL) and function i2c_runtime_suspend() and i2c_runtime_resume(). > > > > Issue exists. journalctl log attached in test2.txt > > > Thanks for the testing. > Can you help us test if applying the following patch on top of > 5a7b95fb993ec399c8a685552aa6a8fc995c40bd works? 
Thanks

> diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
> index 9eb4009cb250..6b046012aa08 100644
> --- a/drivers/i2c/i2c-core-base.c
> +++ b/drivers/i2c/i2c-core-base.c
> @@ -484,7 +484,7 @@ static int i2c_resume_early(struct device *dev)
>  	struct i2c_client *client = i2c_verify_client(dev);
>  	int err;
>
> -	if (!client)
> +	if (!client || dev_pm_skip_resume(dev))
>  		return 0;
>
>  	if (pm_runtime_status_suspended(&client->dev) &&
> @@ -502,7 +502,7 @@ static int i2c_suspend_late(struct device *dev)
>  	struct i2c_client *client = i2c_verify_client(dev);
>  	int err;
>
> -	if (!client)
> +	if (!client || dev_pm_skip_suspend(dev))
>  		return 0;
>
>  	err = pm_generic_suspend_late(&client->dev);
>
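The proposed guard in the quoted diff bails out of the late-suspend/early-resume hooks when there is no I2C client or when the PM core marks the phase as skippable. A reduced userspace model of that guard, with `has_client` standing in for `i2c_verify_client()` succeeding and `skip` for `dev_pm_skip_resume()`/`dev_pm_skip_suspend()` returning true (names are illustrative):

```c
#include <stdbool.h>

/* Reduced model of the proposed guard: do nothing unless there is a
 * client and the PM core has not asked to skip this phase. */
static int resume_early_sketch(bool has_client, bool skip, int *resume_count)
{
    if (!has_client || skip)
        return 0;          /* bail out before touching the device */
    (*resume_count)++;     /* pm_generic_resume_early() in the real driver */
    return 0;
}
```

The early return mirrors the patch's intent: only devices that actually need the callback pay for it, which is what the suspend-hang bisection in this thread is probing.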
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
On 2/4/2022 1:50 PM, Christian König wrote: Am 04.02.22 um 19:47 schrieb Bhardwaj, Rajneesh: On 2/4/2022 1:32 PM, Christian König wrote: Am 04.02.22 um 19:12 schrieb Bhardwaj, Rajneesh: [Sorry for top posting] Hi Christian I think you forgot the below hunk, without which the issue is not fixed completely on a multi GPU system. No, that is perfectly intentional. While removing a bo_va structure it can happen that there are still mappings attached to it (for example because the application crashed). OK. but for me, at boot time, I see flood of warning messages coming from amdgpu_vm_bo_del so the previous patch doesn't help. Do you have a backtrace? That should not happen. Could be that we have this buggy at a couple of different places. This is on Ubuntu 18.04. [ 8.392405] WARNING: CPU: 13 PID: 2732 at /home/rajneesh/git/compute/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2673 amdgpu_vm_bo_del+0x386/0x3b 0 [amdgpu] [ 8.392586] Modules linked in: amdgpu ast iommu_v2 gpu_sched drm_vram_helper drm_ttm_helper ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sy sfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks k10temp nf_tables nfnetlink ip_tables x_tables i2c_piix4 [ 8.392628] CPU: 13 PID: 2732 Comm: plymouthd Not tainted 5.13.0-kfd-rajneesh #1055 [ 8.392632] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ 8.392635] RIP: 0010:amdgpu_vm_bo_del+0x386/0x3b0 [amdgpu] [ 8.392749] Code: 0f 85 56 fe ff ff 48 c7 c2 28 6b b3 c0 be 21 01 00 00 48 c7 c7 58 6b b3 c0 c6 05 64 64 62 00 01 e8 5f be a4 d4 e9 32 fe ff ff <0f > 0b eb 8a 49 8b be 88 01 00 00 e9 af fc ff ff be 03 00 00 00 e8 [ 8.392752] RSP: 0018:af33471d7d98 EFLAGS: 00010246 [ 8.392756] RAX: RBX: a0877160 RCX: 0001 [ 8.392758] RDX: 0001 RSI: a08772ae01f8 RDI: a0800a426f68 [ 8.392761] RBP: a087691fd980 R08: c0a4e2e0 R09: 000a [ 8.392763] R10: af33471d7d68 R11: 0001 R12: a0801d540010 [ 8.392765] R13: a08771615c00 R14: a08024166618 R15: a08771615e60 [ 8.392768] FS: 
7f7b80179740() GS:a08f3dec() knlGS: [ 8.392771] CS: 0010 DS: ES: CR0: 80050033 [ 8.392773] CR2: 55839db51eb8 CR3: 00010f49c000 CR4: 003506e0 [ 8.392775] Call Trace: [ 8.392779] ? _raw_spin_unlock_irqrestore+0x30/0x40 [ 8.392787] amdgpu_driver_postclose_kms+0x94/0x270 [amdgpu] [ 8.392897] drm_file_free.part.14+0x1e3/0x230 [drm] [ 8.392918] drm_release+0x6e/0xf0 [drm] [ 8.392937] __fput+0xa3/0x250 [ 8.392942] task_work_run+0x6d/0xb0 [ 8.392949] exit_to_user_mode_prepare+0x1c9/0x1d0 [ 8.392953] syscall_exit_to_user_mode+0x19/0x50 [ 8.392957] do_syscall_64+0x42/0x70 [ 8.392960] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8.392964] RIP: 0033:0x7f7b7f8679e4 [ 8.392967] Code: eb 89 e8 6f 44 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 21 ff 2d 00 8b 00 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c f3 c3 66 90 53 89 fb 48 83 ec 10 e8 94 fe [ 8.392970] RSP: 002b:7ffe6bb0cdb8 EFLAGS: 0246 ORIG_RAX: 0003 [ 8.392973] RAX: RBX: 55839db4b760 RCX: 7f7b7f8679e4 [ 8.392974] RDX: 55839db4aed0 RSI: RDI: 000b [ 8.392976] RBP: 000b R08: 55839db4aee0 R09: 7f7b7fb42c40 [ 8.392978] R10: 0007 R11: 0246 R12: e280 [ 8.392979] R13: 000d R14: 7f7b7fb5b1e0 R15: 7f7b7fb5b130 [ 8.392988] irq event stamp: 1264799 [ 8.392990] hardirqs last enabled at (1264805): [] console_unlock+0x339/0x530 [ 8.392994] hardirqs last disabled at (1264810): [] console_unlock+0x3a6/0x530 Regards, Christian. Because of this locking the VM before the remove is mandatory. Only while adding a bo_va structure we can avoid that. Regards, Christian. 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index dcc80d6e099e..6f68fc9da56a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev, struct amdgpu_vm *vm = bo_va->base.vm; struct amdgpu_vm_bo_base **base; - dma_resv_assert_held(vm->root.bo->tbo.base.resv); - if (bo) { dma_resv_assert_held(bo->tbo.base.resv); if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv) If you chose to include the above hunk, please feel free to add Reviewed-and-tested-by: Rajneesh Bhardwaj On 2/4/2022 11:27 AM, Felix Kuehling wrote: Am 2022-02-04 um 03:52 schrieb Christian König: Since newly added BOs don't have any mappings it's ok to add them without holding the VM lock. Only when we add per VM BOs the
[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k
From: Dmytro Laktyushkin Unbounded requesting is unsupported on pipe split modes and this change prevents us running into such a situation with wide modes. Reviewed-by: Charlene Liu Acked-by: Jasdeep Dhillon Signed-off-by: Dmytro Laktyushkin --- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c index 7f9ceda4229b..007a7dc4f5be 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c @@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context( if (is_dual_plane(pipe->plane_state->format) && pipe->plane_state->src_rect.width <= 1920 && pipe->plane_state->src_rect.height <= 1080) { dc->config.enable_4to1MPC = true; - } else if (!is_dual_plane(pipe->plane_state->format)) { + } else if (!is_dual_plane(pipe->plane_state->format) && pipe->plane_state->src_rect.width <= 5120) { + /* Limit to 5k max to avoid forced pipe split when there is not enough detile for swath */ context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192; pipes[0].pipe.src.unbounded_req_mode = true; } -- 2.25.1
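The new condition reads as a small predicate: unbounded requesting is enabled only for single-plane surfaces up to 5120 pixels wide, so a mode wide enough to force a pipe split never ends up in unbounded mode. A hedged sketch of that check (not the DC code itself; the macro name is invented for illustration):

```c
#include <stdbool.h>

/* Illustrative limit from the patch: 5k (5120 px) is the widest
 * single-plane source that may use unbounded requesting. */
#define UNBOUNDED_REQ_MAX_WIDTH 5120

static bool may_use_unbounded_req(bool dual_plane, int src_width)
{
    /* Unbounded requesting is unsupported with pipe splits, and wide
     * modes can force a split when detile buffer space runs out. */
    return !dual_plane && src_width <= UNBOUNDED_REQ_MAX_WIDTH;
}
```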
[PATCH 04/13] drm/amd/display: Basic support with device ID
From: Oliver Logush [why] To get the the cyan_skillfish check working Reviewed-by: Charlene Liu Reviewed-by: Charlene Liu Acked-by: Jasdeep Dhillon Signed-off-by: Oliver Logush --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +-- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 +- .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c | 2 +- .../gpu/drm/amd/display/dc/core/dc_resource.c | 2 +- .../gpu/drm/amd/display/include/dal_asic_id.h | 3 ++- 5 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 8f53c9f6b267..f5941e59e5ad 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1014,6 +1014,14 @@ static void amdgpu_dm_audio_eld_notify(struct amdgpu_device *adev, int pin) } } +bool is_skillfish_series(struct amdgpu_device *adev) +{ + if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 0x143F) { + return true; + } + return false; +} + static int dm_dmub_hw_init(struct amdgpu_device *adev) { const struct dmcub_firmware_header_v1_0 *hdr; @@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev) return -EINVAL; } - if (!has_hw_support) { + if (is_skillfish_series(adev)) { DRM_INFO("DMUB unsupported on ASIC\n"); return 0; } @@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev) default: break; } + if (is_skillfish_series(adev)) { + init_data.flags.disable_dmcu = true; + break; + } break; } @@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev) case CHIP_VEGA10: case CHIP_VEGA12: case CHIP_VEGA20: - return 0; case CHIP_NAVI12: fw_name_dmcu = FIRMWARE_NAVI12_DMCU; break; @@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev) default: break; } + if (is_skillfish_series(adev)) { + return 0; + } DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type); return -EINVAL; } @@ -4515,6 +4529,12 @@ 
static int dm_early_init(void *handle) adev->mode_info.num_dig = 6; break; default: + if (is_skillfish_series(adev)) { + adev->mode_info.num_crtc = 2; + adev->mode_info.num_hpd = 2; + adev->mode_info.num_dig = 2; + break; + } #if defined(CONFIG_DRM_AMD_DC_DCN) switch (adev->ip_versions[DCE_HWIP][0]) { case IP_VERSION(2, 0, 2): diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h index e35977fda5c1..13875d669acd 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -82,7 +82,7 @@ struct common_irq_params { enum dc_irq_source irq_src; atomic64_t previous_timestamp; }; - +bool is_skillfish_series(struct amdgpu_device *adev); /** * struct dm_compressor_info - Buffer info used by frame buffer compression * @cpu_addr: MMIO cpu addr diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c index 9200c8ce02ba..54ef83fe5a9b 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c @@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return _mgr->base; } - if (asic_id.chip_id == DEVICE_ID_NV_13FE) { + if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) { dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return _mgr->base; } diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 9df66501a453..8a199d661a66 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id asic_id) case FAMILY_NV: dc_version = DCN_VERSION_2_0; - if (asic_id.chip_id == DEVICE_ID_NV_13FE) { + if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE || 
asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
[PATCH 05/13] drm/amd/display: remove static from optc31_set_drr
From: Eric Bernstein remove static from optc31_set_drr Reviewed-by: Nevenko Stupar Acked-by: Jasdeep Dhillon Signed-off-by: Eric Bernstein --- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c index e8562fa11366..8afe2130d7c5 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c @@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct timing_generator *optc) return true; } -static void optc31_set_drr( +void optc31_set_drr( struct timing_generator *optc, const struct drr_params *params) { diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h index d8ef2f0d0c95..a37b16040c1d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h @@ -256,4 +256,6 @@ void dcn31_timing_generator_init(struct optc *optc1); +void optc31_set_drr(struct timing_generator *optc, const struct drr_params *params); + #endif /* __DC_OPTC_DCN31_H__ */ -- 2.25.1
[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal
From: Nicholas Kazlauskas [Why] Found when running igt@kms_atomic. Userspace attempts to do a TEST_COMMIT when 0 streams which calls dc_remove_stream_from_ctx. This in turn calls link_enc_unassign which ends up modifying stream->link = NULL directly, causing the global link_enc to be removed preventing further link activity and future link validation from passing. [How] We take care of link_enc unassignment at the start of link_enc_cfg_link_encs_assign so this call is no longer necessary. Fixes global state from being modified while unlocked. Reviewed-by: Jimmy Kizito Acked-by: Jasdeep Dhillon Signed-off-by: Nicholas Kazlauskas --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index e82aa0559bdf..9df66501a453 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx( dc->res_pool, del_pipe->stream_res.stream_enc, false); - /* Release link encoder from stream in new dc_state. */ - if (dc->res_pool->funcs->link_enc_unassign) - dc->res_pool->funcs->link_enc_unassign(new_ctx, del_pipe->stream); - if (is_dp_128b_132b_signal(del_pipe)) { update_hpo_dp_stream_engine_usage( _ctx->res_ctx, dc->res_pool, -- 2.25.1
[PATCH 00/13] DC Patches February 7, 2022
This DC patchset brings improvements in multiple areas. In summary, we have:

- fix for build failure due to uninitialized variable error
- bug fix for DP2 using uncertified cable
- limit unbounded request to 5k
- fix DP LT sequence on EQ fail
- bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c      |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h           |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h      |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

-- 
2.25.1
[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error
From: Eric Bernstein [Why] Build failure due to ‘status’ may be used uninitialized [How] Initialize status to LINK_TRAINING_SUCCESS Reviewed-by: Wenjing Liu Acked-by: Jasdeep Dhillon Signed-off-by: Eric Bernstein --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c index 18fe8f77821a..d0cb40df60a4 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c @@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap( bool success = false; bool skip_video_pattern; enum clock_source_id dp_cs_id = get_clock_source_id(link); - enum link_training_result status; + enum link_training_result status = LINK_TRAINING_SUCCESS; union hpd_irq_data irq_data; struct link_resource link_res; -- 2.25.1
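The underlying issue generalizes: when every assignment to a variable sits on a conditional path, the compiler warns that it may be used uninitialized (a build failure under -Werror), and the value really is indeterminate if no path is taken. A reduced, hedged illustration of the fix — placeholder names, not the DC code:

```c
/* Placeholder result codes standing in for enum link_training_result. */
enum lt_result { LT_SUCCESS = 0, LT_FAILURE = 1 };

/* If 'attempts' is 0 the loop never runs; without the initializer the
 * function would return an indeterminate value (and can trigger a
 * -Wmaybe-uninitialized warning). */
static enum lt_result train_link(int attempts)
{
    enum lt_result status = LT_SUCCESS;  /* the fix: defined on all paths */
    for (int i = 0; i < attempts; i++)
        status = (i + 1 == attempts) ? LT_SUCCESS : LT_FAILURE;
    return status;
}
```

Initializing at the declaration, as the patch does for `status`, is the conventional way to make the value well-defined regardless of which branches execute.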
Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20
On Fri, Feb 4, 2022 at 1:56 PM Luben Tuikov wrote: > > On 2022-02-04 13:53, Alex Deucher wrote: > > On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov wrote: > >> > >> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20 > >> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU > >> is sought through that bus "pointer" and not through the RAS bus "pointer". > >> > >> Cc: Roy Sun > >> Cc: Alex Deucher > >> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus") > >> Signed-off-by: Luben Tuikov > >> --- > >> drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++- > >> drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 + > >> 2 files changed, 3 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c > >> b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c > >> index 87acb089cfccd5..dd2d66090d2374 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c > >> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device > >> *adev) > >> i2c_set_adapdata(control, smu_i2c); > >> > >> adev->pm.ras_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; > >> - adev->pm.fru_eeprom_i2c_bus = NULL; > >> + adev->pm.fru_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; > >> > >> res = i2c_add_adapter(control); > >> if (res) > >> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device > >> *adev) > >> > >> i2c_del_adapter(control); > >> adev->pm.ras_eeprom_i2c_bus = NULL; > >> + adev->pm.fru_eeprom_i2c_bus = NULL; > >> } > >> > >> /* > >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > >> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > >> index ba4c9771ffc56b..3d96d4af92d97f 100644 > >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c > >> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct > >> smu_context *smu) > >> } > >> > >> 
adev->pm.ras_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; > >> + adev->pm.fru_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; > > > > Does this need to be set to NULL in control_fini() for consistency? > > Other than that, looks good to me. > > It already is--I didn't need to add it as I did above for Vega20. In that case: Reviewed-by: Alex Deucher > > Regards, > Luben > > > > > Alex > > > > > >> > >> return 0; > >> Out_err: > >> > >> base-commit: 24b54e10043609a55194badff58bf91189c49793 > >> prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404 > >> prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a > >> prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883 > >> -- > >> 2.35.0.3.gb23dac905b > >> > > Regards, > -- > Luben
Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20
On 2022-02-04 13:53, Alex Deucher wrote: > On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov wrote: >> >> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20 >> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU >> is sought through that bus "pointer" and not through the RAS bus "pointer". >> >> Cc: Roy Sun >> Cc: Alex Deucher >> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus") >> Signed-off-by: Luben Tuikov >> --- >> drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++- >> drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 + >> 2 files changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c >> b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c >> index 87acb089cfccd5..dd2d66090d2374 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c >> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c >> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device >> *adev) >> i2c_set_adapdata(control, smu_i2c); >> >> adev->pm.ras_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; >> - adev->pm.fru_eeprom_i2c_bus = NULL; >> + adev->pm.fru_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; >> >> res = i2c_add_adapter(control); >> if (res) >> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device >> *adev) >> >> i2c_del_adapter(control); >> adev->pm.ras_eeprom_i2c_bus = NULL; >> + adev->pm.fru_eeprom_i2c_bus = NULL; >> } >> >> /* >> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c >> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c >> index ba4c9771ffc56b..3d96d4af92d97f 100644 >> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c >> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c >> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct >> smu_context *smu) >> } >> >> adev->pm.ras_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; >> + adev->pm.fru_eeprom_i2c_bus = >pm.smu_i2c[0].adapter; > > Does this need to be set to NULL in 
control_fini() for consistency? > Other than that, looks good to me. It already is--I didn't need to add it as I did above for Vega20. Regards, Luben > > Alex > > >> >> return 0; >> Out_err: >> >> base-commit: 24b54e10043609a55194badff58bf91189c49793 >> prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404 >> prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a >> prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883 >> -- >> 2.35.0.3.gb23dac905b >> Regards, -- Luben
Re: [PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20
On Fri, Feb 4, 2022 at 1:48 PM Luben Tuikov wrote:
>
> The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
> ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
> is sought through that bus "pointer" and not through the RAS bus "pointer".
>
> Cc: Roy Sun
> Cc: Alex Deucher
> Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
> Signed-off-by: Luben Tuikov
> ---
>  drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c | 3 ++-
>  drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> index 87acb089cfccd5..dd2d66090d2374 100644
> --- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> +++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
> @@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device *adev)
>  	i2c_set_adapdata(control, smu_i2c);
>
>  	adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> -	adev->pm.fru_eeprom_i2c_bus = NULL;
> +	adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
>
>  	res = i2c_add_adapter(control);
>  	if (res)
> @@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device *adev)
>
>  	i2c_del_adapter(control);
>  	adev->pm.ras_eeprom_i2c_bus = NULL;
> +	adev->pm.fru_eeprom_i2c_bus = NULL;
>  }
>
>  /*
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> index ba4c9771ffc56b..3d96d4af92d97f 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> @@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct smu_context *smu)
>  	}
>
>  	adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
> +	adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;

Does this need to be set to NULL in control_fini() for consistency? Other than that, looks good to me.
Alex > > return 0; > Out_err: > > base-commit: 24b54e10043609a55194badff58bf91189c49793 > prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404 > prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a > prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883 > -- > 2.35.0.3.gb23dac905b >
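The review point — both EEPROM bus pointers alias the same SMU I2C adapter at init and should both be cleared at fini — can be sketched as a symmetric init/fini pair. Field and function names here are illustrative stand-ins, not the amdgpu ones:

```c
#include <stddef.h>

/* Two logical "bus pointers" that share one physical adapter on
 * Vega 20 / Aldebaran. */
struct pm_state {
    void *ras_eeprom_i2c_bus;
    void *fru_eeprom_i2c_bus;
};

static void i2c_control_init_sketch(struct pm_state *pm, void *adapter)
{
    pm->ras_eeprom_i2c_bus = adapter;  /* RAS and FRU share this bus */
    pm->fru_eeprom_i2c_bus = adapter;
}

static void i2c_control_fini_sketch(struct pm_state *pm)
{
    /* Clear both for consistency, per the review comment: anything
     * set in init gets torn down in fini. */
    pm->ras_eeprom_i2c_bus = NULL;
    pm->fru_eeprom_i2c_bus = NULL;
}
```

Keeping init and fini mirror images of each other is what the "for consistency" question in the review is driving at: a pointer left dangling after fini would outlive the adapter it refers to.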
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
Am 04.02.22 um 19:47 schrieb Bhardwaj, Rajneesh:

On 2/4/2022 1:32 PM, Christian König wrote:

Am 04.02.22 um 19:12 schrieb Bhardwaj, Rajneesh:

[Sorry for top posting]

Hi Christian,

I think you forgot the below hunk, without which the issue is not fixed completely on a multi-GPU system.

No, that is perfectly intentional. While removing a bo_va structure it can happen that there are still mappings attached to it (for example because the application crashed).

OK, but for me, at boot time, I see a flood of warning messages coming from amdgpu_vm_bo_del, so the previous patch doesn't help.

Do you have a backtrace? That should not happen. Could be that we have this buggy at a couple of different places.

Regards,
Christian.

Because of this, locking the VM before the remove is mandatory. Only while adding a bo_va structure can we avoid that.

Regards,
Christian.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 	struct amdgpu_vm *vm = bo_va->base.vm;
 	struct amdgpu_vm_bo_base **base;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	if (bo) {
 		dma_resv_assert_held(bo->tbo.base.resv);
 		if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)

If you choose to include the above hunk, please feel free to add:

Reviewed-and-tested-by: Rajneesh Bhardwaj

On 2/4/2022 11:27 AM, Felix Kuehling wrote:

Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them without holding the VM lock. Only when we add per-VM BOs is the lock mandatory.
Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 		return;

+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 		amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 	if (bo_va == NULL) {
 		return NULL;
[PATCH] drm/amdgpu: Set FRU bus for Aldebaran and Vega 20
The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20 ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU is sought through that bus "pointer" and not through the RAS bus "pointer".

Cc: Roy Sun
Cc: Alex Deucher
Fixes: 9c6d2ba7057abb ("drm/amd: Expose the FRU SMU I2C bus")
Signed-off-by: Luben Tuikov
---
 drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c         | 3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
index 87acb089cfccd5..dd2d66090d2374 100644
--- a/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c
@@ -741,7 +741,7 @@ int smu_v11_0_i2c_control_init(struct amdgpu_device *adev)
 	i2c_set_adapdata(control, smu_i2c);

 	adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
-	adev->pm.fru_eeprom_i2c_bus = NULL;
+	adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;

 	res = i2c_add_adapter(control);
 	if (res)
@@ -756,6 +756,7 @@ void smu_v11_0_i2c_control_fini(struct amdgpu_device *adev)

 	i2c_del_adapter(control);
 	adev->pm.ras_eeprom_i2c_bus = NULL;
+	adev->pm.fru_eeprom_i2c_bus = NULL;
 }

 /*
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index ba4c9771ffc56b..3d96d4af92d97f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1583,6 +1583,7 @@ static int aldebaran_i2c_control_init(struct smu_context *smu)
 	}

 	adev->pm.ras_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;
+	adev->pm.fru_eeprom_i2c_bus = &adev->pm.smu_i2c[0].adapter;

 	return 0;
Out_err:

base-commit: 24b54e10043609a55194badff58bf91189c49793
prerequisite-patch-id: d694f05bfcdf100c1d6957afa209e304dad98404
prerequisite-patch-id: b106a5bd7e0d3a7c225a767ca4384162353cca9a
prerequisite-patch-id: 77d04393fc872e4f4dd158659bd44ba40b749883
--
2.35.0.3.gb23dac905b
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
On 2/4/2022 1:32 PM, Christian König wrote:

Am 04.02.22 um 19:12 schrieb Bhardwaj, Rajneesh:

[Sorry for top posting]

Hi Christian,

I think you forgot the below hunk, without which the issue is not fixed completely on a multi-GPU system.

No, that is perfectly intentional. While removing a bo_va structure it can happen that there are still mappings attached to it (for example because the application crashed).

OK, but for me, at boot time, I see a flood of warning messages coming from amdgpu_vm_bo_del, so the previous patch doesn't help.

Because of this, locking the VM before the remove is mandatory. Only while adding a bo_va structure can we avoid that.

Regards,
Christian.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 	struct amdgpu_vm *vm = bo_va->base.vm;
 	struct amdgpu_vm_bo_base **base;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	if (bo) {
 		dma_resv_assert_held(bo->tbo.base.resv);
 		if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)

If you choose to include the above hunk, please feel free to add:

Reviewed-and-tested-by: Rajneesh Bhardwaj

On 2/4/2022 11:27 AM, Felix Kuehling wrote:

Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them without holding the VM lock. Only when we add per-VM BOs is the lock mandatory.
Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 		return;

+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 		amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 	if (bo_va == NULL) {
 		return NULL;
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[Public]

Seems like this functionality should be moved up into the callers. Maybe add new IP callbacks (dump_reset_registers()) so that each IP can specify what registers are relevant for reset debugging, and then we can walk the IP list and call the callback before we call the asic_reset callbacks.

Alex

From: amd-gfx on behalf of Deucher, Alexander
Sent: Friday, February 4, 2022 1:41 PM
To: Sharma, Shashank ; Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

[Public]

In the suspend and hibernate cases, we don't care. In most cases the power rail will be cut once the system enters suspend, so it doesn't really matter. That's why we call the asic reset callback directly rather than going through the whole recovery process. The device is already quiescent at this point; we just want to make sure the device is in a known state when we come out of suspend (in case suspend overall fails).

Alex

From: Sharma, Shashank
Sent: Friday, February 4, 2022 12:22 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing
> In suspend-reset case, won't this cause to schedule a work item on
> suspend? I don't know if that is a good idea, ideally we would like to clean
> up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error during suspend-reset, which is a very probable case.

- Shashank

> -----Original Message-----
> From: Sharma, Shashank
> Sent: Friday, February 4, 2022 10:47 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, Amaranath
> ; Koenig, Christian
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
> On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
>> BTW, since this is already providing a set of values it would be useful to
>> provide one more field as the reset reason - RAS error recovery, GPU hung
>> recovery or something else.
>
> Adding this additional parameter instead of blocking something in the kernel
> seems like a better idea. The app can filter out and read what it is
> interested in.
>
> - Shashank
[PATCH 05/13] drm/amd/display: remove static from optc31_set_drr
From: Eric Bernstein

Remove static from optc31_set_drr.

Reviewed-by: Nevenko Stupar
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
index e8562fa11366..8afe2130d7c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c
@@ -161,7 +161,7 @@ static bool optc31_immediate_disable_crtc(struct timing_generator *optc)
 	return true;
 }

-static void optc31_set_drr(
+void optc31_set_drr(
 	struct timing_generator *optc,
 	const struct drr_params *params)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
index d8ef2f0d0c95..a37b16040c1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.h
@@ -256,4 +256,6 @@
 void dcn31_timing_generator_init(struct optc *optc1);

+void optc31_set_drr(struct timing_generator *optc, const struct drr_params *params);
+
 #endif /* __DC_OPTC_DCN31_H__ */
--
2.25.1
[PATCH 04/13] drm/amd/display: Basic support with device ID
From: Oliver Logush

[why]
To get the cyan_skillfish check working.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Oliver Logush
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  | 2 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h | 3 ++-
 5 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8f53c9f6b267..f5941e59e5ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1014,6 +1014,14 @@ static void amdgpu_dm_audio_eld_notify(struct amdgpu_device *adev, int pin)
 	}
 }

+bool is_skillfish_series(struct amdgpu_device *adev)
+{
+	if (adev->asic_type == CHIP_CYAN_SKILLFISH || adev->pdev->revision == 0x143F) {
+		return true;
+	}
+	return false;
+}
+
 static int dm_dmub_hw_init(struct amdgpu_device *adev)
 {
 	const struct dmcub_firmware_header_v1_0 *hdr;
@@ -1049,7 +1057,7 @@ static int dm_dmub_hw_init(struct amdgpu_device *adev)
 		return -EINVAL;
 	}

-	if (!has_hw_support) {
+	if (is_skillfish_series(adev)) {
 		DRM_INFO("DMUB unsupported on ASIC\n");
 		return 0;
 	}
@@ -1471,6 +1479,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 	default:
 		break;
 	}
+	if (is_skillfish_series(adev)) {
+		init_data.flags.disable_dmcu = true;
+		break;
+	}
 	break;
 }
@@ -1777,7 +1789,6 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
 	case CHIP_VEGA10:
 	case CHIP_VEGA12:
 	case CHIP_VEGA20:
-		return 0;
 	case CHIP_NAVI12:
 		fw_name_dmcu = FIRMWARE_NAVI12_DMCU;
 		break;
@@ -1805,6 +1816,9 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
 	default:
 		break;
 	}
+	if (is_skillfish_series(adev)) {
+		return 0;
+	}
 	DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
 	return -EINVAL;
 }
@@ -4515,6 +4529,12 @@ static int dm_early_init(void *handle)
 		adev->mode_info.num_dig = 6;
 		break;
 	default:
+		if (is_skillfish_series(adev)) {
+			adev->mode_info.num_crtc = 2;
+			adev->mode_info.num_hpd = 2;
+			adev->mode_info.num_dig = 2;
+			break;
+		}
 #if defined(CONFIG_DRM_AMD_DC_DCN)
 		switch (adev->ip_versions[DCE_HWIP][0]) {
 		case IP_VERSION(2, 0, 2):
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index e35977fda5c1..13875d669acd 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -82,7 +82,7 @@ struct common_irq_params {
 	enum dc_irq_source irq_src;
 	atomic64_t previous_timestamp;
 };
-
+bool is_skillfish_series(struct amdgpu_device *adev);
 /**
  * struct dm_compressor_info - Buffer info used by frame buffer compression
  * @cpu_addr: MMIO cpu addr
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
index 9200c8ce02ba..54ef83fe5a9b 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c
@@ -259,7 +259,7 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p
 		dcn3_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
 		return &clk_mgr->base;
 	}
-	if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+	if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE) {
 		dcn201_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg);
 		return &clk_mgr->base;
 	}
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 9df66501a453..8a199d661a66 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -135,7 +135,7 @@ enum dce_version resource_parse_asic_id(struct hw_asic_id asic_id)
 	case FAMILY_NV:
 		dc_version = DCN_VERSION_2_0;
-		if (asic_id.chip_id == DEVICE_ID_NV_13FE) {
+		if (asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_13FE ||
+		    asic_id.chip_id == DEVICE_ID_NV_NAVI10_LITE_P_143F) {
[PATCH 03/13] drm/amd/display: limit unbounded requesting to 5k
From: Dmytro Laktyushkin

Unbounded requesting is unsupported on pipe split modes, and this change prevents us from running into such a situation with wide modes.

Reviewed-by: Charlene Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Dmytro Laktyushkin
---
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 7f9ceda4229b..007a7dc4f5be 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1836,7 +1836,8 @@ static int dcn31_populate_dml_pipes_from_context(
 		if (is_dual_plane(pipe->plane_state->format)
 				&& pipe->plane_state->src_rect.width <= 1920 && pipe->plane_state->src_rect.height <= 1080) {
 			dc->config.enable_4to1MPC = true;
-		} else if (!is_dual_plane(pipe->plane_state->format)) {
+		} else if (!is_dual_plane(pipe->plane_state->format) && pipe->plane_state->src_rect.width <= 5120) {
+			/* Limit to 5k max to avoid forced pipe split when there is not enough detile for swath */
 			context->bw_ctx.dml.ip.det_buffer_size_kbytes = 192;
 			pipes[0].pipe.src.unbounded_req_mode = true;
 		}
--
2.25.1
[PATCH 02/13] drm/amd/display: Fix stream->link_enc unassigned during stream removal
From: Nicholas Kazlauskas

[Why]
Found when running igt@kms_atomic. Userspace attempts a TEST_COMMIT with 0 streams, which calls dc_remove_stream_from_ctx. This in turn calls link_enc_unassign, which ends up modifying stream->link = NULL directly, causing the global link_enc to be removed, preventing further link activity and future link validation from passing.

[How]
We take care of link_enc unassignment at the start of link_enc_cfg_link_encs_assign, so this call is no longer necessary. It also fixes global state being modified while unlocked.

Reviewed-by: Jimmy Kizito
Acked-by: Jasdeep Dhillon
Signed-off-by: Nicholas Kazlauskas
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e82aa0559bdf..9df66501a453 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1965,10 +1965,6 @@ enum dc_status dc_remove_stream_from_ctx(
 			dc->res_pool,
 			del_pipe->stream_res.stream_enc,
 			false);
-	/* Release link encoder from stream in new dc_state. */
-	if (dc->res_pool->funcs->link_enc_unassign)
-		dc->res_pool->funcs->link_enc_unassign(new_ctx, del_pipe->stream);
-
 	if (is_dp_128b_132b_signal(del_pipe)) {
 		update_hpo_dp_stream_engine_usage(
			&new_ctx->res_ctx, dc->res_pool,
--
2.25.1
[PATCH 01/13] drm/amd/display: Fix for variable may be used uninitialized error
From: Eric Bernstein

[Why]
Build failure due to 'status' may be used uninitialized.

[How]
Initialize status to LINK_TRAINING_SUCCESS.

Reviewed-by: Wenjing Liu
Acked-by: Jasdeep Dhillon
Signed-off-by: Eric Bernstein
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 18fe8f77821a..d0cb40df60a4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3199,7 +3199,7 @@ static bool dp_verify_link_cap(
 	bool success = false;
 	bool skip_video_pattern;
 	enum clock_source_id dp_cs_id = get_clock_source_id(link);
-	enum link_training_result status;
+	enum link_training_result status = LINK_TRAINING_SUCCESS;
 	union hpd_irq_data irq_data;
 	struct link_resource link_res;
--
2.25.1
[PATCH 00/13] DC Patches Feb 7, 2022
This DC patchset brings improvements in multiple areas. In summary, we have:

- fix for a build failure due to an uninitialized variable
- bug fix for DP2 using an uncertified cable
- limit unbounded requesting to 5k
- fix DP LT sequence on EQ fail
- bug fixes for S3/S4

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.103.0

Aric Cyr (1):
  drm/amd/display: 3.2.172

Dmytro Laktyushkin (2):
  drm/amd/display: limit unbounded requesting to 5k
  drm/amd/display: fix yellow carp wm clamping

Eric Bernstein (2):
  drm/amd/display: Fix for variable may be used uninitialized error
  drm/amd/display: remove static from optc31_set_drr

Guo, Bing (1):
  dc: do blocked MST topology discovery at resume from S3/S4

Ilya (1):
  drm/amd/display: Fix DP LT sequence on EQ fail

Martin Tsai (1):
  drm/amd/display: handle null link encoder

Nicholas Kazlauskas (1):
  drm/amd/display: Fix stream->link_enc unassigned during stream removal

Oliver Logush (1):
  drm/amd/display: Basic support with device ID

Paul Hsieh (1):
  drm/amd/display: change fastboot timing validation

Zhan Liu (1):
  drm/amd/display: keep eDP Vdd on when eDP stream is already enabled

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c  |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c      |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 27 ++--
 .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  4 +-
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +--
 drivers/gpu/drm/amd/display/dc/dc.h           |  4 +-
 drivers/gpu/drm/amd/display/dc/dc_link.h      |  1 +
 .../display/dc/dce110/dce110_hw_sequencer.c   | 27 +++-
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 11 +---
 .../drm/amd/display/dc/dcn31/dcn31_hubbub.c   | 61 ++-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.c |  2 +-
 .../gpu/drm/amd/display/dc/dcn31/dcn31_optc.h |  2 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |  3 +-
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  4 +-
 .../gpu/drm/amd/display/include/dal_asic_id.h |  3 +-
 17 files changed, 104 insertions(+), 83 deletions(-)

--
2.25.1
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[Public]

In the suspend and hibernate cases, we don't care. In most cases the power rail will be cut once the system enters suspend, so it doesn't really matter. That's why we call the asic reset callback directly rather than going through the whole recovery process. The device is already quiescent at this point; we just want to make sure the device is in a known state when we come out of suspend (in case suspend overall fails).

Alex

From: Sharma, Shashank
Sent: Friday, February 4, 2022 12:22 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing
> In suspend-reset case, won't this cause to schedule a work item on
> suspend? I don't know if that is a good idea, ideally we would like to clean
> up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error during suspend-reset, which is a very probable case.

- Shashank

> -----Original Message-----
> From: Sharma, Shashank
> Sent: Friday, February 4, 2022 10:47 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram, Amaranath
> ; Koenig, Christian
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
> On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
>> BTW, since this is already providing a set of values it would be useful to
>> provide one more field as the reset reason - RAS error recovery, GPU hung
>> recovery or something else.
>
> Adding this additional parameter instead of blocking something in the kernel
> seems like a better idea. The app can filter out and read what it is
> interested in.
>
> - Shashank
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
Am 04.02.22 um 19:12 schrieb Bhardwaj, Rajneesh:

[Sorry for top posting]

Hi Christian,

I think you forgot the below hunk, without which the issue is not fixed completely on a multi-GPU system.

No, that is perfectly intentional. While removing a bo_va structure it can happen that there are still mappings attached to it (for example because the application crashed).

Because of this, locking the VM before the remove is mandatory. Only while adding a bo_va structure can we avoid that.

Regards,
Christian.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 	struct amdgpu_vm *vm = bo_va->base.vm;
 	struct amdgpu_vm_bo_base **base;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	if (bo) {
 		dma_resv_assert_held(bo->tbo.base.resv);
 		if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)

If you choose to include the above hunk, please feel free to add:

Reviewed-and-tested-by: Rajneesh Bhardwaj

On 2/4/2022 11:27 AM, Felix Kuehling wrote:

Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them without holding the VM lock. Only when we add per-VM BOs is the lock mandatory.
Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 		return;

+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 		amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 	if (bo_va == NULL) {
 		return NULL;
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
[Sorry for top posting]

Hi Christian,

I think you forgot the below hunk, without which the issue is not fixed completely on a multi-GPU system.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 	struct amdgpu_vm *vm = bo_va->base.vm;
 	struct amdgpu_vm_bo_base **base;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	if (bo) {
 		dma_resv_assert_held(bo->tbo.base.resv);
 		if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)

If you choose to include the above hunk, please feel free to add:

Reviewed-and-tested-by: Rajneesh Bhardwaj

On 2/4/2022 11:27 AM, Felix Kuehling wrote:

Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings, it's OK to add them without holding the VM lock. Only when we add per-VM BOs is the lock mandatory.
Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
Reviewed-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 		return;

+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 		amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 	if (bo_va == NULL) {
 		return NULL;
[Important!] 2022 X.Org Foundation Membership deadline for voting in the election
The 2022 X.Org Foundation elections are rapidly approaching. We will be forwarding instructions on the nomination process to membership in the near future. Please note that only current members can vote in the upcoming election, and that the deadline for new memberships or renewals to vote in the upcoming election is March 17th 2022 at 23:59 UTC. If you are interested in joining the X.Org Foundation or in renewing your membership, please visit the membership system site at: https://members.x.org/ You can find the current election schedule here: https://www.x.org/wiki/BoardOfDirectors/Elections/2022/ Lyude Paul, On behalf of the X.Org elections committee
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
On 2/4/2022 6:20 PM, Lazar, Lijo wrote: [AMD Official Use Only] One more thing In suspend-reset case, won't this cause to schedule a work item on suspend? I don't know if that is a good idea, ideally we would like to clean up all work items before going to suspend. Thanks, Lijo Again, this opens scope for discussion. What if there is a GPU error during suspend-reset, which is very probable case. - Shashank -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:47 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler On 2/4/2022 6:11 PM, Lazar, Lijo wrote: BTW, since this is already providing a set of values it would be useful to provide one more field as the reset reason - RAS error recovery, GPU hung recovery or something else. Adding this additional parameter instead of blocking something in kernel, seems like a better idea. The app can filter out and read what it is interested into. - Shashank
RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[AMD Official Use Only]

One more thing: in the suspend-reset case, won't this cause a work item to be scheduled on suspend? I don't know if that is a good idea; ideally we would like to clean up all work items before going to suspend.

Thanks,
Lijo

-----Original Message-----
From: Sharma, Shashank
Sent: Friday, February 4, 2022 10:47 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
> BTW, since this is already providing a set of values it would be useful to
> provide one more field as the reset reason - RAS error recovery, GPU hung
> recovery or something else.

Adding this additional parameter instead of blocking something in the kernel seems like a better idea. The app can filter out and read what it is interested in.

- Shashank
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
On 2/4/2022 6:11 PM, Lazar, Lijo wrote: BTW, since this is already providing a set of values it would be useful to provide one more field as the reset reason - RAS error recovery, GPU hung recovery or something else. Adding this additional parameter instead of blocking something in kernel, seems like a better idea. The app can filter out and read what it is interested into. - Shashank
RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[AMD Official Use Only]

No, otherwise driver reset is only for GPU recovery purposes. S3/S4 is not meant for recovery; it's just a precautionary reset to make sure that everything works fine on resume.

BTW, since this is already providing a set of values, it would be useful to provide one more field as the reset reason - RAS error recovery, GPU hung recovery or something else.

Thanks,
Lijo

-----Original Message-----
From: Sharma, Shashank
Sent: Friday, February 4, 2022 10:37 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

On 2/4/2022 6:02 PM, Lazar, Lijo wrote:
> [Public]
>
> The problem is the app doesn't know why the reset happened. It just receives a
> bunch of registers to be read. On what basis can an app filter this out?

Again, that is contextual analysis capability, which needs to be embedded in the reader app. Even if we filter out the S3/S4 resets in the kernel, the situation remains the same, doesn't it?

- Shashank

> Thanks,
> Lijo
>
> -----Original Message-----
> From: Sharma, Shashank
> Sent: Friday, February 4, 2022 10:29 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram,
> Amaranath ; Koenig, Christian
>
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
> On 2/4/2022 5:50 PM, Lazar, Lijo wrote:
>> [AMD Official Use Only]
>>
>> To explain more -
>> It's an unconditional reset done by the kernel on every suspend
>> (S3/S4). In such a case which process is going to receive the trace events?
>>
>> Most likely use case would be related to gpu recovery. Triggering a trace on
>> every reset doesn't look like a good idea.
>>
>
> If you observe carefully, we are just providing an infrastructure; the
> application's intention is unknown to us. In my opinion it's rather not a
> good idea to apply a filter in kernel, with our interpretation of intention.
> > For example if an app just wants to count how many resets are happening due > to S3/S4 transition, this infra might become useless. It would rather be a > better idea for the app to learn and ignore these scenarios which it is not > interested in. > > This could eventually be just difference in design philosophy maybe :) > > - Shashank > >> Thanks, >> Lijo >> >> -Original Message- >> From: Sharma, Shashank >> Sent: Friday, February 4, 2022 10:09 PM >> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander ; Somalapuram, >> Amaranath ; Koenig, Christian >> >> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler >> >> Hey Lijo, >> I somehow missed to respond on this comment, pls find inline: >> >> Regards >> Shashank >> >> On 1/22/2022 7:42 AM, Lazar, Lijo wrote: >>> >>> >>> On 1/22/2022 2:04 AM, Sharma, Shashank wrote: From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001 From: Somalapuram Amaranath Date: Fri, 21 Jan 2022 14:19:42 +0530 Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler This patch adds a GPU reset handler for Navi ASIC family, which typically dumps some of the registers and sends a trace event. 
V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath
Signed-off-by: Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 	}
 }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+	int r = 0, i;
+
+	/* original raven doesn't have full asic reset */
+	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+		return;
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.valid)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+		if (r)
+			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+				adev->ip_blocks[i].version->funcs->name, r);
+	}
+
+	/* Schedule work to send uevent */
+	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+		DRM_ERROR("failed to add GPU reset work\n");
+
+	dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
On 2/4/2022 6:02 PM, Lazar, Lijo wrote: [Public] The problem is app doesn't know why the reset happened. It just receives a bunch of registers to be read. On what basis an app can filter this out? Again, that is contextual analysis capability, which needs to be embedded in the reader app. Even if we filter out the S3/S4 resets in the kernel, the situation remains the same, isn't it? - Shashank Thanks, Lijo -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:29 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler On 2/4/2022 5:50 PM, Lazar, Lijo wrote: [AMD Official Use Only] To explain more - It's an unconditional reset done by the kernel on every suspend (S3/S4). In such a case which process is going to receive the trace events? Most likely use case would be related to gpu recovery. Triggering a trace on every reset doesn't look like a good idea. If you observe carefully, we are just providing an infrastructure, the application's intention is unknown to us. In my opinion it's rather not a good idea to apply a filter in kernel, with our interpretation of intention. For example if an app just wants to count how many resets are happening due to S3/S4 transition, this infra might become useless. It would rather be a better idea for the app to learn and ignore these scenarios which it is not interested in. 
This could eventually be just difference in design philosophy maybe :) - Shashank Thanks, Lijo -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:09 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler Hey Lijo, I somehow missed to respond on this comment, pls find inline: Regards Shashank On 1/22/2022 7:42 AM, Lazar, Lijo wrote: On 1/22/2022 2:04 AM, Sharma, Shashank wrote:

From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
From: Somalapuram Amaranath
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for Navi ASIC family, which typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath
Signed-off-by: Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 	}
 }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+	int r = 0, i;
+
+	/* original raven doesn't have full asic reset */
+	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+		return;
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.valid)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+		if (r)
+			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+				adev->ip_blocks[i].version->funcs->name, r);
+	}
+
+	/* Schedule work to send uevent */
+	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+		DRM_ERROR("failed to add GPU reset work\n");
+
+	dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
 	int ret = 0;

+	amdgpu_reset_dumps(adev);

Had a comment on this before. Now there are different reasons (or even no reason like a precautionary reset) to perform reset. A user would be interested in a trace only if the reason is valid. To clarify on why a work shouldn't be scheduled on every reset, check here - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188

In the example you pointed to, they have a criteria to decide what is a valid reset in their context, in the kernel side itself. So they can take a call if they want to do something about it or not. But, in our case, we want to send the trace_event to user with some register values on every reset, and it is actually up to the profiling app to interpret (along with what it wants to call a GPU reset). So I don't think this is causing a considerable overhead.

- Shashank

Thanks, Lijo

 	switch (nv_asic_reset_method(adev)) {
 	case AMD_RESET_METHOD_PCI:
 		dev_info(adev->dev, "PCI reset\n");
RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[Public] The problem is app doesn't know why the reset happened. It just receives a bunch of registers to be read. On what basis an app can filter this out? Thanks, Lijo -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:29 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler On 2/4/2022 5:50 PM, Lazar, Lijo wrote: > [AMD Official Use Only] > > To explain more - > It's an unconditional reset done by the kernel on every suspend > (S3/S4). In such a case which process is going to receive the trace events? > > Most likely use case would be related to gpu recovery. Triggering a trace on > every reset doesn't look like a good idea. > If you observe carefully, we are just providing an infrastructure, the application's intention is unknown to us. In my opinion it's rather not a good idea to apply a filter in kernel, with our interpretation of intention. For example if an app just wants to count how many resets are happening due to S3/S4 transition, this infra might become useless. It would rather be a better idea for the app to learn and ignore these scenarios which it is not interested in. 
This could eventually be just difference in design philosophy maybe :) - Shashank > Thanks, > Lijo > > -Original Message- > From: Sharma, Shashank > Sent: Friday, February 4, 2022 10:09 PM > To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Somalapuram, > Amaranath ; Koenig, Christian > > Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler > > Hey Lijo, > I somehow missed to respond on this comment, pls find inline: > > Regards > Shashank > > On 1/22/2022 7:42 AM, Lazar, Lijo wrote: >> >> >> On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
>>> From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
>>> From: Somalapuram Amaranath
>>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>>
>>> This patch adds a GPU reset handler for Navi ASIC family, which
>>> typically dumps some of the registers and sends a trace event.
>>>
>>> V2: Accommodated call to work function to send uevent
>>>
>>> Signed-off-by: Somalapuram Amaranath
>>> Signed-off-by: Shashank Sharma
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>>  1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> index 01efda4398e5..ada35d4c5245 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>  	}
>>>  }
>>>
>>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>>> +{
>>> +	int r = 0, i;
>>> +
>>> +	/* original raven doesn't have full asic reset */
>>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> +	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> +		return;
>>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>>> +		if (!adev->ip_blocks[i].status.valid)
>>> +			continue;
>>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> +			continue;
>>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> +		if (r)
>>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>> +				adev->ip_blocks[i].version->funcs->name, r);
>>> +	}
>>> +
>>> +	/* Schedule work to send uevent */
>>> +	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>>> +		DRM_ERROR("failed to add GPU reset work\n");
>>> +
>>> +	dump_stack();
>>> +}
>>> +
>>>  static int nv_asic_reset(struct amdgpu_device *adev)
>>>  {
>>>  	int ret = 0;
>>>
>>> +	amdgpu_reset_dumps(adev);
>>
>> Had a comment on this before. Now there are different reasons (or even no reason like a precautionary reset) to perform reset. A user would be interested in a trace only if the reason is valid.
>>
>> To clarify on why a work shouldn't be scheduled on every reset, check here -
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188
> In the example you pointed to, they have a criteria to decide what is a valid reset in their context, in the kernel side itself. So they can take a call if they want to do something about it or not.
>
> But, in our case, we want to send the trace_event to user with some register values on every reset, and it is actually up to the profiling app to interpret (along with what it wants to call a GPU reset). So I don't think this is causing a considerable overhead.
>
> - Shashank
>>
>> Thanks,
>> Lijo
>>
>>> 	switch (nv_asic_reset_method(adev)) {
>>> 	case AMD_RESET_METHOD_PCI:
>>> 		dev_info(adev->dev, "PCI reset\n");
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
On 2/4/2022 5:50 PM, Lazar, Lijo wrote: [AMD Official Use Only] To explain more - It's an unconditional reset done by the kernel on every suspend (S3/S4). In such a case which process is going to receive the trace events? Most likely use case would be related to gpu recovery. Triggering a trace on every reset doesn't look like a good idea. If you observe carefully, we are just providing an infrastructure, the application's intention is unknown to us. In my opinion it's rather not a good idea to apply a filter in kernel, with our interpretation of intention. For example if an app just wants to count how many resets are happening due to S3/S4 transition, this infra might become useless. It would rather be a better idea for the app to learn and ignore these scenarios which it is not interested in. This could eventually be just difference in design philosophy maybe :) - Shashank Thanks, Lijo -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:09 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler Hey Lijo, I somehow missed to respond on this comment, pls find inline: Regards Shashank On 1/22/2022 7:42 AM, Lazar, Lijo wrote: On 1/22/2022 2:04 AM, Sharma, Shashank wrote: From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001 From: Somalapuram Amaranath Date: Fri, 21 Jan 2022 14:19:42 +0530 Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler This patch adds a GPU reset handler for Navi ASIC family, which typically dumps some of the registers and sends a trace event. 
V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath
Signed-off-by: Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 	}
 }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+	int r = 0, i;
+
+	/* original raven doesn't have full asic reset */
+	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+		return;
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.valid)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+		if (r)
+			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+				adev->ip_blocks[i].version->funcs->name, r);
+	}
+
+	/* Schedule work to send uevent */
+	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+		DRM_ERROR("failed to add GPU reset work\n");
+
+	dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
 	int ret = 0;

+	amdgpu_reset_dumps(adev);

Had a comment on this before. Now there are different reasons (or even no reason like a precautionary reset) to perform reset. A user would be interested in a trace only if the reason is valid. To clarify on why a work shouldn't be scheduled on every reset, check here - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188

In the example you pointed to, they have a criteria to decide what is a valid reset in their context, in the kernel side itself. 
But, in our case, we want to send the trace_event to user with some register values on every reset, and it is actually up to the profiling app to interpret (along with what it wants to call a GPU reset). So I don't think this is causing a considerable overhead. - Shashank Thanks, Lijo switch (nv_asic_reset_method(adev)) { case AMD_RESET_METHOD_PCI: dev_info(adev->dev, "PCI reset\n");
RE: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
[AMD Official Use Only] To explain more - It's an unconditional reset done by the kernel on every suspend (S3/S4). In such a case which process is going to receive the trace events? Most likely use case would be related to gpu recovery. Triggering a trace on every reset doesn't look like a good idea. Thanks, Lijo -Original Message- From: Sharma, Shashank Sent: Friday, February 4, 2022 10:09 PM To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Somalapuram, Amaranath ; Koenig, Christian Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler Hey Lijo, I somehow missed to respond on this comment, pls find inline: Regards Shashank On 1/22/2022 7:42 AM, Lazar, Lijo wrote: > > > On 1/22/2022 2:04 AM, Sharma, Shashank wrote: >> From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 >> 2001 >> From: Somalapuram Amaranath >> Date: Fri, 21 Jan 2022 14:19:42 +0530 >> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler >> >> This patch adds a GPU reset handler for Navi ASIC family, which >> typically dumps some of the registers and sends a trace event. 
>>
>> V2: Accommodated call to work function to send uevent
>>
>> Signed-off-by: Somalapuram Amaranath
>> Signed-off-by: Shashank Sharma
>> ---
>>  drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>> index 01efda4398e5..ada35d4c5245 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>  	}
>>  }
>>
>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>> +{
>> +	int r = 0, i;
>> +
>> +	/* original raven doesn't have full asic reset */
>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +		return;
>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>> +		if (!adev->ip_blocks[i].status.valid)
>> +			continue;
>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +			continue;
>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +		if (r)
>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> +				adev->ip_blocks[i].version->funcs->name, r);
>> +	}
>> +
>> +	/* Schedule work to send uevent */
>> +	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>> +		DRM_ERROR("failed to add GPU reset work\n");
>> +
>> +	dump_stack();
>> +}
>> +
>>  static int nv_asic_reset(struct amdgpu_device *adev)
>>  {
>>  	int ret = 0;
>>
>> +	amdgpu_reset_dumps(adev);
>
> Had a comment on this before. Now there are different reasons (or even
> no reason like a precautionary reset) to perform reset. A user would
> be interested in a trace only if the reason is valid.
>
> To clarify on why a work shouldn't be scheduled on every reset, check
> here -
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188

In the example you pointed to, they have a criteria to decide what is a valid reset in their context, in the kernel side itself. 
So they can take a call if they want to do something about it or not. But, in our case, we want to send the trace_event to user with some register values on every reset, and it is actually up to the profiling app to interpret (along with what it wants to call a GPU reset). So I don't think this is causing a considerable overhead. - Shashank > > > > Thanks, > Lijo > >> switch (nv_asic_reset_method(adev)) { >> case AMD_RESET_METHOD_PCI: >> dev_info(adev->dev, "PCI reset\n");
Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
Hey Lijo, I somehow missed to respond on this comment, pls find inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote: On 1/22/2022 2:04 AM, Sharma, Shashank wrote:

From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
From: Somalapuram Amaranath
Date: Fri, 21 Jan 2022 14:19:42 +0530
Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

This patch adds a GPU reset handler for Navi ASIC family, which typically dumps some of the registers and sends a trace event.

V2: Accommodated call to work function to send uevent

Signed-off-by: Somalapuram Amaranath
Signed-off-by: Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 01efda4398e5..ada35d4c5245 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 	}
 }

+static void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+	int r = 0, i;
+
+	/* original raven doesn't have full asic reset */
+	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+	    !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+		return;
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.valid)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+		if (r)
+			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+				adev->ip_blocks[i].version->funcs->name, r);
+	}
+
+	/* Schedule work to send uevent */
+	if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
+		DRM_ERROR("failed to add GPU reset work\n");
+
+	dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
 	int ret = 0;

+	amdgpu_reset_dumps(adev);

Had a comment on this before. Now there are different reasons (or even no reason like a precautionary reset) to perform reset. 
A user would be interested in a trace only if the reason is valid. To clarify on why a work shouldn't be scheduled on every reset, check here - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188 In the example you pointed to, they have a criteria to decide what is a valid reset in their context, in the kernel side itself. So they can take a call if they want to do something about it or not. But, in our case, we want to send the trace_event to user with some register values on every reset, and it is actually up to the profiling app to interpret (along with what it wants to call a GPU reset). So I don't think this is causing a considerable overhead. - Shashank Thanks, Lijo switch (nv_asic_reset_method(adev)) { case AMD_RESET_METHOD_PCI: dev_info(adev->dev, "PCI reset\n");
Re: [PATCH] drm/amdgpu/display: change pipe policy for DCN 2.0
On 2022-02-03 13:58, Alex Deucher wrote:
> Fixes hangs on driver load on DCN 2.0 parts.
>
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215511
> Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split")
> Signed-off-by: Alex Deucher

Reviewed-by: Harry Wentland

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> index fcf388b509db..d9b3f449bf9b 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
> @@ -1069,7 +1069,7 @@ static const struct dc_debug_options debug_defaults_drv = {
>  	.timing_trace = false,
>  	.clock_trace = true,
>  	.disable_pplib_clock_request = true,
> -	.pipe_split_policy = MPC_SPLIT_DYNAMIC,
> +	.pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
>  	.force_single_disp_pipe_split = false,
>  	.disable_dcc = DCC_ENABLE,
>  	.vsr_support = true,
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
Am 2022-02-04 um 03:52 schrieb Christian König:

Since newly added BOs don't have any mappings it's ok to add them without holding the VM lock. Only when we add per-VM BOs is the lock mandatory.

Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh

Reviewed-by: Felix Kuehling

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
 		return;

+	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
 	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
 		amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va;

-	dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
 	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
 	if (bo_va == NULL) {
 		return NULL;
Re: [PATCH 3/3] drm/amdgpu: move dpcs_3_0_3 headers from dcn to dpcs
Series is Reviewed-by: Harry Wentland

Harry

On 2022-02-03 13:58, Alex Deucher wrote:
> To align with other headers.
>
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 4 ++--
>  .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_offset.h | 0
>  .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_sh_mask.h | 0
>  3 files changed, 2 insertions(+), 2 deletions(-)
>  rename drivers/gpu/drm/amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_offset.h (100%)
>  rename drivers/gpu/drm/amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_sh_mask.h (100%)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> index 2de687f64cf6..36649716e991 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
> @@ -48,8 +48,8 @@
>  #include "sienna_cichlid_ip_offset.h"
>  #include "dcn/dcn_3_0_3_offset.h"
>  #include "dcn/dcn_3_0_3_sh_mask.h"
> -#include "dcn/dpcs_3_0_3_offset.h"
> -#include "dcn/dpcs_3_0_3_sh_mask.h"
> +#include "dpcs/dpcs_3_0_3_offset.h"
> +#include "dpcs/dpcs_3_0_3_sh_mask.h"
>  #include "nbio/nbio_2_3_offset.h"
>
>  #define DC_LOGGER_INIT(logger)
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_offset.h b/drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_offset.h
> similarity index 100%
> rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_offset.h
> rename to drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_offset.h
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_sh_mask.h
> similarity index 100%
> rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_3_sh_mask.h
> rename to drivers/gpu/drm/amd/include/asic_reg/dpcs/dpcs_3_0_3_sh_mask.h
Re: [PATCH] drm/amdgpu: Fix recursive locking warning
Am 04.02.22 um 17:23 schrieb Felix Kuehling: Am 2022-02-04 um 02:13 schrieb Christian König: Am 04.02.22 um 04:11 schrieb Rajneesh Bhardwaj: Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.03] WARNING: possible recursive locking detected [ +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.04] [ +0.02] python/4822 is trying to acquire lock: [ +0.04] 932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.03] 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.17] other info that might help us debug this: [ +0.02] Possible unsafe locking scenario: [ +0.03] CPU0 [ +0.02] [ +0.02] lock(reservation_ww_class_mutex); [ +0.04] lock(reservation_ww_class_mutex); [ +0.03] *** DEADLOCK *** [ +0.02] May be due to missing lock nesting notation [ +0.03] 7 locks held by python/4822: [ +0.03] #0: 932c4ac028d0 (>mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: 932c55e830a8 (>lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: 932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: b2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.15] #5: c045f700 (*(sspp++)){}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.38] #6: 932c52da7078 (>eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.03] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.05] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ 
+0.03] Call Trace: [ +0.03] dump_stack+0x6d/0x89 [ +0.10] __lock_acquire+0xb93/0x1a90 [ +0.09] lock_acquire+0x25d/0x2d0 [ +0.05] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.06] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.07] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.05] ? lock_is_held_type+0xa2/0x110 [ +0.06] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.10] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.06] ? 
__fget_files+0x107/0x1e0 [ +0.07] __x64_sys_ioctl+0x8b/0xd0 [ +0.07] do_syscall_64+0x36/0x70 [ +0.04] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.07] RIP: 0033:0x7fbff90a7317 [ +0.04] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.05] RSP: 002b:7fbe301fe648 EFLAGS: 0246 ORIG_RAX: 0010 [ +0.06] RAX: ffda RBX: 7fbcc402d820 RCX: 7fbff90a7317 [ +0.03] RDX: 7fbe301fe690 RSI: c0184b18 RDI: 0004 [ +0.03] RBP: 7fbe301fe690 R08: R09: 7fbcc402d880 [ +0.03] R10: 02001000 R11: 0246 R12: c0184b18 [ +0.03] R13: 0004 R14: 7fbf689593a0 R15: 7fbcc402d820 Cc: Christian König Cc: Felix Kuehling Cc: Alex Deucher Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled") Signed-off-by: Rajneesh Bhardwaj The fixes tag is not necessarily correct, I would remove that. But apart from that the patch is Reviewed-by: Christian König . I suggested the Fixes tag since it was my patch
Re: [PATCH] drm/amdgpu: Fix recursive locking warning
On 2022-02-04 at 02:13, Christian König wrote:
On 04.02.22 at 04:11, Rajneesh Bhardwaj wrote:

Noticed the below warning while running a PyTorch workload on Vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks.

[ +0.03] WARNING: possible recursive locking detected
[ +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted
[ +0.04]
[ +0.02] python/4822 is trying to acquire lock:
[ +0.04] 932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000203] but task is already holding lock:
[ +0.03] 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.17] other info that might help us debug this:
[ +0.02] Possible unsafe locking scenario:
[ +0.03] CPU0
[ +0.02]
[ +0.02] lock(reservation_ww_class_mutex);
[ +0.04] lock(reservation_ww_class_mutex);
[ +0.03] *** DEADLOCK ***
[ +0.02] May be due to missing lock nesting notation
[ +0.03] 7 locks held by python/4822:
[ +0.03] #0: 932c4ac028d0 (>mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[ +0.000232] #1: 932c55e830a8 (>lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[ +0.000241] #2: 932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[ +0.000236] #3: b2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[ +0.000235] #4: 932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.15] #5: c045f700 (*(sspp++)){}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm]
[ +0.38] #6: 932c52da7078 (>eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[ +0.000195] stack backtrace:
[ +0.03] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030
[ +0.05] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018
[ +0.03] Call Trace:
[ +0.03] dump_stack+0x6d/0x89
[ +0.10] __lock_acquire+0xb93/0x1a90
[ +0.09] lock_acquire+0x25d/0x2d0
[ +0.05] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] ? lock_is_held_type+0xa2/0x110
[ +0.06] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060
[ +0.07] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] ? lock_release+0x13f/0x270
[ +0.05] ? lock_is_held_type+0xa2/0x110
[ +0.06] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm]
[ +0.10] amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu]
[ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu]
[ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu]
[ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[ +0.000216] ? lock_release+0x13f/0x270
[ +0.06] ? __fget_files+0x107/0x1e0
[ +0.07] __x64_sys_ioctl+0x8b/0xd0
[ +0.07] do_syscall_64+0x36/0x70
[ +0.04] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.07] RIP: 0033:0x7fbff90a7317
[ +0.04] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[ +0.05] RSP: 002b:7fbe301fe648 EFLAGS: 0246 ORIG_RAX: 0010
[ +0.06] RAX: ffda RBX: 7fbcc402d820 RCX: 7fbff90a7317
[ +0.03] RDX: 7fbe301fe690 RSI: c0184b18 RDI: 0004
[ +0.03] RBP: 7fbe301fe690 R08: R09: 7fbcc402d880
[ +0.03] R10: 02001000 R11: 0246 R12: c0184b18
[ +0.03] R13: 0004 R14: 7fbf689593a0 R15: 7fbcc402d820

Cc: Christian König
Cc: Felix Kuehling
Cc: Alex Deucher
Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled")
Signed-off-by: Rajneesh Bhardwaj

The Fixes tag is not necessarily correct, I would remove that. But apart from that the patch is Reviewed-by: Christian König.

I suggested the Fixes tag since it was my patch that introduced the problem. Without my patch,
RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code
One tiny thing to fix before merging

> -----Original Message-----
> From: Kasiviswanathan, Harish
> Sent: Friday, February 4, 2022 10:01 AM
> To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Tuikov, Luben ; Russell, Kent
> Subject: RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code
>
> [AMD Official Use Only]
>
> This series acked-by: Harish Kasiviswanathan
>
> -----Original Message-----
> From: amd-gfx On Behalf Of Luben Tuikov
> Sent: Friday, February 4, 2022 12:27 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Tuikov, Luben ; Russell, Kent
> Subject: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code
>
> Prevent random memory access in the FRU EEPROM code by passing the size of
> the destination buffer to the reading routine, and reading no more than the
> size of the buffer.
>
> Cc: Kent Russell
> Cc: Alex Deucher
> Signed-off-by: Luben Tuikov
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 +++
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> index 61c4e71e399855..07e045fae83a9a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
> @@ -77,9 +77,10 @@ static bool is_fru_eeprom_supported(struct amdgpu_device *adev)
>  }
>
>  static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t addrptr,
> -                                  unsigned char *buf)
> +                                  unsigned char *buf, size_t buf_size)
>  {
> -        int ret, size;
> +        int ret;
> +        u8 size;
>
>          ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr, buf, 1);
>          if (ret < 1) {
> @@ -90,9 +91,11 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t addrptr,
>          /* The size returned by the i2c requires subtraction of 0xC0 since the
>           * size apparently always reports as 0xC0+actual size.
>           */
> -        size = buf[0] - I2C_PRODUCT_INFO_OFFSET;

You can also remove this definition from the eeprom.h file, since it's no longer needed.

 Kent

> +        size = buf[0] & 0x3F;
> +        size = min_t(size_t, size, buf_size);
>
> -        ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1, buf, size);
> +        ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1,
> +                                 buf, size);
>          if (ret < 1) {
>                  DRM_WARN("FRU: Failed to get data field");
>                  return ret;
> @@ -129,7 +132,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
>           * and the language field, so just start from 0xb, manufacturer size
>           */
>          addrptr = FRU_EEPROM_MADDR + 0xb;
> -        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> +        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>          if (size < 1) {
>                  DRM_ERROR("Failed to read FRU Manufacturer, ret:%d", size);
>                  return -EINVAL;
> @@ -139,7 +142,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
>           * size field being 1 byte. This pattern continues below.
>           */
>          addrptr += size + 1;
> -        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> +        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>          if (size < 1) {
>                  DRM_ERROR("Failed to read FRU product name, ret:%d", size);
>                  return -EINVAL;
> @@ -155,7 +158,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
>          adev->product_name[len] = '\0';
>
>          addrptr += size + 1;
> -        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> +        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>          if (size < 1) {
>                  DRM_ERROR("Failed to read FRU product number, ret:%d", size);
>                  return -EINVAL;
> @@ -173,7 +176,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
>          adev->product_number[len] = '\0';
>
>          addrptr += size + 1;
> -        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> +        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>
>          if (size < 1) {
>                  DRM_ERROR("Failed to read FRU product version, ret:%d", size);
> @@ -181,7 +184,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
>          }
>
>          addrptr += size + 1;
> -        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
> +        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
>
>          if (size < 1) {
>                  DRM_ERROR("Failed to read FRU serial number, ret:%d", size);
> --
> 2.35.0.3.gb23dac905b
RE: [PATCH v1 0/3] AMDGPU FRU fixes
[AMD Official Use Only]

Looks good, even if I am a proponent of "buff, don't nerf"

Series is Reviewed-by: Kent Russell

> -----Original Message-----
> From: Tuikov, Luben
> Sent: Friday, February 4, 2022 12:27 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben ; Deucher, Alexander ; Russell, Kent ; Grodzovsky, Andrey
> Subject: [PATCH v1 0/3] AMDGPU FRU fixes
>
> Reordered the patches; fixed some bugs.
>
> Luben Tuikov (3):
>   drm/amdgpu: Nerf "buff" to "buf"
>   drm/amdgpu: Don't offset by 2 in FRU EEPROM
>   drm/amdgpu: Prevent random memory access in FRU code
>
> Cc: Alex Deucher
> Cc: Kent Russell
> Cc: Andrey Grodzovsky
>
>  .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 36 +--
>  1 file changed, 17 insertions(+), 19 deletions(-)
>
> base-commit: af42455918c42274f6f317a88c878d59c4564168
> --
> 2.35.0.3.gb23dac905b
RE: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code
[AMD Official Use Only]

This series acked-by: Harish Kasiviswanathan

-----Original Message-----
From: amd-gfx On Behalf Of Luben Tuikov
Sent: Friday, February 4, 2022 12:27 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Tuikov, Luben ; Russell, Kent
Subject: [PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code

Prevent random memory access in the FRU EEPROM code by passing the size of the destination buffer to the reading routine, and reading no more than the size of the buffer.

Cc: Kent Russell
Cc: Alex Deucher
Signed-off-by: Luben Tuikov
---
 .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 +++
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 61c4e71e399855..07e045fae83a9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -77,9 +77,10 @@ static bool is_fru_eeprom_supported(struct amdgpu_device *adev)
 }

 static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t addrptr,
-                                  unsigned char *buf)
+                                  unsigned char *buf, size_t buf_size)
 {
-        int ret, size;
+        int ret;
+        u8 size;

         ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr, buf, 1);
         if (ret < 1) {
@@ -90,9 +91,11 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, uint32_t addrptr,
         /* The size returned by the i2c requires subtraction of 0xC0 since the
          * size apparently always reports as 0xC0+actual size.
          */
-        size = buf[0] - I2C_PRODUCT_INFO_OFFSET;
+        size = buf[0] & 0x3F;
+        size = min_t(size_t, size, buf_size);

-        ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1, buf, size);
+        ret = amdgpu_eeprom_read(adev->pm.fru_eeprom_i2c_bus, addrptr + 1,
+                                 buf, size);
         if (ret < 1) {
                 DRM_WARN("FRU: Failed to get data field");
                 return ret;
@@ -129,7 +132,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
          * and the language field, so just start from 0xb, manufacturer size
          */
         addrptr = FRU_EEPROM_MADDR + 0xb;
-        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
         if (size < 1) {
                 DRM_ERROR("Failed to read FRU Manufacturer, ret:%d", size);
                 return -EINVAL;
@@ -139,7 +142,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
          * size field being 1 byte. This pattern continues below.
          */
         addrptr += size + 1;
-        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
         if (size < 1) {
                 DRM_ERROR("Failed to read FRU product name, ret:%d", size);
                 return -EINVAL;
@@ -155,7 +158,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
         adev->product_name[len] = '\0';

         addrptr += size + 1;
-        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));
         if (size < 1) {
                 DRM_ERROR("Failed to read FRU product number, ret:%d", size);
                 return -EINVAL;
@@ -173,7 +176,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
         adev->product_number[len] = '\0';

         addrptr += size + 1;
-        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));

         if (size < 1) {
                 DRM_ERROR("Failed to read FRU product version, ret:%d", size);
@@ -181,7 +184,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
         }

         addrptr += size + 1;
-        size = amdgpu_fru_read_eeprom(adev, addrptr, buf);
+        size = amdgpu_fru_read_eeprom(adev, addrptr, buf, sizeof(buf));

         if (size < 1) {
                 DRM_ERROR("Failed to read FRU serial number, ret:%d", size);
--
2.35.0.3.gb23dac905b
Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.
[Public]

Acked-by: Alex Deucher

From: amd-gfx on behalf of Christian König
Sent: Friday, February 4, 2022 3:52 AM
To: Bhardwaj, Rajneesh ; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: move lockdep assert to the right place.

Since newly added BOs don't have any mappings it's ok to add them without holding the VM lock. Only when we add per VM BOs the lock is mandatory.

Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
         if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
                 return;

+        dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
         vm->bulk_moveable = false;
         if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
                 amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
         struct amdgpu_bo_va *bo_va;

-        dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
         bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
         if (bo_va == NULL) {
                 return NULL;
--
2.25.1
Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu
On 04.02.22 at 12:22, Arunpravin wrote:
On 28/01/22 7:48 pm, Matthew Auld wrote:
On Thu, 27 Jan 2022 at 14:11, Arunpravin wrote:

- Remove drm_mm references and replace with drm buddy functionalities
- Add res cursor support for drm buddy

v2(Matthew Auld):
- replace spinlock with mutex as we call kmem_cache_zalloc (..., GFP_KERNEL) in drm_buddy_alloc() function
- lock drm_buddy_block_trim() function as it calls mark_free/mark_split are all globally visible

v3(Matthew Auld):
- remove trim method error handling as we address the failure case at drm_buddy_block_trim() function

v4:
- fix warnings reported by kernel test robot

v5:
- fix merge conflict issue

v6:
- fix warnings reported by kernel test robot

Signed-off-by: Arunpravin
---
 drivers/gpu/drm/Kconfig | 1 +
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 97 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 259 ++
 4 files changed, 231 insertions(+), 133 deletions(-)

-/**
- * amdgpu_vram_mgr_virt_start - update virtual start address
- *
- * @mem: ttm_resource to update
- * @node: just allocated node
- *
- * Calculate a virtual BO start address to easily check if everything is CPU
- * accessible.
- */
-static void amdgpu_vram_mgr_virt_start(struct ttm_resource *mem,
-                                       struct drm_mm_node *node)
-{
-        unsigned long start;
-
-        start = node->start + node->size;
-        if (start > mem->num_pages)
-                start -= mem->num_pages;
-        else
-                start = 0;
-        mem->start = max(mem->start, start);
-}
-
 /**
  * amdgpu_vram_mgr_new - allocate new ranges
  *
@@ -366,13 +357,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
                                const struct ttm_place *place,
                                struct ttm_resource **res)
 {
-        unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
+        unsigned long lpfn, pages_per_node, pages_left, pages, n_pages;
+        u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
         struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
         struct amdgpu_device *adev = to_amdgpu_device(mgr);
-        uint64_t vis_usage = 0, mem_bytes, max_bytes;
-        struct ttm_range_mgr_node *node;
-        struct drm_mm *mm = &mgr->mm;
-        enum drm_mm_insert_mode mode;
+        struct amdgpu_vram_mgr_node *node;
+        struct drm_buddy *mm = &mgr->mm;
+        struct drm_buddy_block *block;
         unsigned i;
         int r;

@@ -391,10 +382,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
                 goto error_sub;
         }

-        if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+        if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
                 pages_per_node = ~0ul;
-                num_nodes = 1;
-        } else {
+        else {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
                 pages_per_node = HPAGE_PMD_NR;
 #else
@@ -403,11 +393,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 #endif
                 pages_per_node = max_t(uint32_t, pages_per_node,
                                        tbo->page_alignment);
-                num_nodes = DIV_ROUND_UP_ULL(PFN_UP(mem_bytes), pages_per_node);
         }

-        node = kvmalloc(struct_size(node, mm_nodes, num_nodes),
-                        GFP_KERNEL | __GFP_ZERO);
+        node = kzalloc(sizeof(*node), GFP_KERNEL);
         if (!node) {
                 r = -ENOMEM;
                 goto error_sub;
         }

@@ -415,9 +403,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,

         ttm_resource_init(tbo, place, &node->base);

-        mode = DRM_MM_INSERT_BEST;
+        INIT_LIST_HEAD(&node->blocks);
+
         if (place->flags & TTM_PL_FLAG_TOPDOWN)
-                mode = DRM_MM_INSERT_HIGH;
+                node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
+
+        if (place->fpfn || lpfn != man->size)
+                /* Allocate blocks in desired range */
+                node->flags |= DRM_BUDDY_RANGE_ALLOCATION;
+
+        min_page_size = mgr->default_page_size;
+        BUG_ON(min_page_size < mm->chunk_size);

         pages_left = node->base.num_pages;

@@ -425,36 +421,61 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
         pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));

         i = 0;
-        spin_lock(&mgr->lock);
         while (pages_left) {
-                uint32_t alignment = tbo->page_alignment;
-
                 if (pages >= pages_per_node)
-                        alignment = pages_per_node;
-
-                r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
-                                                alignment, 0, place->fpfn,
-                                                lpfn, mode);
-                if (unlikely(r)) {
-                        if (pages > pages_per_node) {
-
Re: [PATCH v11 5/5] drm/amdgpu: add drm buddy support to amdgpu
On 28/01/22 7:48 pm, Matthew Auld wrote:
> On Thu, 27 Jan 2022 at 14:11, Arunpravin wrote:
>>
>> - Remove drm_mm references and replace with drm buddy functionalities
>> - Add res cursor support for drm buddy
>>
>> v2(Matthew Auld):
>> - replace spinlock with mutex as we call kmem_cache_zalloc
>>   (..., GFP_KERNEL) in drm_buddy_alloc() function
>>
>> - lock drm_buddy_block_trim() function as it calls
>>   mark_free/mark_split are all globally visible
>>
>> v3(Matthew Auld):
>> - remove trim method error handling as we address the failure case
>>   at drm_buddy_block_trim() function
>>
>> v4:
>> - fix warnings reported by kernel test robot
>>
>> v5:
>> - fix merge conflict issue
>>
>> v6:
>> - fix warnings reported by kernel test robot
>>
>> Signed-off-by: Arunpravin
>> ---
>>  drivers/gpu/drm/Kconfig | 1 +
>>  .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 97 +--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 259 ++
>>  4 files changed, 231 insertions(+), 133 deletions(-)
>
>
>
>>
>> -/**
>> - * amdgpu_vram_mgr_virt_start - update virtual start address
>> - *
>> - * @mem: ttm_resource to update
>> - * @node: just allocated node
>> - *
>> - * Calculate a virtual BO start address to easily check if everything is CPU
>> - * accessible.
>> - */
>> -static void amdgpu_vram_mgr_virt_start(struct ttm_resource *mem,
>> -                                       struct drm_mm_node *node)
>> -{
>> -        unsigned long start;
>> -
>> -        start = node->start + node->size;
>> -        if (start > mem->num_pages)
>> -                start -= mem->num_pages;
>> -        else
>> -                start = 0;
>> -        mem->start = max(mem->start, start);
>> -}
>> -
>>  /**
>>  * amdgpu_vram_mgr_new - allocate new ranges
>>  *
>> @@ -366,13 +357,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>                                const struct ttm_place *place,
>>                                struct ttm_resource **res)
>> {
>> -        unsigned long lpfn, num_nodes, pages_per_node, pages_left, pages;
>> +        unsigned long lpfn, pages_per_node, pages_left, pages, n_pages;
>> +        u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
>>          struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
>>          struct amdgpu_device *adev = to_amdgpu_device(mgr);
>> -        uint64_t vis_usage = 0, mem_bytes, max_bytes;
>> -        struct ttm_range_mgr_node *node;
>> -        struct drm_mm *mm = &mgr->mm;
>> -        enum drm_mm_insert_mode mode;
>> +        struct amdgpu_vram_mgr_node *node;
>> +        struct drm_buddy *mm = &mgr->mm;
>> +        struct drm_buddy_block *block;
>>          unsigned i;
>>          int r;
>>
>> @@ -391,10 +382,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>                  goto error_sub;
>>          }
>>
>> -        if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> +        if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
>>                  pages_per_node = ~0ul;
>> -                num_nodes = 1;
>> -        } else {
>> +        else {
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>                  pages_per_node = HPAGE_PMD_NR;
>> #else
>> @@ -403,11 +393,9 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>> #endif
>>                  pages_per_node = max_t(uint32_t, pages_per_node,
>>                                         tbo->page_alignment);
>> -                num_nodes = DIV_ROUND_UP_ULL(PFN_UP(mem_bytes), pages_per_node);
>>          }
>>
>> -        node = kvmalloc(struct_size(node, mm_nodes, num_nodes),
>> -                        GFP_KERNEL | __GFP_ZERO);
>> +        node = kzalloc(sizeof(*node), GFP_KERNEL);
>>          if (!node) {
>>                  r = -ENOMEM;
>>                  goto error_sub;
>> @@ -415,9 +403,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>
>>          ttm_resource_init(tbo, place, &node->base);
>>
>> -        mode = DRM_MM_INSERT_BEST;
>> +        INIT_LIST_HEAD(&node->blocks);
>> +
>>          if (place->flags & TTM_PL_FLAG_TOPDOWN)
>> -                mode = DRM_MM_INSERT_HIGH;
>> +                node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>> +
>> +        if (place->fpfn || lpfn != man->size)
>> +                /* Allocate blocks in desired range */
>> +                node->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>> +
>> +        min_page_size = mgr->default_page_size;
>> +        BUG_ON(min_page_size < mm->chunk_size);
>>
>>          pages_left = node->base.num_pages;
>>
>> @@ -425,36 +421,61 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>          pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>
>>          i = 0;
>> -        spin_lock(&mgr->lock);
>>          while (pages_left) {
>> -                uint32_t alignment = tbo->page_alignment;
>> -
>>                  if (pages >= pages_per_node)
>> -                        alignment = pages_per_node;
>>
[PATCH] drm/amdgpu: move lockdep assert to the right place.
Since newly added BOs don't have any mappings it's ok to add them without holding the VM lock. Only when we add per VM BOs the lock is mandatory.

Signed-off-by: Christian König
Reported-by: Bhardwaj, Rajneesh
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
         if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
                 return;

+        dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
         vm->bulk_moveable = false;
         if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
                 amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 {
         struct amdgpu_bo_va *bo_va;

-        dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
         bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
         if (bo_va == NULL) {
                 return NULL;
--
2.25.1