AMD Display Core (DC) patches (was: [PATCH 13/16] drm/amd/display: Revert FEC check in validation)
[Cc: +dri-de...@lists.freedesktop.org, +Daniel Vetter, +Alexander Deucher, +Greg KH] Dear Alex, I am a little confused and upset about how Display Core patches are handled in the Linux kernel. Am 25.03.22 um 23:53 schrieb Alex Hung: From: Martin Leung git puts a line “This reverts commit …” into the commit message, when something is reverted. Why isn’t this here? Right now, commit 7d56a154e22f, reverted here, is proposed for the stable series. I guess, because these indicators and meta data are missing. why and how: causes failure on install on certain machines Why are such kind of commit messages accepted? What does “failure on install” even mean? Why can’t the machine configuration be documented so it can be reproduced, when necessary. No less confusing, the date you posted it on amd-gfx is from March 25th, 2022, but the author date of the commit in agd5f/amd-staging-drm-next is `Fri Mar 18 11:12:36 2022 -0400`. Why is the patch missing the Date field then? Reviewed-by: George Shen Acked-by: Alex Hung Signed-off-by: Martin Leung Shouldn’t the Signed-off-by line by the author go first? You committed this on `Mon Mar 28 08:26:48 2022 -0600`, while you posted the patch on amd-gfx on Friday. How should *proper* review happen over the weekend? --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index f2ad8f58e69c..c436db416708 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1496,10 +1496,6 @@ bool dc_validate_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; - /* Check for FEC status*/ - if (link->link_enc->funcs->fec_is_active(link->link_enc)) - return false; - enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) The patch reverted here, also lacked proper review, had a to-be desired commit message, did not follow the Linux kernel coding style (missing space before the comment terminator), so should not have been committed in the first place. Seeing how many people are in the Cc list, I would have hoped, that somebody noticed and commented. The current state also makes it really hard for non-AMD employees to get the necessary information to do proper reviews as the needed documentation and information is non-public. So good/excellent commit messages are a must. I think to remember, you replied to me once, that Display Core patches are shared also with the Microsoft Windows driver, restricting the workflow options. But I think the issues I mentioned are unrelated. I know graphics hardware is very complex, but if quality of the commits and review would be improved, hopefully it saves time for everyone in the end, as less bugs are introduced. Could AMD team please address these issues as soon as possible? Kind regards, Paul
Re: [PATCH 04/13] drm/amd/display: FEC check in timing validation
Dear Alex, Am 19.03.22 um 08:43 schrieb Paul Menzel: Dear Alex, dear Chiawen, Thank you for your patch. Am 18.03.22 um 22:47 schrieb Alex Hung: From: Chiawen Huang [Why] disable/enable leads fec mismatch between hw/sw fec state. 1. Disable/enable of what? 2. How can this be reproduced? 3. s/fec/FEC/ [How] check fec status to fastboot on/off. What do you mean by “to fastboot on/off”? Reviewed-by: Anthony Koo Acked-by: Alex Hung Signed-off-by: Chiawen Huang --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index f6e19efea756..75f9c97bebb0 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1496,6 +1496,10 @@ bool dc_validate_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; + /* Check for FEC status*/ Missing space before `*/`. + if (link->link_enc->funcs->fec_is_active(link->link_enc)) + return false; + enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) I just saw, that this patch was committed on March 25th, 2022 (commit 7d56a154e22f) with my comments ignored. Could you please explain why? Kind regards, Paul
Re: Vega 56 failing to process EDID from VR Headset
Dear James, Am 11.04.22 um 23:39 schrieb James Dutton: I have an Oculus Rift S, that I am trying to get working in Linux. Please always mention the Linux kernel version. I have an AMD Vega 56 graphics card. The VR headset plugs into a display port of the Vega56. The amdgpu driver sees the connection, and tries to process it. The problem is it cannot process the EDID, so fails to recognise the VR headset, and the VR headset does not work as a result. Please find the EDID below. I am guessing that the following is causing the problem: Established Timings I & II: none Standard Timings: none Forcing the driver to understand the Detailed mode, to which it is failing. If it helps, when attached to windows 10, it uses 1440x2560, portrait mode. Some dmesg lines that may be useful: // We should pick 1440x2560 as Windows picks that, but for some reason is rejects it with error 10. [10402.650734] [drm:create_validate_stream_for_sink [amdgpu]] Mode 1440x2560 (clk 571570) failed DC validation with error 10 (No DP link bandwidth) [10402.650991] [drm:update_stream_scaling_settings [amdgpu]] Destination Rectangle x:0 y:0 width:1440 height:2560 [10402.651225] [drm:create_validate_stream_for_sink [amdgpu]] Mode 1440x2560 (clk 571570) failed DC validation with error 10 (No DP link bandwidth) (Please use an email program, which does not wrap lines after 72 characters.) Can anyone help give me with some pointers as to how to get the amdgpu driver to accept this EDID? So, did you do any changes to Linux? Why do you think the EDID is at fault? […] I suggest to analyze, why `No DP link bandwidth` is logged. The macro is `DC_NO_DP_LINK_BANDWIDTH`, and you should first check why `dp_validate_mode_timing()` in `drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c` returns false. Kind regards, Paul PS: Using the issue tracker [1] might make it easier to keep track of this problem, and also to attach all the necessary information. [1]: https://gitlab.freedesktop.org/drm/amd/-/issues/
Re: [PATCH] drm/amdkfd: shrink bitmap size in struct svm_validate_context
Am 2022-04-11 um 23:06 schrieb Lang Yu: A MAX_GPU_INSTANCE bits bitmap will suffice. Signed-off-by: Lang Yu Thanks for catching that. The patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 8b6adc142e25..459fa07a3bcc 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1370,7 +1370,7 @@ struct svm_validate_context { struct kfd_process *process; struct svm_range *prange; bool intr; - unsigned long bitmap[MAX_GPU_INSTANCE]; + DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE); struct ttm_validate_buffer tv[MAX_GPU_INSTANCE]; struct list_head validate_list; struct ww_acquire_ctx ticket;
Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
On 4/12/2022 9:38 AM, Darren Powell wrote: Contrary to the smu_cmn_send_smc_msg_with_param documentation, two cases exist where messages are silently dropped with no error returned to the caller. These cases occur in unusual situations where either: 1. the message target is a virtual GPU, or This is not fully correct - only messages which are not valid for virtual GPU are dropped, not all. Thanks, Lijo 2. a PCI recovery is underway and the HW is not yet in sync with the SW For more details see commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state") (v2) Reworked with suggestions from Luben & Paul Signed-off-by: Darren Powell --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index b8d0c70ff668..8008ae5508e6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) * completion of the command, and return back a value from the SMU in * @read_arg pointer. * - * Return 0 on success, -errno on error, if we weren't able to send - * the message or if the message completed with some kind of - * error. See __smu_cmn_reg2errno() for details of the -errno. + * Return 0 on success, -errno when a problem is encountered sending + * message or receiving reply. If there is a PCI bus recovery or + * the destination is a virtual GPU, the message is simply dropped and + * success is also returned. + * See __smu_cmn_reg2errno() for details of the -errno. * * If we weren't able to send the message to the SMU, we also print * the error to the standard log. base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
[PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
Contrary to the smu_cmn_send_smc_msg_with_param documentation, two cases exist where messages are silently dropped with no error returned to the caller. These cases occur in unusual situations where either: 1. the message target is a virtual GPU, or 2. a PCI recovery is underway and the HW is not yet in sync with the SW For more details see commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state") (v2) Reworked with suggestions from Luben & Paul Signed-off-by: Darren Powell --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index b8d0c70ff668..8008ae5508e6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) * completion of the command, and return back a value from the SMU in * @read_arg pointer. * - * Return 0 on success, -errno on error, if we weren't able to send - * the message or if the message completed with some kind of - * error. See __smu_cmn_reg2errno() for details of the -errno. + * Return 0 on success, -errno when a problem is encountered sending + * message or receiving reply. If there is a PCI bus recovery or + * the destination is a virtual GPU, the message is simply dropped and + * success is also returned. + * See __smu_cmn_reg2errno() for details of the -errno. * * If we weren't able to send the message to the SMU, we also print * the error to the standard log. base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 -- 2.35.1
Re: [Intel-gfx] [V2 1/3] drm/debug: Expose connector's max supported bpc via debugfs
On Tue-12-04-2022 08:37 am, Murthy, Arun R wrote: +static int output_bpc_show(struct seq_file *m, void *data) { Would it be better to have this function name as drm_output_bpc_show() As we are using DEFINE_SHOW_ATTRIBUTE() to define file_operations, this function name must be _show(). Otherwise, either we need to define new file_operations to use the suggested name or rename the debugfs name to "drm_output_bpc" Also, to align/maintain uniform with other debugfs, I think it's ok to use output_bpc_show(). - Bhanu Thanks and Regards, Arun R Murthy
[PATCH v2 5/5] drm/radeon: Use TTM builtin resource manager debugfs code
From: Zack Rusin Switch to using the TTM resource manager debugfs helpers. The functionality is largely the same. The TTM resource managers need to stay valid for as long as the drm debugfs_root is valid. Signed-off-by: Zack Rusin Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Cc: amd-gfx@lists.freedesktop.org --- drivers/gpu/drm/radeon/radeon_ttm.c | 36 + 1 file changed, 6 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 44594d16611f..d33fec488713 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -781,17 +781,6 @@ void radeon_ttm_set_active_vram_size(struct radeon_device *rdev, u64 size) #if defined(CONFIG_DEBUG_FS) -static int radeon_mm_vram_dump_table_show(struct seq_file *m, void *unused) -{ - struct radeon_device *rdev = (struct radeon_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&rdev->mman.bdev, - TTM_PL_VRAM); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - static int radeon_ttm_page_pool_show(struct seq_file *m, void *data) { struct radeon_device *rdev = (struct radeon_device *)m->private; @@ -799,19 +788,6 @@ static int radeon_ttm_page_pool_show(struct seq_file *m, void *data) return ttm_pool_debugfs(&rdev->mman.bdev.pool, m); } -static int radeon_mm_gtt_dump_table_show(struct seq_file *m, void *unused) -{ - struct radeon_device *rdev = (struct radeon_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&rdev->mman.bdev, - TTM_PL_TT); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - -DEFINE_SHOW_ATTRIBUTE(radeon_mm_vram_dump_table); -DEFINE_SHOW_ATTRIBUTE(radeon_mm_gtt_dump_table); DEFINE_SHOW_ATTRIBUTE(radeon_ttm_page_pool); static int radeon_ttm_vram_open(struct inode *inode, struct file *filep) @@ -930,15 +906,15 @@ static void radeon_ttm_debugfs_init(struct radeon_device *rdev) debugfs_create_file("radeon_vram", 0444, root, rdev, &radeon_ttm_vram_fops); - debugfs_create_file("radeon_gtt", 0444, root, rdev, &radeon_ttm_gtt_fops); - - debugfs_create_file("radeon_vram_mm", 0444, root, rdev, - &radeon_mm_vram_dump_table_fops); - debugfs_create_file("radeon_gtt_mm", 0444, root, rdev, - &radeon_mm_gtt_dump_table_fops); debugfs_create_file("ttm_page_pool", 0444, root, rdev, &radeon_ttm_page_pool_fops); + ttm_resource_manager_create_debugfs(ttm_manager_type(&rdev->mman.bdev, +TTM_PL_VRAM), + root, "radeon_vram_mm"); + ttm_resource_manager_create_debugfs(ttm_manager_type(&rdev->mman.bdev, +TTM_PL_TT), + root, "radeon_gtt_mm"); #endif } -- 2.32.0
[PATCH v2 3/5] drm/amdgpu: Use TTM builtin resource manager debugfs code
From: Zack Rusin Switch to using the TTM resource manager debugfs helpers. It's exactly the same functionality but the debugfs code is shared with other drivers. The TTM resource managers need to stay valid for as long as the drm debugfs_root is valid. Signed-off-by: Zack Rusin Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Cc: Felix Kuehling Cc: Nirmoy Das Cc: Thomas Zimmermann Cc: amd-gfx@lists.freedesktop.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 + 1 file changed, 16 insertions(+), 70 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 57ac118fc266..37fe9fb3f8e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2079,17 +2079,6 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type) #if defined(CONFIG_DEBUG_FS) -static int amdgpu_mm_vram_table_show(struct seq_file *m, void *unused) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, - TTM_PL_VRAM); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - static int amdgpu_ttm_page_pool_show(struct seq_file *m, void *unused) { struct amdgpu_device *adev = (struct amdgpu_device *)m->private; @@ -2097,55 +2086,6 @@ static int amdgpu_ttm_page_pool_show(struct seq_file *m, void *unused) return ttm_pool_debugfs(&adev->mman.bdev.pool, m); } -static int amdgpu_mm_tt_table_show(struct seq_file *m, void *unused) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, - TTM_PL_TT); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - -static int amdgpu_mm_gds_table_show(struct seq_file *m, void *unused) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, - AMDGPU_PL_GDS); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - -static int amdgpu_mm_gws_table_show(struct seq_file *m, void *unused) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, - AMDGPU_PL_GWS); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - -static int amdgpu_mm_oa_table_show(struct seq_file *m, void *unused) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)m->private; - struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, - AMDGPU_PL_OA); - struct drm_printer p = drm_seq_file_printer(m); - - ttm_resource_manager_debug(man, &p); - return 0; -} - -DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_vram_table); -DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_tt_table); -DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_gds_table); -DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_gws_table); -DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_oa_table); DEFINE_SHOW_ATTRIBUTE(amdgpu_ttm_page_pool); /* @@ -2355,17 +2295,23 @@ void amdgpu_ttm_debugfs_init(struct amdgpu_device *adev) &amdgpu_ttm_vram_fops, adev->gmc.mc_vram_size); debugfs_create_file("amdgpu_iomem", 0444, root, adev, &amdgpu_ttm_iomem_fops); - debugfs_create_file("amdgpu_vram_mm", 0444, root, adev, - &amdgpu_mm_vram_table_fops); - debugfs_create_file("amdgpu_gtt_mm", 0444, root, adev, - &amdgpu_mm_tt_table_fops); - debugfs_create_file("amdgpu_gds_mm", 0444, root, adev, - &amdgpu_mm_gds_table_fops); - debugfs_create_file("amdgpu_gws_mm", 0444, root, adev, - &amdgpu_mm_gws_table_fops); - debugfs_create_file("amdgpu_oa_mm", 0444, root, adev, - &amdgpu_mm_oa_table_fops); debugfs_create_file("ttm_page_pool", 0444, root, adev, &amdgpu_ttm_page_pool_fops); + ttm_resource_manager_create_debugfs(ttm_manager_type(&adev->mman.bdev, +TTM_PL_VRAM), + root, "amdgpu_vram_mm"); + ttm_resource_manager_create_debugfs(ttm_manager_type(&adev->mman.bdev, +
RE: [Intel-gfx] [V2 2/3] drm/i915/display/debug: Expose crtc current bpc via debugfs
> -Original Message- > From: Intel-gfx On Behalf Of > Bhanuprakash Modem > Sent: Monday, April 11, 2022 3:21 PM > To: intel-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; amd- > g...@lists.freedesktop.org; jani.nik...@linux.intel.com; > ville.syrj...@linux.intel.com; harry.wentl...@amd.com; Sharma, Swati2 > > Subject: [Intel-gfx] [V2 2/3] drm/i915/display/debug: Expose crtc current bpc > via debugfs > > This new debugfs will expose the currently using bpc by crtc. > It is very useful for verifying whether we enter the correct output color > depth > from IGT. > > This patch will also add the connector's max supported bpc to > "i915_display_info" debugfs. > > Example: > cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc > Current: 8 > > Cc: Jani Nikula > Cc: Ville Syrjälä > Cc: Uma Shankar > Signed-off-by: Bhanuprakash Modem Reviewed-by: Arun R Murthy Thanks and Regards, Arun R Murthy ---
RE: [Intel-gfx] [V2 1/3] drm/debug: Expose connector's max supported bpc via debugfs
> +static int output_bpc_show(struct seq_file *m, void *data) { Would it be better to have this function name as drm_output_bpc_show() Thanks and Regards, Arun R Murthy
[PATCH] drm/amdkfd: shrink bitmap size in struct svm_validate_context
A MAX_GPU_INSTANCE bits bitmap will suffice. Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 8b6adc142e25..459fa07a3bcc 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1370,7 +1370,7 @@ struct svm_validate_context { struct kfd_process *process; struct svm_range *prange; bool intr; - unsigned long bitmap[MAX_GPU_INSTANCE]; + DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE); struct ttm_validate_buffer tv[MAX_GPU_INSTANCE]; struct list_head validate_list; struct ww_acquire_ctx ticket; -- 2.25.1
RE: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH
[AMD Official Use Only] Hi Stanley, The flag is set by RAS block poison irq handler, such as vcn/jpeg poison irq handler. It's not configured in RAS init. Regards, Tao > -Original Message- > From: Yang, Stanley > Sent: Monday, April 11, 2022 10:12 PM > To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Lazar, > Lijo ; Ziya, Mohammad zafar > ; Zhang, Hawking > ; Chai, Thomas > Subject: 回复: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS > IH > > [AMD Official Use Only] > > Hi Tao, > > According to the series patches, I have one question, is the ras_ih_flag set > according to poison mode configuration, if yes, driver will handle poison once > get ecc_irq interrupt, but at this moment there may no app to consumes it, > this > seems conflict the poison consumption definition. > > Regards, > Stanley > > -邮件原件- > > 发件人: Zhou1, Tao > > 发送时间: Monday, April 11, 2022 7:08 PM > > 收件人: amd-gfx@lists.freedesktop.org; Lazar, Lijo ; > > Ziya, Mohammad zafar ; Zhang, Hawking > > ; Yang, Stanley ; Chai, > > Thomas > > 抄送: Zhou1, Tao > > 主题: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH > > > > So we can distinguish RAS poison consumption interrupt from UE interrupt. > > > > Signed-off-by: Tao Zhou > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > > index 606df8869b89..380f4c3020c7 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > > @@ -314,6 +314,11 @@ enum amdgpu_ras_ret { > > AMDGPU_RAS_PT, > > }; > > > > +enum amdgpu_ras_ih_flag { > > + AMDGPU_RAS_IH_POISON_CONSUMPTION = 0, > > + AMDGPU_RAS_IH_LAST, > > +}; > > + > > struct ras_common_if { > > enum amdgpu_ras_block block; > > enum amdgpu_ras_error_type type; @@ -419,6 +424,8 @@ struct > > ras_ih_data { > > unsigned int aligned_element_size; > > unsigned int rptr; > > unsigned int wptr; > > + /* interrupt type flag */ > > + unsigned int flag; > > }; > > > > struct ras_manager { > > -- > > 2.35.1 >
Re: [PATCH ] drm/amdgpu: fix discovery ip failed
Thanks Jesse. Please also add below before you commit. Fixes: 1ef60191b259 ("drm/amdgpu: Use flexible array member") Reviewed-by: Lijo Lazar Thanks, Lijo On 4/12/2022 7:22 AM, Zhang, Jesse(Jie) wrote: [AMD Official Use Only] Thanks Paul Menzel Attach the patch file . -Original Message- From: Paul Menzel Sent: Monday, 11 April 2022 11:21 pm To: Zhang, Jesse(Jie) Cc: amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Huang, Ray Subject: Re: [PATCH ] drm/amdgpu: fix discovery ip failed [CAUTION: External Email] Dear Jie, Thank you for your patch. Am 11.04.22 um 17:15 schrieb Zhang, Jesse(Jie): You might want to add a space before the (. [AMD Official Use Only] Please send a patch with `git format-patch` or similar. Fix discovery ip failed, and the log: On what system? 56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob Please describe the reason for the failure, and your fix. A Signed-off-by line is missing. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 766006a075ec..a778b0392e9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev, res = kobject_add(&ip_hw_instance->kobj, NULL, "%d", ip_hw_instance->num_instance); next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) } next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n *revision = ip->revision; return 0; } - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } Kind regards, Paul
Re: [PATCHv2] drm/amdkfd: Cleanup IO links during KFD device removal
Am 2022-04-11 um 22:15 schrieb Mukul Joshi: Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi Reviewed-by: Shuotao Xu Reviewed-by: Felix Kuehling --- v1->v2: - Remove comments from inside kfd_topology_update_io_links() and add them as kernel-doc comments. drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 83 --- 3 files changed, 78 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 1eaabd2cb41b..afc8a7fcdad8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, * table, add corresponded reversed direction link now. */ if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) { - to_dev = kfd_topology_device_by_proximity_domain(id_to); + to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to); if (!to_dev) return -ENODEV; /* same everything but the other direction */ @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, */ if (kdev->hive_id) { for (nid = 0; nid < proximity_domain; ++nid) { - peer_dev = kfd_topology_device_by_proximity_domain(nid); + peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid); if (!peer_dev->gpu) continue; if (peer_dev->gpu->hive_id != kdev->hive_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index e1b7e6afa920..8a43def1f638 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu); int kfd_topology_remove_device(struct kfd_dev *gpu); struct kfd_topology_device *kfd_topology_device_by_proximity_domain( uint32_t proximity_domain); +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( + uint32_t proximity_domain); struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3bdcae239bc0..98a51847cd8c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -46,22 +46,32 @@ static struct list_head topology_device_list; static struct kfd_system_properties sys_props; static DECLARE_RWSEM(topology_lock); -static atomic_t topology_crat_proximity_domain; +static uint32_t topology_crat_proximity_domain; -struct kfd_topology_device *kfd_topology_device_by_proximity_domain( +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( uint32_t proximity_domain) { struct kfd_topology_device *top_dev; struct kfd_topology_device *device = NULL; - down_read(&topology_lock); - list_for_each_entry(top_dev, &topology_device_list, list) if (top_dev->proximity_domain == proximity_domain) { device = top_dev; break; } + return device; +} + +struct kfd_topology_device *kfd_topology_device_by_proximity_domain( + uint32_t proximity_domain) +{ + struct kfd_topology_device *device = NULL; + + down_read(&topology_lock); + + device = kfd_topology_device_by_proximity_domain_no_lock( + proximity_domain); up_read(&topology_lock); return device; @@ -1060,7 +1070,7 @@ int kfd_topology_init(void) down_write(&topology_lock); kfd_topology_update_device_list(&temp_topology_device_list,
[PATCHv2] drm/amdkfd: Cleanup IO links during KFD device removal
Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi Reviewed-by: Shuotao Xu --- v1->v2: - Remove comments from inside kfd_topology_update_io_links() and add them as kernel-doc comments. drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 83 --- 3 files changed, 78 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 1eaabd2cb41b..afc8a7fcdad8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, * table, add corresponded reversed direction link now. */ if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) { - to_dev = kfd_topology_device_by_proximity_domain(id_to); + to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to); if (!to_dev) return -ENODEV; /* same everything but the other direction */ @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, */ if (kdev->hive_id) { for (nid = 0; nid < proximity_domain; ++nid) { - peer_dev = kfd_topology_device_by_proximity_domain(nid); + peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid); if (!peer_dev->gpu) continue; if (peer_dev->gpu->hive_id != kdev->hive_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index e1b7e6afa920..8a43def1f638 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu); int kfd_topology_remove_device(struct kfd_dev *gpu); struct kfd_topology_device *kfd_topology_device_by_proximity_domain( uint32_t proximity_domain); +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( + uint32_t proximity_domain); struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3bdcae239bc0..98a51847cd8c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -46,22 +46,32 @@ static struct list_head topology_device_list; static struct kfd_system_properties sys_props; static DECLARE_RWSEM(topology_lock); -static atomic_t topology_crat_proximity_domain; +static uint32_t topology_crat_proximity_domain; -struct kfd_topology_device *kfd_topology_device_by_proximity_domain( +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( uint32_t proximity_domain) { struct kfd_topology_device *top_dev; struct kfd_topology_device *device = NULL; - down_read(&topology_lock); - list_for_each_entry(top_dev, &topology_device_list, list) if (top_dev->proximity_domain == proximity_domain) { device = top_dev; break; } + return device; +} + +struct kfd_topology_device *kfd_topology_device_by_proximity_domain( + uint32_t proximity_domain) +{ + struct kfd_topology_device *device = NULL; + + down_read(&topology_lock); + + device = kfd_topology_device_by_proximity_domain_no_lock( + proximity_domain); up_read(&topology_lock); return device; @@ -1060,7 +1070,7 @@ int kfd_topology_init(void) down_write(&topology_lock); kfd_topology_update_device_list(&temp_topology_device_list, &topology_device_list); - atomic_set(&topology_crat_proximi
RE: [PATCH ] drm/amdgpu: fix discovery ip failed
[AMD Official Use Only] Thanks Paul Menzel Attach the patch file . -Original Message- From: Paul Menzel Sent: Monday, 11 April 2022 11:21 pm To: Zhang, Jesse(Jie) Cc: amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Huang, Ray Subject: Re: [PATCH ] drm/amdgpu: fix discovery ip failed [CAUTION: External Email] Dear Jie, Thank you for your patch. Am 11.04.22 um 17:15 schrieb Zhang, Jesse(Jie): You might want to add a space before the (. > [AMD Official Use Only] Please send a patch with `git format-patch` or similar. > Fix discovery ip failed, and the log: On what system? > 56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (45056) from ip discovery blob > [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (66) from ip discovery blob > [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (45568) from ip discovery blob > [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (66) from ip discovery blob > [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (480) from ip discovery blob > [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (608) from ip discovery blob > [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (640) from ip discovery blob > [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (28672) from ip discovery blob Please describe the reason for the failure, and your fix. A Signed-off-by line is missing. > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > index 766006a075ec..a778b0392e9f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct > amdgpu_device *adev, > res = kobject_add(&ip_hw_instance->kobj, NULL, >"%d", > ip_hw_instance->num_instance); > next_ip: > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > } > } > > @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device > *adev) > } > > next_ip: > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > } > } > > @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct > amdgpu_device *adev, int hw_id, int n > *revision = ip->revision; > return 0; > } > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > } > } > Kind regards, Paul 0001-fix-ip-discovery-failed-SWDEV-332054.patch Description: 0001-fix-ip-discovery-failed-SWDEV-332054.patch
Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal
Am 2022-04-11 um 21:14 schrieb Joshi, Mukul: [AMD Official Use Only] -Original Message- From: Kuehling, Felix Sent: Monday, April 11, 2022 8:16 PM To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org Cc: Shuotao Xu Subject: Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal Am 2022-04-07 um 12:15 schrieb Mukul Joshi: Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi Looks good to me. I have two nit-picks inline. --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 --- 3 files changed, 74 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 1eaabd2cb41b..afc8a7fcdad8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, * table, add corresponded reversed direction link now. */ if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) { - to_dev = kfd_topology_device_by_proximity_domain(id_to); + to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to); if (!to_dev) return -ENODEV; /* same everything but the other direction */ @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, */ if (kdev->hive_id) { for (nid = 0; nid < proximity_domain; ++nid) { - peer_dev = kfd_topology_device_by_proximity_domain(nid); + peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid); if (!peer_dev->gpu) continue; if (peer_dev->gpu->hive_id != kdev->hive_id) diff -- git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index e1b7e6afa920..8a43def1f638 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu); int kfd_topology_remove_device(struct kfd_dev *gpu); struct kfd_topology_device *kfd_topology_device_by_proximity_domain( uint32_t proximity_domain); +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( + uint32_t proximity_domain); struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3bdcae239bc0..874a273b81f7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -46,27 +46,38 @@ static struct list_head topology_device_list; static struct kfd_system_properties sys_props; static DECLARE_RWSEM(topology_lock); -static atomic_t topology_crat_proximity_domain; +static uint32_t topology_crat_proximity_domain; -struct kfd_topology_device *kfd_topology_device_by_proximity_domain( +struct kfd_topology_device +*kfd_topology_device_by_proximity_domain_no_lock( uint32_t proximity_domain) I remember we discussed this and I suggested splitting a no_lock version out of this function. But now I don't see it being used anywhere. Was that lost somewhere in refactoring or porting to the upstream branch? Maybe the no_lock version isn't needed any more. Its used in the changes in kfd_crat.c (in kfd_create_vcrat_image_gpu() and kfd_parse_subtype_iolink ()) and below in kfd_topology_device_by_proximity_domain(). You're right, I missed the changes in kfd_crat.c. And they are needed because the whole CRAT table parsing is now under the topology lock. Thanks for the reminder. Regards, Felix { struct kfd_topology_device *top_dev; struct kfd_topology_device *device = NULL; - down_read(&topology_lock); - l
RE: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal
[AMD Official Use Only] > -Original Message- > From: Kuehling, Felix > Sent: Monday, April 11, 2022 8:16 PM > To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org > Cc: Shuotao Xu > Subject: Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device > removal > > Am 2022-04-07 um 12:15 schrieb Mukul Joshi: > > Currently, the IO-links to the device being removed from topology, are > > not cleared. As a result, there would be dangling links left in the > > KFD topology. This patch aims to fix the following: > > 1. Cleanup all IO links to the device being removed. > > 2. Ensure that node numbering in sysfs and nodes proximity domain > > values are consistent after the device is removed: > > a. Adding a device and removing a GPU device are made mutually > >exclusive. > > b. The global proximity domain counter is no longer required to be > >an atomic counter. A normal 32-bit counter can be used instead. > > 3. Update generation_count to let user-mode know that topology has > > changed due to device removal. > > > > CC: Shuotao Xu > > Signed-off-by: Mukul Joshi > > Looks good to me. I have two nit-picks inline. > > > > --- > > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- > > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + > > drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 > --- > > 3 files changed, 74 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > > b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > > index 1eaabd2cb41b..afc8a7fcdad8 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > > @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct > crat_subtype_iolink *iolink, > > * table, add corresponded reversed direction link now. > > */ > > if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) > { > > - to_dev = > kfd_topology_device_by_proximity_domain(id_to); > > + to_dev = > kfd_topology_device_by_proximity_domain_no_lock(id_to); > > if (!to_dev) > > return -ENODEV; > > /* same everything but the other direction */ @@ -2225,7 > +2225,7 > > @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, > > */ > > if (kdev->hive_id) { > > for (nid = 0; nid < proximity_domain; ++nid) { > > - peer_dev = > kfd_topology_device_by_proximity_domain(nid); > > + peer_dev = > kfd_topology_device_by_proximity_domain_no_lock(nid); > > if (!peer_dev->gpu) > > continue; > > if (peer_dev->gpu->hive_id != kdev->hive_id) diff -- > git > > a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > index e1b7e6afa920..8a43def1f638 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev > *gpu); > > int kfd_topology_remove_device(struct kfd_dev *gpu); > > struct kfd_topology_device > *kfd_topology_device_by_proximity_domain( > > uint32_t proximity_domain); > > +struct kfd_topology_device > *kfd_topology_device_by_proximity_domain_no_lock( > > + uint32_t proximity_domain); > > struct kfd_topology_device *kfd_topology_device_by_id(uint32_t > gpu_id); > > struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); > > struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > > index 3bdcae239bc0..874a273b81f7 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c > > @@ -46,27 +46,38 @@ static struct list_head topology_device_list; > > static struct kfd_system_properties sys_props; > > > > static DECLARE_RWSEM(topology_lock); -static atomic_t > > topology_crat_proximity_domain; > > +static uint32_t topology_crat_proximity_domain; > > > > -struct kfd_topology_device > *kfd_topology_device_by_proximity_domain( > > +struct kfd_topology_device > > +*kfd_topology_device_by_proximity_domain_no_lock( > > uint32_t proximity_domain) > > I remember we discussed this and I suggested splitting a no_lock version out > of this function. But now I don't see it being used anywhere. Was that lost > somewhere in refactoring or porting to the upstream branch? > Maybe the no_lock version isn't needed any more. > Its used in the changes in kfd_crat.c (in kfd_create_vcrat_image_gpu() and kfd_parse_subtype_iolink ()) and below in kfd_topology_device_by_proximity_domain(). > > > { > > struct kfd_topology_device *top_dev; > > struct kfd_topology_device *device = NULL; > > > > - down_read(&topology_lock); > > - >
[PATCH AUTOSEL 4.9 1/7] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 6a3470f84998..732713ff3190 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -607,6 +607,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { INIT_LIST_HEAD(&event_waiters[i].waiters); -- 2.35.1
[PATCH AUTOSEL 4.14 02/11] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 944abfad39c1..1d8dd81dfc70 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -607,6 +607,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { INIT_LIST_HEAD(&event_waiters[i].waiters); -- 2.35.1
[PATCH AUTOSEL 4.14 01/11] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 06192698bd96..c90567de8bf7 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
[PATCH AUTOSEL 4.19 10/12] drm/amd/display: Fix allocate_mst_payload assert on resume
From: Roman Li [ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ] [Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload() [How] Use link type as indicator for mst link. Reviewed-by: Wayne Lin Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ed02bb6b2cd0..1d1183c47beb 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -777,7 +777,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue; mutex_lock(&aconnector->hpd_lock); -- 2.35.1
[PATCH AUTOSEL 4.19 02/12] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index e9f0e0a1b41c..892077377339 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -532,6 +532,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); -- 2.35.1
[PATCH AUTOSEL 4.19 01/12] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
[PATCH AUTOSEL 5.4 16/21] drm/amd/display: Fix allocate_mst_payload assert on resume
From: Roman Li [ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ] [Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload() [How] Use link type as indicator for mst link. Reviewed-by: Wayne Lin Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 8e4d863c7570..a83d33a92650 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1210,7 +1210,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue; mutex_lock(&aconnector->hpd_lock); -- 2.35.1
[PATCH AUTOSEL 5.4 05/21] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index d674d4b3340f..adbb2fec2e0f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -532,6 +532,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); -- 2.35.1
[PATCH AUTOSEL 5.4 04/21] drm/amdkfd: Fix Incorrect VMIDs passed to HWS
From: Tushar Patel [ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ] Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++ 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e8e172010416..ffd754713522 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -633,7 +633,7 @@ MODULE_PARM_DESC(sched_policy, * Maximum number of processes that HWS can schedule concurrently. The maximum is the * number of VMIDs assigned to the HWS, which is also the default. */ -int hws_max_conc_proc = 8; +int hws_max_conc_proc = -1; module_param(hws_max_conc_proc, int, 0444); MODULE_PARM_DESC(hws_max_conc_proc, "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index ad9483b9eea3..60ee1a832112 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -609,15 +609,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, - kfd->vm_info.first_vmid_kfd + 1; /* Verify module parameters regarding mapped process number*/ - if ((hws_max_conc_proc < 0) - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) { - dev_err(kfd_device, - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n", - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd, - kfd->vm_info.vmid_num_kfd); + if (hws_max_conc_proc >= 0) + kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd); + else kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd; - } else - kfd->max_proc_per_quantum = hws_max_conc_proc; /* Allocate global GWS that is shared by all KFD processes */ if (hws_gws_support && amdgpu_amdkfd_alloc_gws(kfd->kgd, -- 2.35.1
[PATCH AUTOSEL 5.4 03/21] drm/amd/display: Update VTEM Infopacket definition
From: "Leo (Hanghong) Ma" [ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ] [Why & How] The latest HDMI SPEC has updated the VTEM packet structure, so change the VTEM Infopacket defined in the driver side to align with the SPEC. Reviewed-by: Chris Park Acked-by: Alex Hung Signed-off-by: Leo (Hanghong) Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c index d885d642ed7f..537736713598 100644 --- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c +++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c @@ -85,7 +85,8 @@ //PB7 = MD0 #define MASK_VTEM_MD0__VRR_EN 0x01 #define MASK_VTEM_MD0__M_CONST0x02 -#define MASK_VTEM_MD0__RESERVED2 0x0C +#define MASK_VTEM_MD0__QMS_EN 0x04 +#define MASK_VTEM_MD0__RESERVED2 0x08 #define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0 //MD1 @@ -94,7 +95,7 @@ //MD2 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98 0x03 #define MASK_VTEM_MD2__RB0x04 -#define MASK_VTEM_MD2__RESERVED3 0xF8 +#define MASK_VTEM_MD2__NEXT_TFR 0xF8 //MD3 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07 0xFF -- 2.35.1
[PATCH AUTOSEL 5.4 02/21] drm/amd/display: fix audio format not updated after edid updated
From: Charlene Liu [ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ] [why] for the case edid change only changed audio format. driver still need to update stream. Reviewed-by: Alvin Lee Reviewed-by: Aric Cyr Acked-by: Alex Hung Signed-off-by: Charlene Liu Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 95a5310e9e66..de246e183d6b 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1546,8 +1546,8 @@ bool dc_is_stream_unchanged( if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param) return false; - // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks - if (old_stream->audio_info.mode_count != stream->audio_info.mode_count) + /*compare audio info*/ + if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0) return false; return true; -- 2.35.1
[PATCH AUTOSEL 5.4 01/21] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
[PATCH AUTOSEL 5.10 24/30] drm/amd/display: Fix allocate_mst_payload assert on resume
From: Roman Li [ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ] [Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload() [How] Use link type as indicator for mst link. Reviewed-by: Wayne Lin Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index e828f9414ba2..7bb151283f44 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2022,7 +2022,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue; mutex_lock(&aconnector->hpd_lock); -- 2.35.1
[PATCH AUTOSEL 5.10 23/30] drm/amd/display: Revert FEC check in validation
From: Martin Leung [ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ] why and how: causes failure on install on certain machines Reviewed-by: George Shen Acked-by: Alex Hung Signed-off-by: Martin Leung Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index ac5323596c65..93f5229c303e 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1173,10 +1173,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; - /* Check for FEC status*/ - if (link->link_enc->funcs->fec_is_active(link->link_enc)) - return false; - enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.10 09/30] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index ba2c2ce0c55a..159be13ef20b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); -- 2.35.1
[PATCH AUTOSEL 5.10 08/30] drm/amdgpu/vcn: improve vcn dpg stop procedure
From: Tianci Yin [ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ] Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in S3 resuming. Reviewed-by: James Zhu Signed-off-by: Tianci Yin Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 2099f6ebd833..bdb8e596bda6 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1429,8 +1429,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev) static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx) { + struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE}; uint32_t tmp; + vcn_v3_0_pause_dpg_mode(adev, 0, &state); + /* Wait for power status to be 1 */ SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); -- 2.35.1
[PATCH AUTOSEL 5.10 07/30] drm/amdkfd: Fix Incorrect VMIDs passed to HWS
From: Tushar Patel [ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ] Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++ 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index ed13a2f76884..30659c1776e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -632,7 +632,7 @@ MODULE_PARM_DESC(sched_policy, * Maximum number of processes that HWS can schedule concurrently. The maximum is the * number of VMIDs assigned to the HWS, which is also the default. */ -int hws_max_conc_proc = 8; +int hws_max_conc_proc = -1; module_param(hws_max_conc_proc, int, 0444); MODULE_PARM_DESC(hws_max_conc_proc, "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 84313135c2ea..148e43dee657 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -664,15 +664,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, - kfd->vm_info.first_vmid_kfd + 1; /* Verify module parameters regarding mapped process number*/ - if ((hws_max_conc_proc < 0) - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) { - dev_err(kfd_device, - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n", - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd, - kfd->vm_info.vmid_num_kfd); + if (hws_max_conc_proc >= 0) + kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd); + else kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd; - } else - kfd->max_proc_per_quantum = hws_max_conc_proc; /* calculate max size of mqds needed for queues */ size = max_num_of_queues_per_device * -- 2.35.1
[PATCH AUTOSEL 5.10 06/30] drm/amd/display: Update VTEM Infopacket definition
From: "Leo (Hanghong) Ma" [ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ] [Why & How] The latest HDMI SPEC has updated the VTEM packet structure, so change the VTEM Infopacket defined in the driver side to align with the SPEC. Reviewed-by: Chris Park Acked-by: Alex Hung Signed-off-by: Leo (Hanghong) Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c index 0fdf7a3e96de..96e18050a617 100644 --- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c +++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c @@ -100,7 +100,8 @@ enum vsc_packet_revision { //PB7 = MD0 #define MASK_VTEM_MD0__VRR_EN 0x01 #define MASK_VTEM_MD0__M_CONST0x02 -#define MASK_VTEM_MD0__RESERVED2 0x0C +#define MASK_VTEM_MD0__QMS_EN 0x04 +#define MASK_VTEM_MD0__RESERVED2 0x08 #define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0 //MD1 @@ -109,7 +110,7 @@ enum vsc_packet_revision { //MD2 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98 0x03 #define MASK_VTEM_MD2__RB0x04 -#define MASK_VTEM_MD2__RESERVED3 0xF8 +#define MASK_VTEM_MD2__NEXT_TFR 0xF8 //MD3 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07 0xFF -- 2.35.1
[PATCH AUTOSEL 5.10 05/30] drm/amd/display: FEC check in timing validation
From: Chiawen Huang [ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ] [Why] disable/enable leads FEC mismatch between hw/sw FEC state. [How] check FEC status to fastboot on/off. Reviewed-by: Anthony Koo Acked-by: Alex Hung Signed-off-by: Chiawen Huang Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 93f5229c303e..ac5323596c65 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1173,6 +1173,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; + /* Check for FEC status*/ + if (link->link_enc->funcs->fec_is_active(link->link_enc)) + return false; + enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.10 04/30] drm/amd/display: fix audio format not updated after edid updated
From: Charlene Liu [ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ] [why] for the case edid change only changed audio format. driver still need to update stream. Reviewed-by: Alvin Lee Reviewed-by: Aric Cyr Acked-by: Alex Hung Signed-off-by: Charlene Liu Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 5f4cdb05c4db..a21f1141fbb0 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1698,8 +1698,8 @@ bool dc_is_stream_unchanged( if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param) return false; - // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks - if (old_stream->audio_info.mode_count != stream->audio_info.mode_count) + /*compare audio info*/ + if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0) return false; return true; -- 2.35.1
[PATCH AUTOSEL 5.10 01/30] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
[PATCH AUTOSEL 5.15 31/41] drm/amd/display: Fix allocate_mst_payload assert on resume
From: Roman Li [ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ] [Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload() [How] Use link type as indicator for mst link. Reviewed-by: Wayne Lin Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index b9859e52ad92..091524bdc673 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2296,7 +2296,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue; mutex_lock(&aconnector->hpd_lock); -- 2.35.1
[PATCH AUTOSEL 5.15 30/41] drm/amd/display: Revert FEC check in validation
From: Martin Leung [ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ] why and how: causes failure on install on certain machines Reviewed-by: George Shen Acked-by: Alex Hung Signed-off-by: Martin Leung Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 35a27fe48f66..b37c4d2e7a1e 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1377,10 +1377,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; - /* Check for FEC status*/ - if (link->link_enc->funcs->fec_is_active(link->link_enc)) - return false; - enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.15 29/41] drm/amd/display: Enable power gating before init_pipes
From: Roman Li [ Upstream commit 58e16c752e9540b28a873c44c3bee83e022007c1 ] [Why] In init_hw() we call init_pipes() before enabling power gating. init_pipes() tries to power gate dsc but it may fail because required force-ons are not released yet. As a result with dsc config the following errors observed on resume: "REG_WAIT timeout 1us * 1000 tries - dcn20_dsc_pg_control" "REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control" "REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control" [How] Move enable_power_gating_plane() before init_pipes() in init_hw() Reviewed-by: Anthony Koo Reviewed-by: Eric Yang Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 5 +++-- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c| 5 +++-- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c| 5 +++-- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c index 3af49cdf89eb..174dd149fee7 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -1435,6 +1435,9 @@ void dcn10_init_hw(struct dc *dc) } } + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -1487,8 +1490,6 @@ void dcn10_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (dc->clk_mgr->funcs->notify_wm_ranges) dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr); diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c index 0950784bafa4..f83457375811 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c @@ -570,6 +570,9 @@ void dcn30_init_hw(struct dc *dc) } } + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -647,8 +650,6 @@ void dcn30_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks) dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub); diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c index 3afa1159a5f7..b72d080b302a 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c @@ -204,6 +204,9 @@ void dcn31_init_hw(struct dc *dc) } } + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -287,8 +290,6 @@ void dcn31_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks) dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub); -- 2.35.1
[PATCH AUTOSEL 5.15 11/41] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 3eea4edee355..b8bdd796cd91 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); -- 2.35.1
[PATCH AUTOSEL 5.15 10/41] drm/amdgpu/vcn: improve vcn dpg stop procedure
From: Tianci Yin [ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ] Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in S3 resuming. Reviewed-by: James Zhu Signed-off-by: Tianci Yin Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 3d18aab88b4e..a026b2eaec21 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1508,8 +1508,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev) static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx) { + struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE}; uint32_t tmp; + vcn_v3_0_pause_dpg_mode(adev, 0, &state); + /* Wait for power status to be 1 */ SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); -- 2.35.1
[PATCH AUTOSEL 5.15 09/41] drm/amdkfd: Fix Incorrect VMIDs passed to HWS
From: Tushar Patel [ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ] Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++ 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 5a7fef324c82..b517b76e96a1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -680,7 +680,7 @@ MODULE_PARM_DESC(sched_policy, * Maximum number of processes that HWS can schedule concurrently. The maximum is the * number of VMIDs assigned to the HWS, which is also the default. */ -int hws_max_conc_proc = 8; +int hws_max_conc_proc = -1; module_param(hws_max_conc_proc, int, 0444); MODULE_PARM_DESC(hws_max_conc_proc, "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 88c483f69989..660eb7097cfc 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -834,15 +834,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, } /* Verify module parameters regarding mapped process number*/ - if ((hws_max_conc_proc < 0) - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) { - dev_err(kfd_device, - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n", - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd, - kfd->vm_info.vmid_num_kfd); + if (hws_max_conc_proc >= 0) + kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd); + else kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd; - } else - kfd->max_proc_per_quantum = hws_max_conc_proc; /* calculate max size of mqds needed for queues */ size = max_num_of_queues_per_device * -- 2.35.1
[PATCH AUTOSEL 5.15 08/41] drm/amd/display: Update VTEM Infopacket definition
From: "Leo (Hanghong) Ma" [ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ] [Why & How] The latest HDMI SPEC has updated the VTEM packet structure, so change the VTEM Infopacket defined in the driver side to align with the SPEC. Reviewed-by: Chris Park Acked-by: Alex Hung Signed-off-by: Leo (Hanghong) Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c index 57f198de5e2c..4e075b01d48b 100644 --- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c +++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c @@ -100,7 +100,8 @@ enum vsc_packet_revision { //PB7 = MD0 #define MASK_VTEM_MD0__VRR_EN 0x01 #define MASK_VTEM_MD0__M_CONST0x02 -#define MASK_VTEM_MD0__RESERVED2 0x0C +#define MASK_VTEM_MD0__QMS_EN 0x04 +#define MASK_VTEM_MD0__RESERVED2 0x08 #define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0 //MD1 @@ -109,7 +110,7 @@ enum vsc_packet_revision { //MD2 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98 0x03 #define MASK_VTEM_MD2__RB0x04 -#define MASK_VTEM_MD2__RESERVED3 0xF8 +#define MASK_VTEM_MD2__NEXT_TFR 0xF8 //MD3 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07 0xFF -- 2.35.1
[PATCH AUTOSEL 5.15 07/41] drm/amd/display: FEC check in timing validation
From: Chiawen Huang [ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ] [Why] disable/enable leads FEC mismatch between hw/sw FEC state. [How] check FEC status to fastboot on/off. Reviewed-by: Anthony Koo Acked-by: Alex Hung Signed-off-by: Chiawen Huang Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index b37c4d2e7a1e..35a27fe48f66 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1377,6 +1377,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; + /* Check for FEC status*/ + if (link->link_enc->funcs->fec_is_active(link->link_enc)) + return false; + enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.15 06/41] drm/amd/display: fix audio format not updated after edid updated
From: Charlene Liu [ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ] [why] for the case edid change only changed audio format. driver still need to update stream. Reviewed-by: Alvin Lee Reviewed-by: Aric Cyr Acked-by: Alex Hung Signed-off-by: Charlene Liu Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 7ae409f7dcf8..8c285d585249 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1623,8 +1623,8 @@ bool dc_is_stream_unchanged( if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param) return false; - // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks - if (old_stream->audio_info.mode_count != stream->audio_info.mode_count) + /*compare audio info*/ + if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0) return false; return true; -- 2.35.1
[PATCH AUTOSEL 5.15 05/41] drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
From: Alex Deucher [ Upstream commit b818a5d374542ccec73dcfe578a081574029820e ] If the GPU is passed through to a guest VM, use the PCI BAR for CPU FB access rather than the physical address of carve out. The physical address is not valid in a guest. v2: Fix HDP handing as suggested by Michel Reviewed-by: Christian König Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 5 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 33026b3eafd2..2f2ae26a8068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5625,7 +5625,7 @@ void amdgpu_device_flush_hdp(struct amdgpu_device *adev, struct amdgpu_ring *ring) { #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) return; #endif if (adev->gmc.xgmi.connected_to_cpu) @@ -5641,7 +5641,7 @@ void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev, struct amdgpu_ring *ring) { #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) return; #endif if (adev->gmc.xgmi.connected_to_cpu) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c index 3c01be661014..93a4da4284ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -788,7 +788,7 @@ static int gmc_v10_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) { + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) { adev->gmc.aper_base = adev->gfxhub.funcs->get_mc_fb_offset(adev); adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c index 0a50fdaced7e..63c47f61d0df 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c @@ -381,8 +381,9 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU && - adev->gmc.real_vram_size > adev->gmc.aper_size) { + if ((adev->flags & AMD_IS_APU) && + adev->gmc.real_vram_size > adev->gmc.aper_size && + !amdgpu_passthrough(adev)) { adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22; adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c index 63b890f1e8af..bef9610084f1 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c @@ -581,7 +581,7 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) { + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) { adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22; adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 6dc16ccf6c81..0e731016921b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1387,7 +1387,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) */ /* check whether both host-gpu and gpu-gpu xgmi links exist */ - if ((adev->flags & AMD_IS_APU) || + if (((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) || (adev->gmc.xgmi.supported && adev->gmc.xgmi.connected_to_cpu)) { adev->gmc.aper_base = -- 2.35.1
[PATCH AUTOSEL 5.15 04/41] drm/amdgpu: conduct a proper cleanup of PDB bo
From: Guchun Chen [ Upstream commit 2d505453f38e18d42ba7d5428aaa17aaa7752c65 ] Use amdgpu_bo_free_kernel instead of amdgpu_bo_unref to perform a proper cleanup of PDB bo. v2: update subject to be more accurate Signed-off-by: Guchun Chen Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index c67e21244342..6dc16ccf6c81 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1652,7 +1652,7 @@ static int gmc_v9_0_sw_fini(void *handle) amdgpu_gem_force_release(adev); amdgpu_vm_manager_fini(adev); amdgpu_gart_table_vram_free(adev); - amdgpu_bo_unref(&adev->gmc.pdb0_bo); + amdgpu_bo_free_kernel(&adev->gmc.pdb0_bo, NULL, &adev->gmc.ptr_pdb0); amdgpu_bo_fini(adev); return 0; -- 2.35.1
[PATCH AUTOSEL 5.15 01/41] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
[PATCH AUTOSEL 5.17 38/49] drm/amd/display: Fix allocate_mst_payload assert on resume
From: Roman Li [ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ] [Why] On resume we do link detection for all non-MST connectors. MST is handled separately. However the condition for telling if connector is on mst branch is not enough for mst hub case. Link detection for mst branch link leads to mst topology reset. That causes assert in dc_link_allocate_mst_payload() [How] Use link type as indicator for mst link. Reviewed-by: Wayne Lin Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index b28b5c490860..4ba173b47617 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2693,7 +2693,8 @@ static int dm_resume(void *handle) * this is the case when traversing through already created * MST connectors, should be skipped */ - if (aconnector->mst_port) + if (aconnector->dc_link && + aconnector->dc_link->type == dc_connection_mst_branch) continue; mutex_lock(&aconnector->hpd_lock); -- 2.35.1
[PATCH AUTOSEL 5.17 37/49] drm/amd/display: Revert FEC check in validation
From: Martin Leung [ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ] why and how: causes failure on install on certain machines Reviewed-by: George Shen Acked-by: Alex Hung Signed-off-by: Martin Leung Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index a62711d71d0e..ba1aa994db4b 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1493,10 +1493,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; - /* Check for FEC status*/ - if (link->link_enc->funcs->fec_is_active(link->link_enc)) - return false; - enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.17 36/49] drm/amd/display: Enable power gating before init_pipes
From: Roman Li [ Upstream commit 58e16c752e9540b28a873c44c3bee83e022007c1 ] [Why] In init_hw() we call init_pipes() before enabling power gating. init_pipes() tries to power gate dsc but it may fail because required force-ons are not released yet. As a result with dsc config the following errors observed on resume: "REG_WAIT timeout 1us * 1000 tries - dcn20_dsc_pg_control" "REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control" "REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control" [How] Move enable_power_gating_plane() before init_pipes() in init_hw() Reviewed-by: Anthony Koo Reviewed-by: Eric Yang Acked-by: Alex Hung Signed-off-by: Roman Li Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 5 +++-- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c| 5 +++-- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c| 5 +++-- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c index 530a72e3eefe..39d5ceb9a373 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -1508,6 +1508,9 @@ void dcn10_init_hw(struct dc *dc) if (dc->config.power_down_display_on_boot) dc_link_blank_all_dp_displays(dc); + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -1560,8 +1563,6 @@ void dcn10_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (dc->clk_mgr->funcs->notify_wm_ranges) dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr); diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c index 1db1ca19411d..05dc0a3ae2a3 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c @@ -548,6 +548,9 @@ void dcn30_init_hw(struct dc *dc) if (dc->config.power_down_display_on_boot) dc_link_blank_all_dp_displays(dc); + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -625,8 +628,6 @@ void dcn30_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks) dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub); diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c index 4206ce5bf9a9..1d2682104670 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c @@ -200,6 +200,9 @@ void dcn31_init_hw(struct dc *dc) if (dc->config.power_down_display_on_boot) dc_link_blank_all_dp_displays(dc); + if (hws->funcs.enable_power_gating_plane) + hws->funcs.enable_power_gating_plane(dc->hwseq, true); + /* If taking control over from VBIOS, we may want to optimize our first * mode set, so we need to skip powering down pipes until we know which * pipes we want to use. @@ -249,8 +252,6 @@ void dcn31_init_hw(struct dc *dc) REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0); } - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks) dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub); -- 2.35.1
[PATCH AUTOSEL 5.17 35/49] drm/amd/display: Correct Slice reset calculation
From: Chris Park [ Upstream commit 862a876c3a6372f2fa9d0c6510f1976ac94fc857 ] [Why] Once DSC slice cannot fit pixel clock, we incorrectly reset min slices to 0 and allow max slice to operate, even when max slice itself cannot fit the pixel clock properly. [How] Change the sequence such that we correctly determine DSC is not possible when both min slices and max slices cannot fit pixel clock per slice. Reviewed-by: Wenjing Liu Acked-by: Alex Hung Signed-off-by: Chris Park Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c index 9c74564cbd8d..8973d3a38f9c 100644 --- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c +++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c @@ -864,11 +864,11 @@ static bool setup_dsc_config( min_slices_h = inc_num_slices(dsc_common_caps.slice_caps, min_slices_h); } + is_dsc_possible = (min_slices_h <= max_slices_h); + if (pic_width % min_slices_h != 0) min_slices_h = 0; // DSC TODO: Maybe try increasing the number of slices first? - is_dsc_possible = (min_slices_h <= max_slices_h); - if (min_slices_h == 0 && max_slices_h == 0) is_dsc_possible = false; -- 2.35.1
[PATCH AUTOSEL 5.17 12/49] drm/amdkfd: Check for potential null return of kmalloc_array()
From: QintaoShen [ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ] As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference. Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion. Signed-off-by: QintaoShen Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index afe72dd11325..6ca7e12bdab8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) event_waiters = kmalloc_array(num_events, sizeof(struct kfd_event_waiter), GFP_KERNEL); + if (!event_waiters) + return NULL; for (i = 0; (event_waiters) && (i < num_events) ; i++) { init_wait(&event_waiters[i].wait); -- 2.35.1
[PATCH AUTOSEL 5.17 11/49] drm/amdgpu/vcn: improve vcn dpg stop procedure
From: Tianci Yin [ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ] Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in S3 resuming. Reviewed-by: James Zhu Signed-off-by: Tianci Yin Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index da11ceba0698..2ae56b2ed318 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1474,8 +1474,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev) static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx) { + struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE}; uint32_t tmp; + vcn_v3_0_pause_dpg_mode(adev, 0, &state); + /* Wait for power status to be 1 */ SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); -- 2.35.1
[PATCH AUTOSEL 5.17 10/49] drm/amdkfd: Fix Incorrect VMIDs passed to HWS
From: Tushar Patel [ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ] Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++ 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 0ead08ba58c2..7672790d8b6b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -686,7 +686,7 @@ MODULE_PARM_DESC(sched_policy, * Maximum number of processes that HWS can schedule concurrently. The maximum is the * number of VMIDs assigned to the HWS, which is also the default. */ -int hws_max_conc_proc = 8; +int hws_max_conc_proc = -1; module_param(hws_max_conc_proc, int, 0444); MODULE_PARM_DESC(hws_max_conc_proc, "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))"); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 2b65d0acae2c..2fdbe2f475e4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -480,15 +480,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, } /* Verify module parameters regarding mapped process number*/ - if ((hws_max_conc_proc < 0) - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) { - dev_err(kfd_device, - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n", - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd, - kfd->vm_info.vmid_num_kfd); + if (hws_max_conc_proc >= 0) + kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd); + else kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd; - } else - kfd->max_proc_per_quantum = hws_max_conc_proc; /* calculate max size of mqds needed for queues */ size = max_num_of_queues_per_device * -- 2.35.1
[PATCH AUTOSEL 5.17 09/49] drm/amd/display: Update VTEM Infopacket definition
From: "Leo (Hanghong) Ma" [ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ] [Why & How] The latest HDMI SPEC has updated the VTEM packet structure, so change the VTEM Infopacket defined in the driver side to align with the SPEC. Reviewed-by: Chris Park Acked-by: Alex Hung Signed-off-by: Leo (Hanghong) Ma Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c index 57f198de5e2c..4e075b01d48b 100644 --- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c +++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c @@ -100,7 +100,8 @@ enum vsc_packet_revision { //PB7 = MD0 #define MASK_VTEM_MD0__VRR_EN 0x01 #define MASK_VTEM_MD0__M_CONST0x02 -#define MASK_VTEM_MD0__RESERVED2 0x0C +#define MASK_VTEM_MD0__QMS_EN 0x04 +#define MASK_VTEM_MD0__RESERVED2 0x08 #define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0 //MD1 @@ -109,7 +110,7 @@ enum vsc_packet_revision { //MD2 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98 0x03 #define MASK_VTEM_MD2__RB0x04 -#define MASK_VTEM_MD2__RESERVED3 0xF8 +#define MASK_VTEM_MD2__NEXT_TFR 0xF8 //MD3 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07 0xFF -- 2.35.1
[PATCH AUTOSEL 5.17 08/49] drm/amd/display: FEC check in timing validation
From: Chiawen Huang [ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ] [Why] disable/enable leads FEC mismatch between hw/sw FEC state. [How] check FEC status to fastboot on/off. Reviewed-by: Anthony Koo Acked-by: Alex Hung Signed-off-by: Chiawen Huang Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index ba1aa994db4b..a62711d71d0e 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1493,6 +1493,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc, if (!link->link_enc->funcs->is_dig_enabled(link->link_enc)) return false; + /* Check for FEC status*/ + if (link->link_enc->funcs->fec_is_active(link->link_enc)) + return false; + enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc); if (enc_inst == ENGINE_ID_UNKNOWN) -- 2.35.1
[PATCH AUTOSEL 5.17 07/49] drm/amd/display: fix audio format not updated after edid updated
From: Charlene Liu [ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ] [why] for the case edid change only changed audio format. driver still need to update stream. Reviewed-by: Alvin Lee Reviewed-by: Aric Cyr Acked-by: Alex Hung Signed-off-by: Charlene Liu Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 18757c158523..bb8a762cf8da 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1664,8 +1664,8 @@ bool dc_is_stream_unchanged( if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param) return false; - // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks - if (old_stream->audio_info.mode_count != stream->audio_info.mode_count) + /*compare audio info*/ + if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0) return false; return true; -- 2.35.1
[PATCH AUTOSEL 5.17 06/49] drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
From: Alex Deucher [ Upstream commit b818a5d374542ccec73dcfe578a081574029820e ] If the GPU is passed through to a guest VM, use the PCI BAR for CPU FB access rather than the physical address of carve out. The physical address is not valid in a guest. v2: Fix HDP handing as suggested by Michel Reviewed-by: Christian König Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 5 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f18c698137a6..b590795c1bc4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5678,7 +5678,7 @@ void amdgpu_device_flush_hdp(struct amdgpu_device *adev, struct amdgpu_ring *ring) { #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) return; #endif if (adev->gmc.xgmi.connected_to_cpu) @@ -5694,7 +5694,7 @@ void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev, struct amdgpu_ring *ring) { #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) return; #endif if (adev->gmc.xgmi.connected_to_cpu) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c index a2f8ed0e6a64..f1b794d5d87d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -788,7 +788,7 @@ static int gmc_v10_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) { + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) { adev->gmc.aper_base = adev->gfxhub.funcs->get_mc_fb_offset(adev); adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c index ab8adbff9e2d..5206e2da334a 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c @@ -381,8 +381,9 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU && - adev->gmc.real_vram_size > adev->gmc.aper_size) { + if ((adev->flags & AMD_IS_APU) && + adev->gmc.real_vram_size > adev->gmc.aper_size && + !amdgpu_passthrough(adev)) { adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22; adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c index 054733838292..d07d36786836 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c @@ -581,7 +581,7 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev) adev->gmc.aper_size = pci_resource_len(adev->pdev, 0); #ifdef CONFIG_X86_64 - if (adev->flags & AMD_IS_APU) { + if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) { adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22; adev->gmc.aper_size = adev->gmc.real_vram_size; } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 34ee75cf7954..2fb24178eaef 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1420,7 +1420,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) */ /* check whether both host-gpu and gpu-gpu xgmi links exist */ - if ((adev->flags & AMD_IS_APU) || + if (((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) || (adev->gmc.xgmi.supported && adev->gmc.xgmi.connected_to_cpu)) { adev->gmc.aper_base = -- 2.35.1
[PATCH AUTOSEL 5.17 05/49] drm/amdgpu: conduct a proper cleanup of PDB bo
From: Guchun Chen [ Upstream commit 2d505453f38e18d42ba7d5428aaa17aaa7752c65 ] Use amdgpu_bo_free_kernel instead of amdgpu_bo_unref to perform a proper cleanup of PDB bo. v2: update subject to be more accurate Signed-off-by: Guchun Chen Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 88c1eb9ad068..34ee75cf7954 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1684,7 +1684,7 @@ static int gmc_v9_0_sw_fini(void *handle) amdgpu_gem_force_release(adev); amdgpu_vm_manager_fini(adev); amdgpu_gart_table_vram_free(adev); - amdgpu_bo_unref(&adev->gmc.pdb0_bo); + amdgpu_bo_free_kernel(&adev->gmc.pdb0_bo, NULL, &adev->gmc.ptr_pdb0); amdgpu_bo_fini(adev); return 0; -- 2.35.1
[PATCH AUTOSEL 5.17 02/49] drm/amd: Add USBC connector ID
From: Aurabindo Pillai [ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ] [Why&How] Add a dedicated AMDGPU specific ID for use with newer ASICs that support USB-C output Signed-off-by: Aurabindo Pillai Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h index 5b393622f592..a0f0a17e224f 100644 --- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h +++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h @@ -119,6 +119,7 @@ #define CONNECTOR_OBJECT_ID_eDP 0x14 #define CONNECTOR_OBJECT_ID_MXM 0x15 #define CONNECTOR_OBJECT_ID_LVDS_eDP 0x16 +#define CONNECTOR_OBJECT_ID_USBC 0x17 /* deleted */ -- 2.35.1
Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal
Am 2022-04-07 um 12:15 schrieb Mukul Joshi: Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu Signed-off-by: Mukul Joshi Looks good to me. I have two nit-picks inline. --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 --- 3 files changed, 74 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 1eaabd2cb41b..afc8a7fcdad8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, * table, add corresponded reversed direction link now. */ if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) { - to_dev = kfd_topology_device_by_proximity_domain(id_to); + to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to); if (!to_dev) return -ENODEV; /* same everything but the other direction */ @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, */ if (kdev->hive_id) { for (nid = 0; nid < proximity_domain; ++nid) { - peer_dev = kfd_topology_device_by_proximity_domain(nid); + peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid); if (!peer_dev->gpu) continue; if (peer_dev->gpu->hive_id != kdev->hive_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index e1b7e6afa920..8a43def1f638 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu); int kfd_topology_remove_device(struct kfd_dev *gpu); struct kfd_topology_device *kfd_topology_device_by_proximity_domain( uint32_t proximity_domain); +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( + uint32_t proximity_domain); struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3bdcae239bc0..874a273b81f7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -46,27 +46,38 @@ static struct list_head topology_device_list; static struct kfd_system_properties sys_props; static DECLARE_RWSEM(topology_lock); -static atomic_t topology_crat_proximity_domain; +static uint32_t topology_crat_proximity_domain; -struct kfd_topology_device *kfd_topology_device_by_proximity_domain( +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( uint32_t proximity_domain) I remember we discussed this and I suggested splitting a no_lock version out of this function. But now I don't see it being used anywhere. Was that lost somewhere in refactoring or porting to the upstream branch? Maybe the no_lock version isn't needed any more. { struct kfd_topology_device *top_dev; struct kfd_topology_device *device = NULL; - down_read(&topology_lock); - list_for_each_entry(top_dev, &topology_device_list, list) if (top_dev->proximity_domain == proximity_domain) { device = top_dev; break; } + return device; +} + +struct kfd_topology_device *kfd_topology_device_by_proximity_domain( + uint32_t proximity_domain) +{ + struct kfd_topology_device *device = NULL; + + down_read(&topology_lock); + + device = kfd_topology_device_by_proximity_domain_no_lock( + proximity_domain); up_read(&topology_lock); return device; } + s
Re: [PATCH 1/2] drm/amdkfd: Cleanup IO links during KFD device removal
Am 2022-04-08 um 04:45 schrieb Shuotao Xu: Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. Reviewed-by: Shuotao Xu CC: Shuotao Xu Signed-off-by: Mukul Joshi This looks like Mukul's patch, but with you as the author (otherwise I would have expected a "From: Mukul ..." line at the start of the email). Did you make any changes to it? Regards, Felix --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 --- 3 files changed, 74 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 1eaabd2cb41b..afc8a7fcdad8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, * table, add corresponded reversed direction link now. */ if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) { - to_dev = kfd_topology_device_by_proximity_domain(id_to); + to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to); if (!to_dev) return -ENODEV; /* same everything but the other direction */ @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image, */ if (kdev->hive_id) { for (nid = 0; nid < proximity_domain; ++nid) { - peer_dev = kfd_topology_device_by_proximity_domain(nid); + peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid); if (!peer_dev->gpu) continue; if (peer_dev->gpu->hive_id != kdev->hive_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index e1b7e6afa920..8a43def1f638 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu); int kfd_topology_remove_device(struct kfd_dev *gpu); struct kfd_topology_device *kfd_topology_device_by_proximity_domain( uint32_t proximity_domain); +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( + uint32_t proximity_domain); struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_id(uint32_t gpu_id); struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3bdcae239bc0..874a273b81f7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -46,27 +46,38 @@ static struct list_head topology_device_list; static struct kfd_system_properties sys_props; static DECLARE_RWSEM(topology_lock); -static atomic_t topology_crat_proximity_domain; +static uint32_t topology_crat_proximity_domain; -struct kfd_topology_device *kfd_topology_device_by_proximity_domain( +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock( uint32_t proximity_domain) { struct kfd_topology_device *top_dev; struct kfd_topology_device *device = NULL; - down_read(&topology_lock); - list_for_each_entry(top_dev, &topology_device_list, list) if (top_dev->proximity_domain == proximity_domain) { device = top_dev; break; } + return device; +} + +struct kfd_topology_device *kfd_topology_device_by_proximity_domain( + uint32_t proximity_domain) +{ + struct kfd_topology_device *device = NULL; + + down_read(&topology_lock); + + device = kfd_topology_device_by_proximity_domain_no_lock( + proximity_domain); up_read(&topology_lock); return device; } + struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id) { struct kfd_topo
Re: [PATCHv2] drm/amdgpu: disable ASPM on Intel AlderLake based systems
Hi Richard, Thank you for the patch! Yet something to improve: [auto build test ERROR on drm/drm-next] [also build test ERROR on v5.18-rc2 next-20220411] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Richard-Gong/drm-amdgpu-disable-ASPM-on-Intel-AlderLake-based-systems/20220409-030656 base: git://anongit.freedesktop.org/drm/drm drm-next config: um-allmodconfig (https://download.01.org/0day-ci/archive/20220412/202204120618.4i6douw9-...@intel.com/config) compiler: gcc-11 (Debian 11.2.0-19) 11.2.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/bb9a037cafa91918c2ece823591d1d04b812ae17 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Richard-Gong/drm-amdgpu-disable-ASPM-on-Intel-AlderLake-based-systems/20220409-030656 git checkout bb9a037cafa91918c2ece823591d1d04b812ae17 # save the config file to linux build tree mkdir build_dir make W=1 O=build_dir ARCH=um SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): In file included from arch/x86/um/asm/processor.h:41, from include/linux/mutex.h:19, from include/linux/kernfs.h:11, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/pci.h:35, from drivers/gpu/drm/amd/amdgpu/vi.c:24: drivers/gpu/drm/amd/amdgpu/vi.c: In function 'intel_core_apsm_chk': arch/um/include/asm/processor-generic.h:103:19: error: called object is not a function or function pointer 103 | #define cpu_data (&boot_cpu_data) | ~^~~ drivers/gpu/drm/amd/amdgpu/vi.c:1144:34: note: in expansion of macro 'cpu_data' 1144 | struct cpuinfo_x86 *c = &cpu_data(0); | ^~~~ >> drivers/gpu/drm/amd/amdgpu/vi.c:1146:18: error: invalid use of undefined >> type 'struct cpuinfo_x86' 1146 | return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); | ^~ drivers/gpu/drm/amd/amdgpu/vi.c:1146:33: error: invalid use of undefined type 'struct cpuinfo_x86' 1146 | return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); | ^~ drivers/gpu/drm/amd/amdgpu/vi.c:1150:1: error: control reaches end of non-void function [-Werror=return-type] 1150 | } | ^ cc1: some warnings being treated as errors vim +1146 drivers/gpu/drm/amd/amdgpu/vi.c 1140 1141 static bool intel_core_apsm_chk(void) 1142 { 1143 #if IS_ENABLED(CONFIG_X86_64) 1144 struct cpuinfo_x86 *c = &cpu_data(0); 1145 > 1146 return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE); 1147 #else 1148 return false; 1149 #endif 1150 } 1151 -- 0-DAY CI Kernel Test Service https://01.org/lkp
Re: AMDGPU: RX 6500 XT: System reset when loading module [SOLVED]
Hi all, I wanted to put a capper on this just in case anyone was interested, or in case any future people find this thread, because I did find a resolution. Turns out the way to stop the system from crashing was to disable Global C-State Control in the BIOS. Christian, Alex, you guys seem to have been on the right track in that it was something power related. I haven't yet been able to figure out what Global C-State Control exactly does. My best guess as to what was happening: once the GPU power management code was loaded and the GPU dropped into a very low power state, the CPU saw this and decided to match it, lowering its own power state to such a point that it hard resets the system. (Just a wild theory anyway :-) If anyone knows what this feature really does, or has any better theories as to why it doesn't play nice with AMD GPUs, please do share! Thanks and kind regards, -- Cal Peake
gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'
Current linux-5.17.1 on fresh gcc-12 fails to build with errors like: ERROR: modpost: "__popcountdi2" [drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined! ERROR: modpost: "__popcountdi2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! It is occasionally seen by others on previous gcc versions as well: https://lkml.org/lkml/2021/7/11/261 https://lkml.org/lkml/2018/10/24/403 '__popcountdi2' are inserted by gcc for code like the following from 'drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_execution.c': static inline enum mod_hdcp_status validate_bksv(struct mod_hdcp *hdcp) { uint64_t n = 0; uint8_t count = 0; u8 bksv[sizeof(n)] = { }; memcpy(bksv, hdcp->auth.msg.hdcp1.bksv, sizeof(hdcp->auth.msg.hdcp1.bksv)); n = *(uint64_t *)bksv; /* Here gcc inserts 'count = __builtin_popcount(n);' */ while (n) { count++; n &= (n - 1); } return (count == 20) ? MOD_HDCP_STATUS_SUCCESS : MOD_HDCP_STATUS_HDCP1_INVALID_BKSV; } Note that gcc can insert it regardless of -mno-* options. How should linux.git handle it? A few options come to mind: - Perhaps use libgcc.a directly. - Just implement '__popcountdi2'. Example definition from libgcc: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/libgcc2.c;hb=HEAD#l846 - Or workaround it with -fno-builtin-popcount in Makefiles. CCing people who can help routing it and/or deciding on the fix: amd-gfx@lists.freedesktop.org, Joe Perches, linux-kbu...@vger.kernel.org, Jakub Jelinek, Segher Boessenkool, Thomas Gleixner,Peter Zijlstra, Andy Lutomirski. WDYT? Thanks! -- Sergei
Re: gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'
On Mon, Apr 11, 2022 at 10:08:15PM +0100, Sergei Trofimovich wrote: > Current linux-5.17.1 on fresh gcc-12 fails to build with errors like: > > ERROR: modpost: "__popcountdi2" > [drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined! > ERROR: modpost: "__popcountdi2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] > undefined! > > It is occasionally seen by others on previous gcc versions as well: > > https://lkml.org/lkml/2021/7/11/261 > https://lkml.org/lkml/2018/10/24/403 > > '__popcountdi2' are inserted by gcc for code like the following > from 'drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_execution.c': > > static inline enum mod_hdcp_status validate_bksv(struct mod_hdcp *hdcp) > { > uint64_t n = 0; > uint8_t count = 0; > u8 bksv[sizeof(n)] = { }; > > memcpy(bksv, hdcp->auth.msg.hdcp1.bksv, > sizeof(hdcp->auth.msg.hdcp1.bksv)); > n = *(uint64_t *)bksv; > > /* Here gcc inserts 'count = __builtin_popcount(n);' */ > while (n) { > count++; > n &= (n - 1); > } > > return (count == 20) ? MOD_HDCP_STATUS_SUCCESS : >MOD_HDCP_STATUS_HDCP1_INVALID_BKSV; > } > > Note that gcc can insert it regardless of -mno-* options. Just FYI, this has been added in GCC 9 for https://gcc.gnu.org/PR82479 If the kernel implements its own __popcount?i2, it can perhaps with runtime patching use hw instructions when those are available and fallback to the generic version. Jakub
Vega 56 failing to process EDID from VR Headset
Hi, I have an Oculus Rift S, that I am trying to get working in Linux. I have an AMD Vega 56 graphics card. The VR headset plugs into a display port of the Vega56. The amdgpu driver sees the connection, and tries to process it. The problem is it cannot process the EDID, so fails to recognise the VR headset, and the VR headset does not work as a result. Please find the EDID below. I am guessing that the following is causing the problem: Established Timings I & II: none Standard Timings: none Forcing the driver to understand the Detailed mode, to which it is failing. If it helps, when attached to windows 10, it uses 1440x2560, portrait mode. Some dmesg lines that may be useful: // We should pick 1440x2560 as Windows picks that, but for some reason is rejects it with error 10. [10402.650734] [drm:create_validate_stream_for_sink [amdgpu]] Mode 1440x2560 (clk 571570) failed DC validation with error 10 (No DP link bandwidth) [10402.650991] [drm:update_stream_scaling_settings [amdgpu]] Destination Rectangle x:0 y:0 width:1440 height:2560 [10402.651225] [drm:create_validate_stream_for_sink [amdgpu]] Mode 1440x2560 (clk 571570) failed DC validation with error 10 (No DP link bandwidth) Can anyone help give me with some pointers as to how to get the amdgpu driver to accept this EDID? edid-decode (hex): 00 ff ff ff ff ff ff 00 3e d2 12 00 00 00 00 00 01 1d 01 04 a5 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 45 df a0 38 51 00 ee a5 d2 0c f1 0c 00 00 00 00 00 10 00 00 00 ff 00 31 57 4d 47 48 35 32 30 47 41 39 35 32 00 00 00 fc 00 52 69 66 74 20 53 0a 20 20 20 20 20 20 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33 EDID version: 1.4 Manufacturer: OVR Model 18 Serial Number 0 Made in week 1 of 2019 Digital display 8 bits per primary color channel DisplayPort interface Image size is variable Gamma: 1.00 Supported color formats: RGB 4:4:4, YCrCb 4:4:4 Default (sRGB) color space is primary color space First detailed timing includes the native pixel format and preferred refresh rate Color Characteristics Red: 0., 0. Green: 0., 0. Blue: 0., 0. White: 0., 0. Established Timings I & II: none Standard Timings: none Detailed mode: Clock 571.570 MHz, digital composite 1440 1650 1662 1752 (210 12 90) 2560 2623 2624 4078 ( 63 1 1454) -hsync VertFreq: 80.000 Hz, HorFreq: 326.239 kHz Display Product Serial Number: 1WMGH520GA952 Display Product Name: Rift S Dummy Descriptor Checksum: 0x33
[igt-dev] [PATCH i-g-t v2] tests/drm_buddy: Add drm buddy test cases
Add a set of drm buddy test cases to validate the drm/drm_buddy.c memory allocator. v2: sorted in alphabetical order Signed-off-by: Arunpravin Paneer Selvam Reviewed-by: Matthew Auld --- tests/drm_buddy.c | 14 ++ tests/meson.build | 1 + 2 files changed, 15 insertions(+) create mode 100644 tests/drm_buddy.c diff --git a/tests/drm_buddy.c b/tests/drm_buddy.c new file mode 100644 index ..06876e0c --- /dev/null +++ b/tests/drm_buddy.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2019 Intel Corporation + */ + +#include "igt.h" +#include "igt_kmod.h" + +IGT_TEST_DESCRIPTION("Basic sanity check of DRM's buddy allocator (struct drm_buddy)"); + +igt_main +{ + igt_kselftests("test-drm_buddy", NULL, NULL, NULL); +} diff --git a/tests/meson.build b/tests/meson.build index b0eab3d6..7261e9aa 100644 --- a/tests/meson.build +++ b/tests/meson.build @@ -8,6 +8,7 @@ test_progs = [ 'debugfs_test', 'dmabuf', 'device_reset', + 'drm_buddy', 'drm_import_export', 'drm_mm', 'drm_read', -- 2.25.1
Re: AMDGPU: regression on 5.17.1
On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio wrote: > > On Tue, 5 Apr 2022 10:23:16 -0400 > Alex Deucher wrote: > > > On Mon, Apr 4, 2022 at 3:39 PM Michele Ballabio > > wrote: > > > > > > On Mon, 4 Apr 2022 13:03:41 -0400 > > > Alex Deucher wrote: > > > > > > > On Sun, Apr 3, 2022 at 10:19 AM Michele Ballabio > > > > wrote: > > > > > > > > > > Hi, > > > > > I've hit a regression on 5.17.1 (haven't tested 5.17.0, > > > > > but 5.16-stable didn't have this problem). > > > > > > > > > > The machine is a Ryzen 5 1600 with AMD graphics (RX 560). > > > > > > > > > > The regression I hit seems to trigger when the machine is left > > > > > idle at boot (I don't boot straight to X, I boot to a tty, login > > > > > and then start X). The machine after a while blanks the screen. > > > > > Usually, the screen unblanks as the keyboard is hit or the mouse > > > > > moves, but with kernel 5.17.1 the screen does not wake up. The > > > > > machine seems to run mostly fine: I can login from ssh, but I > > > > > cannot reboot or halt it: a sysrq sequence is needed for that. > > > > > Note that if the screen goes blank under X, it wakes up fine. > > > > > > > > > > Below a dmesg and two traces from syslog (they're quite > > > > > similar). > > > > > > > > Can you bisect? Does setting amdgpu.runpm=0 help? > > > > > > I can try to bisect, should I narrow the search to drivers/gpu/drm/ > > > ? > > > > I would just do a full bisect if possible in case the change happens > > to be outside of drm. > > > > > > > > Setting amdgpu.runpm=0 works, the display now unblanks without > > > problems. > > > > Hi, > I bisected this, and the first bad commit is > [087451f372bf76d971184caa258807b7c35aac8f] drm/amdgpu: use generic fb > helpers instead of setting up AMD own's. > > Let me know if you need some more testing. Thanks. Do the attached patches fix the issue? Thanks, Alex From be5c2680ed4b3b4121af4f692c905d88d7b6cc00 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 28 Dec 2021 17:26:24 -0500 Subject: [PATCH 1/2] drm/amdgpu: don't runtime suspend if there are displays attached We normally runtime suspend when there are displays attached if they are in the DPMS off state, however, if something wakes the GPU we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend) which can cause userspace to light up the displays again soon after they were turned off. Prior to commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."), the driver took a runtime pm reference when the fbdev emulation was enabled because we didn't implement proper shadowing support for vram access when the device was off so the device never runtime suspended when there was a console bound. Once that commit landed, we now utilize the core fb helper implementation which properly handles the emulation, so runtime pm now suspends in cases where it did not before. Ultimately, we need to sort out why runtime suspend in not working in this case for some users, but this should restore similar behavior to before. Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 4efaa183abcd..cf795a9b8aef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2508,6 +2508,8 @@ static int amdgpu_pmops_runtime_idle(struct device *dev) { struct drm_device *drm_dev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(drm_dev); + struct drm_connector *list_connector; + struct drm_connector_list_iter iter; /* we don't want the main rpm_idle to call suspend - we want to autosuspend */ int ret = 1; @@ -2516,6 +2518,22 @@ static int amdgpu_pmops_runtime_idle(struct device *dev) return -EBUSY; } + /* XXX: Return busy if any displays are connected to avoid + * possible display wake ups after runtime resume due to + * hotplug events in case any displays were connected while + * the GPU was in suspend. Remove this once that is fixed. + */ + mutex_lock(&drm_dev->mode_config.mutex); + drm_connector_list_iter_begin(drm_dev, &iter); + drm_for_each_connector_iter(list_connector, &iter) { + if (list_connector->status == connector_status_connected) { + ret = -EBUSY; + break; + } + } + drm_connector_list_iter_end(&iter); + mutex_unlock(&drm_dev->mode_config.mutex); + if (amdgpu_device_has_dc_support(adev)) { struct drm_crtc *crtc; @@ -2527,11 +2545,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev) if (ret < 0) break; } - } else { - struct drm_connector *list_connector; - struct drm_connector_list_iter iter; - mutex_lock(&drm_dev->mode_config.mutex); drm_modeset_lock(&drm_dev->mode_conf
[linux-next:master] BUILD REGRESSION d12d7e1cfe38e0c36d28c7a9fbbc436ad0d17c14
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: d12d7e1cfe38e0c36d28c7a9fbbc436ad0d17c14 Add linux-next specific files for 20220411 Error/Warning reports: https://lore.kernel.org/linux-mm/202203160358.yulpl6b4-...@intel.com https://lore.kernel.org/linux-mm/202204081656.6x4pfen4-...@intel.com https://lore.kernel.org/llvm/202203241958.uw9bwfmd-...@intel.com https://lore.kernel.org/llvm/202204030606.gml0qvdb-...@intel.com Error/Warning: (recently discovered and may have been fixed) ERROR: dtschema minimum version is v2022.3 drivers/bus/mhi/host/main.c:792:13: warning: parameter 'event_quota' set but not used [-Wunused-but-set-parameter] drivers/gpu/drm/amd/amdgpu/../display/dc/virtual/virtual_link_hwss.c:32:6: warning: no previous prototype for 'virtual_setup_stream_attribute' [-Wmissing-prototypes] powerpc-linux-ld: fs/btrfs/raid56.c:970: undefined reference to `__udivdi3' Unverified Error/Warning (likely false positive, please contact us if interested): ERROR: modpost: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined! ERROR: modpost: "__hexagon_udivdi3" [fs/btrfs/btrfs.ko] undefined! ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined! Makefile:684: arch/h8300/Makefile: No such file or directory arch/Kconfig:10: can't open file "arch/h8300/Kconfig" arch/s390/include/asm/spinlock.h:81:3: error: unexpected token in '.rept' directive arch/s390/include/asm/spinlock.h:81:3: error: unknown directive arch/s390/include/asm/spinlock.h:81:3: error: unmatched '.endr' directive arch/s390/lib/spinlock.c:78:3: error: unexpected token in '.rept' directive arch/s390/lib/spinlock.c:78:3: error: unknown directive arch/s390/lib/spinlock.c:78:3: error: unmatched '.endr' directive drivers/dma-buf/st-dma-fence-unwrap.c:125:13: warning: variable 'err' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_hubp.c:57:6: warning: no previous prototype for 'hubp31_program_extended_blank' [-Wmissing-prototypes] drivers/hwmon/da9055-hwmon.c:201:9: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling] drivers/hwmon/vt8231.c:634:9: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling] drivers/misc/cb710/debug.c:96:1: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling] drivers/misc/mei/main.c:1100:9: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling] drivers/pci/vgaarb.c:213:17: warning: Value stored to 'dev' during its initialization is never read [clang-analyzer-deadcode.DeadStores] drivers/staging/rtl8723bs/os_dep/ioctl_linux.c:589:29: warning: array subscript 'struct ndis_802_11_wep[0]' is partly outside array bounds of 'unsigned char[25]' [-Warray-bounds] drivers/tty/synclink_gt.c:3430:2: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'sprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling] kernel/module/main.c:2189:4: warning: Null pointer passed as 1st argument to memory copy function [clang-analyzer-unix.cstring.NullArg] kernel/module/main.c:924:9: warning: Call to function 'sprintf' is insecure as it does not provide bounding of the memory buffer or security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'spr
Re: [PATCH] drm/radeon: remove r600_blit_shaders.[c|h]
On Sat, Apr 9, 2022 at 1:11 PM Tom Rix wrote: > > The only use of the global variables in r600_blit_shaders.c > were in the old drivers/gpu/drm/radeon/r600_blit.c > This file was removed in > commit 8333f607a631 ("drm/radeon: remove UMS support") > > So remove the r600_blit_shaders.[c|h] files > > Signed-off-by: Tom Rix Applied. Thanks! Alex
Patch "drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()" has been added to the 5.4-stable tree
This is a note to let you know that I've just added the patch titled drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions() to the 5.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch and it can be found in the queue-5.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >From foo@baz Mon Apr 11 06:48:58 PM CEST 2022 From: Nathan Chancellor Date: Mon, 11 Apr 2022 09:43:08 -0700 Subject: drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions() To: Greg Kroah-Hartman , Sasha Levin Cc: Felix Kuehling , Alex Deucher , Nick Desaulniers , amd-gfx@lists.freedesktop.org, l...@lists.linux.dev, sta...@vger.kernel.org, Nathan Chancellor Message-ID: <20220411164308.2491139-3-nat...@kernel.org> From: Nathan Chancellor This patch is for linux-5.4.y only, it has no equivalent change upstream. When building x86_64 allmodconfig with tip of tree clang, there is an instance of -Wstrict-prototypes: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions() ^ void 1 error generated. amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in amdgpu_amdkfd_gfx_v10.c does not have the argument types specified, which causes the warning. GCC does not warn because it permits an old-style definition if the prototype has the argument types. This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array to probe kfd2kgd_calls"), which was a part of a larger series that does not look very suitable for stable. Just fix this one location, as it was the only instance of this new warning across a variety of builds. Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)") Signed-off-by: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -165,7 +165,7 @@ static const struct kfd2kgd_calls kfd2kg .get_tile_config = amdgpu_amdkfd_get_tile_config, }; -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions() +struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void) { return (struct kfd2kgd_calls *)&kfd2kgd; } Patches currently in stable-queue which might be from nat...@kernel.org are queue-5.4/drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch queue-5.4/drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch queue-5.4/arm64-module-remove-noload-from-linker-script.patch queue-5.4/tools-build-filter-out-options-and-warnings-not-supported-by-clang.patch queue-5.4/riscv-module-remove-noload.patch queue-5.4/tools-build-use-shell-instead-of-to-get-embedded-libperl-s-ccopts.patch
Patch "drm/amdkfd: add missing void argument to function kgd2kfd_init" has been added to the 5.4-stable tree
This is a note to let you know that I've just added the patch titled drm/amdkfd: add missing void argument to function kgd2kfd_init to the 5.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch and it can be found in the queue-5.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >From foo@baz Mon Apr 11 06:48:58 PM CEST 2022 From: Nathan Chancellor Date: Mon, 11 Apr 2022 09:43:07 -0700 Subject: drm/amdkfd: add missing void argument to function kgd2kfd_init To: Greg Kroah-Hartman , Sasha Levin Cc: Felix Kuehling , Alex Deucher , Nick Desaulniers , amd-gfx@lists.freedesktop.org, l...@lists.linux.dev, sta...@vger.kernel.org, Colin Ian King , Randy Dunlap , Nathan Chancellor Message-ID: <20220411164308.2491139-2-nat...@kernel.org> From: Nathan Chancellor From: Colin Ian King commit 63617d8b125ed9f674133dd000b6df58d6b2965a upstream. Function kgd2kfd_init is missing a void argument, add it to clean up the non-ANSI function declaration. Acked-by: Randy Dunlap Signed-off-by: Colin Ian King Signed-off-by: Alex Deucher Signed-off-by: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdkfd/kfd_module.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c @@ -82,7 +82,7 @@ static void kfd_exit(void) kfd_chardev_exit(); } -int kgd2kfd_init() +int kgd2kfd_init(void) { return kfd_init(); } Patches currently in stable-queue which might be from nat...@kernel.org are queue-5.4/drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch queue-5.4/drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch queue-5.4/arm64-module-remove-noload-from-linker-script.patch queue-5.4/tools-build-filter-out-options-and-warnings-not-supported-by-clang.patch queue-5.4/riscv-module-remove-noload.patch queue-5.4/tools-build-use-shell-instead-of-to-get-embedded-libperl-s-ccopts.patch
Re: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd
On Mon, Apr 11, 2022 at 09:43:06AM -0700, Nathan Chancellor wrote: > Hi everyone, > > These two patches resolve two instances of -Wstrict-prototypes with > newer versions of clang that are present in 5.4. The main Makefile makes > this a hard error. > > The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add > missing void argument to function kgd2kfd_init"), which showed up in > 5.5. > > The second patch has no upstream equivalent, as the code in question was > removed in commit e392c887df97 ("drm/amdkfd: Use array to probe > kfd2kgd_calls") upstream, which is part of a larger series that did not > look reasonable for stable. I opted to just fix the warning in the same > manner as the prior patch, which is less risky and accomplishes the same > end result of no warning. > > Colin Ian King (1): > drm/amdkfd: add missing void argument to function kgd2kfd_init > > Nathan Chancellor (1): > drm/amdkfd: Fix -Wstrict-prototypes from > amdgpu_amdkfd_gfx_10_0_get_functions() > > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_module.c| 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > > base-commit: 2845ff3fd34499603249676495c524a35e795b45 > -- > 2.35.1 > Now queued up, thanks. greg k-h
RE: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd
[Public] > -Original Message- > From: Nathan Chancellor > Sent: Monday, April 11, 2022 12:43 PM > To: Greg Kroah-Hartman ; Sasha Levin > > Cc: Kuehling, Felix ; Deucher, Alexander > ; Nick Desaulniers > ; amd-gfx@lists.freedesktop.org; > l...@lists.linux.dev; sta...@vger.kernel.org; Nathan Chancellor > > Subject: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd > > Hi everyone, > > These two patches resolve two instances of -Wstrict-prototypes with newer > versions of clang that are present in 5.4. The main Makefile makes this a hard > error. > > The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add missing > void argument to function kgd2kfd_init"), which showed up in 5.5. > > The second patch has no upstream equivalent, as the code in question was > removed in commit e392c887df97 ("drm/amdkfd: Use array to probe > kfd2kgd_calls") upstream, which is part of a larger series that did not look > reasonable for stable. I opted to just fix the warning in the same manner as > the prior patch, which is less risky and accomplishes the same end result of > no > warning. > > Colin Ian King (1): > drm/amdkfd: add missing void argument to function kgd2kfd_init > > Nathan Chancellor (1): > drm/amdkfd: Fix -Wstrict-prototypes from > amdgpu_amdkfd_gfx_10_0_get_functions() Series is: Acked-by: Alex Deucher > > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_module.c| 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > > base-commit: 2845ff3fd34499603249676495c524a35e795b45 > -- > 2.35.1
[PATCH 5.4 2/2] drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()
This patch is for linux-5.4.y only, it has no equivalent change upstream. When building x86_64 allmodconfig with tip of tree clang, there is an instance of -Wstrict-prototypes: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes] struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions() ^ void 1 error generated. amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in amdgpu_amdkfd_gfx_v10.c does not have the argument types specified, which causes the warning. GCC does not warn because it permits an old-style definition if the prototype has the argument types. This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array to probe kfd2kgd_calls"), which was a part of a larger series that does not look very suitable for stable. Just fix this one location, as it was the only instance of this new warning across a variety of builds. Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)") Signed-off-by: Nathan Chancellor --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c index ce30d4e8bf25..f7c4337c1ffe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -165,7 +165,7 @@ static const struct kfd2kgd_calls kfd2kgd = { .get_tile_config = amdgpu_amdkfd_get_tile_config, }; -struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions() +struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void) { return (struct kfd2kgd_calls *)&kfd2kgd; } -- 2.35.1
[PATCH 5.4 1/2] drm/amdkfd: add missing void argument to function kgd2kfd_init
From: Colin Ian King commit 63617d8b125ed9f674133dd000b6df58d6b2965a upstream. Function kgd2kfd_init is missing a void argument, add it to clean up the non-ANSI function declaration. Acked-by: Randy Dunlap Signed-off-by: Colin Ian King Signed-off-by: Alex Deucher Signed-off-by: Nathan Chancellor --- drivers/gpu/drm/amd/amdkfd/kfd_module.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c index 986ff52d5750..f4b7f7e6c40e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c @@ -82,7 +82,7 @@ static void kfd_exit(void) kfd_chardev_exit(); } -int kgd2kfd_init() +int kgd2kfd_init(void) { return kfd_init(); } -- 2.35.1
[PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd
Hi everyone, These two patches resolve two instances of -Wstrict-prototypes with newer versions of clang that are present in 5.4. The main Makefile makes this a hard error. The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add missing void argument to function kgd2kfd_init"), which showed up in 5.5. The second patch has no upstream equivalent, as the code in question was removed in commit e392c887df97 ("drm/amdkfd: Use array to probe kfd2kgd_calls") upstream, which is part of a larger series that did not look reasonable for stable. I opted to just fix the warning in the same manner as the prior patch, which is less risky and accomplishes the same end result of no warning. Colin Ian King (1): drm/amdkfd: add missing void argument to function kgd2kfd_init Nathan Chancellor (1): drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions() drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_module.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) base-commit: 2845ff3fd34499603249676495c524a35e795b45 -- 2.35.1
Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10
On Mon, Apr 11, 2022 at 11:28 AM Paul Menzel wrote: > > Dear Yongqiang, > > > Thank you for your patch. > > Am 11.04.22 um 15:59 schrieb Yongqiang Sun: > > MS_HYPERV with vega10 doesn't have the interface to process > > request init data msg. > > Should some Hyper-V folks be added to the reviewers list too? > > > Check hypervisor type to not send the request for MS_HYPERV. > > Please add a blank line between paragraphs. > > > Signed-off-by: Yongqiang Sun > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > > index 933c41f77c92..56b130ec44a9 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > > @@ -23,6 +23,10 @@ > > > > #include > > > > +#ifdef CONFIG_X86 > > +#include > > +#endif > > + > > #include > > > > #include "amdgpu.h" > > @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device > > *adev) > > break; > > case CHIP_VEGA10: > > soc15_set_virt_ops(adev); > > - /* send a dummy GPU_INIT_DATA request to host on > > vega10 */ > > - amdgpu_virt_request_init_data(adev); > > +#ifdef CONFIG_X86 > > + /* not send GPU_INIT_DATA with MS_HYPERV*/ > > + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false) > > +#endif > > Why guard everything with CONFIG_X86? (If it’s needed, it should be done > in C code.) X86_HYPER_MS_HYPERV only available on x86. Alex > > > + /* send a dummy GPU_INIT_DATA request to host > > on vega10 */ > > + amdgpu_virt_request_init_data(adev); > > break; > > case CHIP_VEGA20: > > case CHIP_ARCTURUS: > > > Kind regards, > > Paul
Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD
On 2022-04-08 21:28, Shuotao Xu wrote: On Apr 8, 2022, at 11:28 PM, Andrey Grodzovsky wrote: [Some people who received this message don't often get email from andrey.grodzov...@amd.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] On 2022-04-08 04:45, Shuotao Xu wrote: Adding PCIe Hotplug Support for AMDKFD: the support of hot-plug of GPU devices can open doors for many advanced applications in data center in the next few years, such as for GPU resource disaggregation. Current AMDKFD does not support hotplug out b/o the following reasons: 1. During PCIe removal, decrement KFD lock which was incremented at the beginning of hw fini; otherwise kfd_open later is going to fail. I assumed you read my comment last time, still you do same approach. More in details bellow Aha, I like your fix:) I was not familiar with drm APIs so just only half understood your comment last time. BTW, I tried hot-plugging out a GPU when rocm application is still running. From dmesg, application is still trying to access the removed kfd device, and are met with some errors. Application us supposed to keep running, it holds the drm_device reference as long as it has an open FD to the device and final cleanup will come only after the app will die thus releasing the FD and the last drm_device reference. Application would hang and not exiting in this case. For graphic apps what i usually see is a crash because of sigsev when the app tries to access an unmapped MMIO region on the device. I haven't tested for compute stack and so there might be something I haven't covered. Hang could mean for example waiting on a fence which is not being signaled - please provide full dmesg from this case. Do you have any good suggestions on how to fix it down the line? (HIP runtime/libhsakmt or driver) [64036.631333] amdgpu: amdgpu_vm_bo_update failed [64036.631702] amdgpu: validate_invalid_user_pages: update PTE failed [64036.640754] amdgpu: amdgpu_vm_bo_update failed [64036.641120] amdgpu: validate_invalid_user_pages: update PTE failed [64036.650394] amdgpu: amdgpu_vm_bo_update failed [64036.650765] amdgpu: validate_invalid_user_pages: update PTE failed This just probably means trying to update PTEs after the physical device is gone - we usually avoid this by first trying to do all HW shutdowns early before PCI remove completion but when it's really tricky by protecting HW access sections with drm_dev_enter/exit scope. For this particular error it would be the best to flush info->restore_userptr_work before the end of amdgpu_pci_remove (rejecting new process creation and calling cancel_delayed_work_sync(&process_info->restore_userptr_work) for all running processes) somewhere in amdgpu_pci_remove. Andrey Really appreciate your help! Best, Shuotao 2. Remove redudant p2p/io links in sysfs when device is hotplugged out. 3. New kfd node_id is not properly assigned after a new device is added after a gpu is hotplugged out in a system. libhsakmt will find this anomaly, (i.e. node_from != in iolinks), when taking a topology_snapshot, thus returns fault to the rocm stack. -- This patch fixes issue 1; another patch by Mukul fixes issues 2&3. -- Tested on a 4-GPU MI100 gpu nodes with kernel 5.13.0-kfd; kernel 5.16.0-kfd is unstable out of box for MI100. --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device.c| 13 + 4 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index c18c4be1e4ac..d50011bdb5c4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -213,6 +213,11 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm) return r; } +int amdgpu_amdkfd_resume_processes(void) +{ + return kgd2kfd_resume_processes(); +} + int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev) { int r = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index f8b9f27adcf5..803306e011c3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -140,6 +140,7 @@ void amdgpu_amdkfd_fini(void); void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm); int amdgpu_amdkfd_resume_iommu(struct amdgpu_device *adev); int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm); +int amdgpu_amdkfd_resume_processes(void); void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev, const void *ih_ring_entry); void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev); @@ -347,6 +348,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd); void kgd2kfd_suspend(struct kfd_dev *kfd, bool
Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10
Dear Yongqiang, Thank you for your patch. Am 11.04.22 um 15:59 schrieb Yongqiang Sun: MS_HYPERV with vega10 doesn't have the interface to process request init data msg. Should some Hyper-V folks be added to the reviewers list too? Check hypervisor type to not send the request for MS_HYPERV. Please add a blank line between paragraphs. Signed-off-by: Yongqiang Sun --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 933c41f77c92..56b130ec44a9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -23,6 +23,10 @@ #include +#ifdef CONFIG_X86 +#include +#endif + #include #include "amdgpu.h" @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) break; case CHIP_VEGA10: soc15_set_virt_ops(adev); - /* send a dummy GPU_INIT_DATA request to host on vega10 */ - amdgpu_virt_request_init_data(adev); +#ifdef CONFIG_X86 + /* not send GPU_INIT_DATA with MS_HYPERV*/ + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false) +#endif Why guard everything with CONFIG_X86? (If it’s needed, it should be done in C code.) + /* send a dummy GPU_INIT_DATA request to host on vega10 */ + amdgpu_virt_request_init_data(adev); break; case CHIP_VEGA20: case CHIP_ARCTURUS: Kind regards, Paul
Re: [PATCH v2 ] drm/amdgpu: fix discovery ip failed
On Mon, Apr 11, 2022 at 11:25 AM Zhang, Jesse(Jie) wrote: > > [AMD Official Use Only] > > > > + Deucher, Alexander > > > > From: Zhang, Jesse(Jie) > Sent: Monday, 11 April 2022 11:15 pm > To: amd-gfx@lists.freedesktop.org > Cc: Huang, Ray ; Zhang, Yifan > Subject: [PATCH ] drm/amdgpu: fix discovery ip failed > > > > [AMD Official Use Only] > > > > Fix discovery ip failed, and the log: > >56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (45056) from ip discovery blob > [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (66) from ip discovery blob > [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (45568) from ip discovery blob > [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (66) from ip discovery blob > [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (480) from ip discovery blob > [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (608) from ip discovery blob > [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (640) from ip discovery blob > [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > number_instance (64) from ip discovery blob > [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected > hw_id (28672) from ip discovery blob > Please add a description of what the fix is doing and add your signed-off-by. Alex > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > > index 766006a075ec..a778b0392e9f 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > > @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct > amdgpu_device *adev, > > res = kobject_add(&ip_hw_instance->kobj, NULL, > > "%d", ip_hw_instance->num_instance); > > next_ip: > > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > > } > > } > > > > @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device > *adev) > > } > > > > next_ip: > > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > > } > > } > > > > @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct > amdgpu_device *adev, int hw_id, int n > > *revision = ip->revision; > > return 0; > > } > > - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address > - 1); > > + ip_offset += struct_size(ip, base_address, > ip->num_base_address); > > } > > }
RE: [PATCH v2 ] drm/amdgpu: fix discovery ip failed
[AMD Official Use Only] + Deucher, Alexander From: Zhang, Jesse(Jie) Sent: Monday, 11 April 2022 11:15 pm To: amd-gfx@lists.freedesktop.org Cc: Huang, Ray ; Zhang, Yifan Subject: [PATCH ] drm/amdgpu: fix discovery ip failed [AMD Official Use Only] Fix discovery ip failed, and the log: 56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 766006a075ec..a778b0392e9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev, res = kobject_add(&ip_hw_instance->kobj, NULL, "%d", ip_hw_instance->num_instance); next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) } next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n *revision = ip->revision; return 0; } - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } }
Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10
On Mon, Apr 11, 2022 at 9:52 AM Yongqiang Sun wrote: > > MS_HYPERV with vega10 doesn't have the interface to process > request init data msg. > Check hypervisor type to not send the request for MS_HYPERV. > > Signed-off-by: Yongqiang Sun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > index 933c41f77c92..56b130ec44a9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c > @@ -23,6 +23,10 @@ > > #include > > +#ifdef CONFIG_X86 > +#include > +#endif > + > #include > > #include "amdgpu.h" > @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device > *adev) > break; > case CHIP_VEGA10: > soc15_set_virt_ops(adev); > - /* send a dummy GPU_INIT_DATA request to host on > vega10 */ > - amdgpu_virt_request_init_data(adev); > +#ifdef CONFIG_X86 > + /* not send GPU_INIT_DATA with MS_HYPERV*/ > + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false) Preferred coding style would be: if (!hypervisor_is_type(X86_HYPER_MS_HYPERV)) With that fixed: Acked-by: Alex Deucher > +#endif > + /* send a dummy GPU_INIT_DATA request to host > on vega10 */ > + amdgpu_virt_request_init_data(adev); > break; > case CHIP_VEGA20: > case CHIP_ARCTURUS: > -- > 2.25.1 >
Re: [PATCH ] drm/amdgpu: fix discovery ip failed
Dear Jie, Thank you for your patch. Am 11.04.22 um 17:15 schrieb Zhang, Jesse(Jie): You might want to add a space before the (. [AMD Official Use Only] Please send a patch with `git format-patch` or similar. Fix discovery ip failed, and the log: On what system? 56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob Please describe the reason for the failure, and your fix. A Signed-off-by line is missing. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 766006a075ec..a778b0392e9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev, res = kobject_add(&ip_hw_instance->kobj, NULL, "%d", ip_hw_instance->num_instance); next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) } next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n *revision = ip->revision; return 0; } - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } Kind regards, Paul
Re: [PATCH v2] drm/amd/pm: Disable fan control if not supported
Dear Elena, Am 11.04.22 um 15:32 schrieb Elena Sakhnovitch: On Sienna Cichild, not all platforms use PMFW based fan control. On such ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon knobs for fan control also as it is not possible to report or control fan speed on such platforms through driver. v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT Please add a blank line before the v2 line. Also please give one specific example for a platform without PMFW based fan control. Signed-off-by: Elena Sakhnovitch --- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c index ab3e9d8b831e..ddc388b061b6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu) } } +static void sienna_cichlid_check_fan_support(struct smu_context *smu) +{ + struct smu_table_context *table_context = &smu->smu_table; + PPTable_t *pptable = table_context->driver_pptable; + /* No sort of fan control possible if PPTable has it disabled */ + smu->adev->pm.no_fan = + !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT)); + if (smu->adev->pm.no_fan) + dev_info_once(smu->adev->dev, + "PMFW based fan control disabled"); Maybe clarify: … according to PPTable. +} + static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) { struct smu_table_context *table_context = &smu->smu_table; @@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) smu->dc_controlled_by_gpio = true; sienna_cichlid_check_bxco_support(smu); + sienna_cichlid_check_fan_support(smu); table_context->thermal_controller_type = powerplay_table->thermal_controller_type; @@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu) GET_PPTABLE_MEMBER(I2cControllers, &table_member); memcpy(table_member, smc_dpm_table->I2cControllers, sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header)); - + Unrelated, and should be a separate patch. return 0; } Kind regards, Paul
[PATCH ] drm/amdgpu: fix discovery ip failed
[AMD Official Use Only] Fix discovery ip failed, and the log: 56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob [ 56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob [ 56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob [ 56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob [ 56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob [ 56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob [ 56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob [ 56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 766006a075ec..a778b0392e9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev, res = kobject_add(&ip_hw_instance->kobj, NULL, "%d", ip_hw_instance->num_instance); next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) } next_ip: - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } } @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n *revision = ip->revision; return 0; } - ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1); + ip_offset += struct_size(ip, base_address, ip->num_base_address); } }
回复: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH
[AMD Official Use Only] Hi Tao, According to the series patches, I have one question, is the ras_ih_flag set according to poison mode configuration, if yes, driver will handle poison once get ecc_irq interrupt, but at this moment there may no app to consumes it, this seems conflict the poison consumption definition. Regards, Stanley > -邮件原件- > 发件人: Zhou1, Tao > 发送时间: Monday, April 11, 2022 7:08 PM > 收件人: amd-gfx@lists.freedesktop.org; Lazar, Lijo ; > Ziya, Mohammad zafar ; Zhang, Hawking > ; Yang, Stanley ; Chai, > Thomas > 抄送: Zhou1, Tao > 主题: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH > > So we can distinguish RAS poison consumption interrupt from UE interrupt. > > Signed-off-by: Tao Zhou > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > index 606df8869b89..380f4c3020c7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > @@ -314,6 +314,11 @@ enum amdgpu_ras_ret { > AMDGPU_RAS_PT, > }; > > +enum amdgpu_ras_ih_flag { > + AMDGPU_RAS_IH_POISON_CONSUMPTION = 0, > + AMDGPU_RAS_IH_LAST, > +}; > + > struct ras_common_if { > enum amdgpu_ras_block block; > enum amdgpu_ras_error_type type; > @@ -419,6 +424,8 @@ struct ras_ih_data { > unsigned int aligned_element_size; > unsigned int rptr; > unsigned int wptr; > + /* interrupt type flag */ > + unsigned int flag; > }; > > struct ras_manager { > -- > 2.35.1
Re: [PATCH] drm/amdkfd: Asynchronously free events
Am 2022-04-11 um 08:52 schrieb philip yang: On 2022-04-07 22:39, Felix Kuehling wrote: The synchronize_rcu call in destroy_events can take several ms, which noticeably slows down applications destroying many events. Use kfree_rcu to free the event structure asynchronously and eliminate the synchronize_rcu call in the user thread. Will need similar fix in kfd_smi_ev_release, as I noticed sometime it is slower to exit the app receiving SMI event. Reviewed-by: Philip Yang Thanks. In the case of kfd_smi_ev_release you'd need to use call_rcu because it's more than a simple kfree when you're freeing the client->fifo. That would have some implications for module unloading, too. Maybe you can simplify it by using DECLARE_KFIFO in struct kfd_smi_client, since we create the fifo with a fixed size anway. That way the cleanup is just a kfree_rcu(client). Regards, Felix Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 3 +-- drivers/gpu/drm/amd/amdkfd/kfd_events.h | 2 ++ 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index 0fef24b0b915..75847c5d5957 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -262,8 +262,7 @@ static void destroy_event(struct kfd_process *p, struct kfd_event *ev) p->signal_event_count--; idr_remove(&p->event_idr, ev->event_id); - synchronize_rcu(); - kfree(ev); + kfree_rcu(ev, rcu); } static void destroy_events(struct kfd_process *p) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_events.h index 55d376f56021..1c62c8dd6460 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.h @@ -70,6 +70,8 @@ struct kfd_event { struct kfd_hsa_memory_exception_data memory_exception_data; struct kfd_hsa_hw_exception_data hw_exception_data; }; + + struct rcu_head rcu; /* for asynchronous kfree_rcu */ }; #define KFD_EVENT_TIMEOUT_IMMEDIATE 0
[PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10
MS_HYPERV with vega10 doesn't have the interface to process request init data msg. Check hypervisor type to not send the request for MS_HYPERV. Signed-off-by: Yongqiang Sun --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 933c41f77c92..56b130ec44a9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -23,6 +23,10 @@ #include +#ifdef CONFIG_X86 +#include +#endif + #include #include "amdgpu.h" @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) break; case CHIP_VEGA10: soc15_set_virt_ops(adev); - /* send a dummy GPU_INIT_DATA request to host on vega10 */ - amdgpu_virt_request_init_data(adev); +#ifdef CONFIG_X86 + /* not send GPU_INIT_DATA with MS_HYPERV*/ + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false) +#endif + /* send a dummy GPU_INIT_DATA request to host on vega10 */ + amdgpu_virt_request_init_data(adev); break; case CHIP_VEGA20: case CHIP_ARCTURUS: -- 2.25.1
Re: [PATCH v2] drm/amd/pm: Disable fan control if not supported
On 4/11/2022 7:02 PM, Elena Sakhnovitch wrote: On Sienna Cichild, not all platforms use PMFW based fan control. On such ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon knobs for fan control also as it is not possible to report or control fan speed on such platforms through driver. v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT Signed-off-by: Elena Sakhnovitch --- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c index ab3e9d8b831e..ddc388b061b6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu) } } +static void sienna_cichlid_check_fan_support(struct smu_context *smu) +{ + struct smu_table_context *table_context = &smu->smu_table; + PPTable_t *pptable = table_context->driver_pptable; + /* No sort of fan control possible if PPTable has it disabled */ + smu->adev->pm.no_fan = + !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT)); #define FEATURE_FAN_CONTROL_BIT 32 Given the above, this doesn't work. You may cast FeaturesToRun as uin64_t* and use a mask like 1ULL << FEATURE_FAN_CONTROL_BIT Thanks, Lijo + if (smu->adev->pm.no_fan) + dev_info_once(smu->adev->dev, + "PMFW based fan control disabled"); +} + static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) { struct smu_table_context *table_context = &smu->smu_table; @@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) smu->dc_controlled_by_gpio = true; sienna_cichlid_check_bxco_support(smu); + sienna_cichlid_check_fan_support(smu); table_context->thermal_controller_type = powerplay_table->thermal_controller_type; @@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu) GET_PPTABLE_MEMBER(I2cControllers, &table_member); memcpy(table_member, smc_dpm_table->I2cControllers, sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header)); - + return 0; }
[PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10
MS_HYPERV with vega10 doesn't have the interface to process request init data msg. Check hypervisor type to not send the request for MS_HYPERV. Signed-off-by: Yongqiang Sun --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 933c41f77c92..56b130ec44a9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -23,6 +23,10 @@ #include +#ifdef CONFIG_X86 +#include +#endif + #include #include "amdgpu.h" @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev) break; case CHIP_VEGA10: soc15_set_virt_ops(adev); - /* send a dummy GPU_INIT_DATA request to host on vega10 */ - amdgpu_virt_request_init_data(adev); +#ifdef CONFIG_X86 + /* not send GPU_INIT_DATA with MS_HYPERV*/ + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false) +#endif + /* send a dummy GPU_INIT_DATA request to host on vega10 */ + amdgpu_virt_request_init_data(adev); break; case CHIP_VEGA20: case CHIP_ARCTURUS: -- 2.25.1
Re: [PATCH v3] drm: add a check to verify the size alignment
Am 11.04.22 um 15:49 schrieb Arunpravin Paneer Selvam: On 11/04/22 7:02 pm, Matthew Auld wrote: On 11/04/2022 13:42, Christian König wrote: Am 11.04.22 um 11:47 schrieb Matthew Auld: On 11/04/2022 08:38, Arunpravin Paneer Selvam wrote: Add a simple check to reject any size not aligned to the min_page_size. when size is not aligned to min_page_size, driver module should handle in their own way either to round_up() the size value to min_page_size or just to enable WARN_ON(). If we dont handle the alignment properly, we may hit the following bug, Unigine Heaven has allocation requests for example required pages are 257 and alignment request is 256. To allocate the left over 1 page, continues the iteration to find the order value which is 0 and when it compares with min_order = 8, triggers the BUG_ON(order < min_order). v2: add more commit description v3: remove WARN_ON() Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld Reviewed-by: Matthew Auld Question here is who will be pushing that to drm-misc-next? Should I take care of that? Yes, please do. I think it's time that Arun should request push permission for drm-misc-next. How to get push permission for drm-misc-next, should I send request mail to maintainers, may be next time I will push myself. See here https://drm.pages.freedesktop.org/maintainer-tools/commit-access.html Regards, Christian. Thanks, Arun Thanks, Christian.
Re: [PATCH v3] drm: add a check to verify the size alignment
On 11/04/22 7:02 pm, Matthew Auld wrote: > On 11/04/2022 13:42, Christian König wrote: >> >> Am 11.04.22 um 11:47 schrieb Matthew Auld: >>> On 11/04/2022 08:38, Arunpravin Paneer Selvam wrote: Add a simple check to reject any size not aligned to the min_page_size. when size is not aligned to min_page_size, driver module should handle in their own way either to round_up() the size value to min_page_size or just to enable WARN_ON(). If we dont handle the alignment properly, we may hit the following bug, Unigine Heaven has allocation requests for example required pages are 257 and alignment request is 256. To allocate the left over 1 page, continues the iteration to find the order value which is 0 and when it compares with min_order = 8, triggers the BUG_ON(order < min_order). v2: add more commit description v3: remove WARN_ON() Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld >>> Reviewed-by: Matthew Auld >>> >> >> Question here is who will be pushing that to drm-misc-next? Should I >> take care of that? > > Yes, please do. > >> >> I think it's time that Arun should request push permission for >> drm-misc-next. How to get push permission for drm-misc-next, should I send request mail to maintainers, may be next time I will push myself. Thanks, Arun >> >> Thanks, >> Christian.
[PATCH v2] drm/amd/pm: Disable fan control if not supported
On Sienna Cichild, not all platforms use PMFW based fan control. On such ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon knobs for fan control also as it is not possible to report or control fan speed on such platforms through driver. v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT Signed-off-by: Elena Sakhnovitch --- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c index ab3e9d8b831e..ddc388b061b6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu) } } +static void sienna_cichlid_check_fan_support(struct smu_context *smu) +{ + struct smu_table_context *table_context = &smu->smu_table; + PPTable_t *pptable = table_context->driver_pptable; + /* No sort of fan control possible if PPTable has it disabled */ + smu->adev->pm.no_fan = + !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT)); + if (smu->adev->pm.no_fan) + dev_info_once(smu->adev->dev, + "PMFW based fan control disabled"); +} + static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) { struct smu_table_context *table_context = &smu->smu_table; @@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu) smu->dc_controlled_by_gpio = true; sienna_cichlid_check_bxco_support(smu); + sienna_cichlid_check_fan_support(smu); table_context->thermal_controller_type = powerplay_table->thermal_controller_type; @@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu) GET_PPTABLE_MEMBER(I2cControllers, &table_member); memcpy(table_member, smc_dpm_table->I2cControllers, sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header)); - + return 0; } -- 2.25.1