Re: [PATCH] drm/amdgpu: no DC support for headless chips
On Thu, Dec 23, 2021 at 9:54 PM Chen, Guchun wrote: > > [Public] > > For the first two CHIP_HAINAN and CHIP_TOPAZ, using asic_type is fine. But > for CHIP_ARCTURUS and CHIP_ALDEBARAN, I wonder if there is any dc hardware > harvesting info carried by harvest table in VBIOS. If that's the case, I > think we can drop these two, as we can promise it by checking > AMD_HARVEST_IP_DMU_MASK in amdgpu_device_has_dc_support. There is no IP discovery table for these chips, but they don't have any display IPs in the hardcoded IP discovery info in the driver. I don't think this should affect them, but I wasn't sure.. Alex > > Regards, > Guchun > > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Friday, December 24, 2021 3:20 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; tarequemd.ha...@yahoo.com > Subject: [PATCH] drm/amdgpu: no DC support for headless chips > > Chips with no display hardware should return false for DC support. > > Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in > amdgpu_device_asic_has_dc_support") > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 9dc86c5a1cad..58e2034984de 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3166,6 +3166,14 @@ static void amdgpu_device_detect_sriov_bios(struct > amdgpu_device *adev) bool amdgpu_device_asic_has_dc_support(enum > amd_asic_type asic_type) { > switch (asic_type) { > +#ifdef CONFIG_DRM_AMDGPU_SI > + case CHIP_HAINAN: > +#endif > + case CHIP_TOPAZ: > + case CHIP_ARCTURUS: > + case CHIP_ALDEBARAN: > + /* chips with no display hardware */ > + return false; > #if defined(CONFIG_DRM_AMD_DC) > case CHIP_TAHITI: > case CHIP_PITCAIRN: > -- > 2.33.1
RE: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
[Public] Never mind. I checked the code in amdgpu_device_supports_boco, it's only true on dGPU. So pls skip my question. Regards, Guchun -Original Message- From: Chen, Guchun Sent: Friday, December 24, 2021 12:42 PM To: Evan Quan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan Subject: RE: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform [Public] Thank you Evan. Shall we limit it to dGPU only? With above concern clarified, the patch is: Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Friday, December 24, 2021 11:17 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan ; Chen, Guchun Subject: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and put itself into a state ready for PNP(which fits the scenario BOCO stands for). That can address some random resuming failure observed on BOCO capable platforms. Signed-off-by: Evan Quan Change-Id: I9804c4f04b6d2ef737b076cabf85d2880179efe2 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e431c7f10755..ad8370b41e74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2235,12 +2235,27 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) if (amdgpu_device_supports_px(drm_dev)) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + /* +* By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some +* proper cleanups and put itself into a state ready for PNP. That +* can address some random resuming failure observed on BOCO capable +* platforms. +* TODO: this may be also needed for PX capable platform. +*/ + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_UNLOAD; + ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; return ret; } + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; + if (amdgpu_device_supports_px(drm_dev)) { /* Only need to handle PCI state in the driver for ATPX * PCI core handles it for _PR3. -- 2.29.0
Re: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
On 12/24/2021 8:46 AM, Evan Quan wrote: By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and put itself into a state ready for PNP(which fits the scenario BOCO stands for). "BOCO similar to PNP" is not correct. Mention this as a workaround. With that changed Reviewed-by: Lijo Lazar Thanks, Lijo That can address some random resuming failure observed on BOCO capable platforms. Signed-off-by: Evan Quan Change-Id: I9804c4f04b6d2ef737b076cabf85d2880179efe2 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e431c7f10755..ad8370b41e74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2235,12 +2235,27 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) if (amdgpu_device_supports_px(drm_dev)) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + /* +* By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some +* proper cleanups and put itself into a state ready for PNP. That +* can address some random resuming failure observed on BOCO capable +* platforms. +* TODO: this may be also needed for PX capable platform. +*/ + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_UNLOAD; + ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; return ret; } + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; + if (amdgpu_device_supports_px(drm_dev)) { /* Only need to handle PCI state in the driver for ATPX * PCI core handles it for _PR3.
RE: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
[Public] Thank you Evan. Shall we limit it to dGPU only? With above concern clarified, the patch is: Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Friday, December 24, 2021 11:17 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan ; Chen, Guchun Subject: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and put itself into a state ready for PNP(which fits the scenario BOCO stands for). That can address some random resuming failure observed on BOCO capable platforms. Signed-off-by: Evan Quan Change-Id: I9804c4f04b6d2ef737b076cabf85d2880179efe2 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e431c7f10755..ad8370b41e74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2235,12 +2235,27 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) if (amdgpu_device_supports_px(drm_dev)) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + /* +* By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some +* proper cleanups and put itself into a state ready for PNP. That +* can address some random resuming failure observed on BOCO capable +* platforms. +* TODO: this may be also needed for PX capable platform. +*/ + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_UNLOAD; + ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; return ret; } + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; + if (amdgpu_device_supports_px(drm_dev)) { /* Only need to handle PCI state in the driver for ATPX * PCI core handles it for _PR3. -- 2.29.0
Re: [PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
On Thu, Dec 23, 2021 at 10:17 PM Evan Quan wrote: > > By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups > and > put itself into a state ready for PNP(which fits the scenario BOCO stands > for). > That can address some random resuming failure observed on BOCO capable > platforms. > > Signed-off-by: Evan Quan > Change-Id: I9804c4f04b6d2ef737b076cabf85d2880179efe2 Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index e431c7f10755..ad8370b41e74 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2235,12 +2235,27 @@ static int amdgpu_pmops_runtime_suspend(struct device > *dev) > if (amdgpu_device_supports_px(drm_dev)) > drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; > > + /* > +* By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some > +* proper cleanups and put itself into a state ready for PNP. That > +* can address some random resuming failure observed on BOCO capable > +* platforms. > +* TODO: this may be also needed for PX capable platform. > +*/ > + if (amdgpu_device_supports_boco(drm_dev)) > + adev->mp1_state = PP_MP1_STATE_UNLOAD; > + > ret = amdgpu_device_suspend(drm_dev, false); > if (ret) { > adev->in_runpm = false; > + if (amdgpu_device_supports_boco(drm_dev)) > + adev->mp1_state = PP_MP1_STATE_NONE; > return ret; > } > > + if (amdgpu_device_supports_boco(drm_dev)) > + adev->mp1_state = PP_MP1_STATE_NONE; > + > if (amdgpu_device_supports_px(drm_dev)) { > /* Only need to handle PCI state in the driver for ATPX > * PCI core handles it for _PR3. > -- > 2.29.0 >
[PATCH] drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and put itself into a state ready for PNP(which fits the scenario BOCO stands for). That can address some random resuming failure observed on BOCO capable platforms. Signed-off-by: Evan Quan Change-Id: I9804c4f04b6d2ef737b076cabf85d2880179efe2 --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index e431c7f10755..ad8370b41e74 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2235,12 +2235,27 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev) if (amdgpu_device_supports_px(drm_dev)) drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING; + /* +* By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some +* proper cleanups and put itself into a state ready for PNP. That +* can address some random resuming failure observed on BOCO capable +* platforms. +* TODO: this may be also needed for PX capable platform. +*/ + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_UNLOAD; + ret = amdgpu_device_suspend(drm_dev, false); if (ret) { adev->in_runpm = false; + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; return ret; } + if (amdgpu_device_supports_boco(drm_dev)) + adev->mp1_state = PP_MP1_STATE_NONE; + if (amdgpu_device_supports_px(drm_dev)) { /* Only need to handle PCI state in the driver for ATPX * PCI core handles it for _PR3. -- 2.29.0
RE: [PATCH] drm/amdgpu: no DC support for headless chips
[Public] For the first two CHIP_HAINAN and CHIP_TOPAZ, using asic_type is fine. But for CHIP_ARCTURUS and CHIP_ALDEBARAN, I wonder if there is any dc hardware harvesting info carried by harvest table in VBIOS. If that's the case, I think we can drop these two, as we can promise it by checking AMD_HARVEST_IP_DMU_MASK in amdgpu_device_has_dc_support. Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Friday, December 24, 2021 3:20 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; tarequemd.ha...@yahoo.com Subject: [PATCH] drm/amdgpu: no DC support for headless chips Chips with no display hardware should return false for DC support. Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 9dc86c5a1cad..58e2034984de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3166,6 +3166,14 @@ static void amdgpu_device_detect_sriov_bios(struct amdgpu_device *adev) bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type) { switch (asic_type) { +#ifdef CONFIG_DRM_AMDGPU_SI + case CHIP_HAINAN: +#endif + case CHIP_TOPAZ: + case CHIP_ARCTURUS: + case CHIP_ALDEBARAN: + /* chips with no display hardware */ + return false; #if defined(CONFIG_DRM_AMD_DC) case CHIP_TAHITI: case CHIP_PITCAIRN: -- 2.33.1
Re: [PATCH v2] drm/amd/display: move calcs folder into DML
[AMD Official Use Only] Hi Isabbasso, The patch fails to compile when there is No DCN because the calc object files fail to generate since dml depends on the CONFIG_DRM_AMD_DC_DCN being enabled (Makefile inside dc folder): ifdef CONFIG_DRM_AMD_DC_DCN DC_LIBS += dcn20 DC_LIBS += dsc DC_LIBS += dcn10 dml DC_LIBS += dcn21 DC_LIBS += dcn30 DC_LIBS += dcn301 DC_LIBS += dcn302 DC_LIBS += dcn303 endif A few changes need to be made to the patch, which are: -The Makefile in dc needs the line: DC_LIBS+= dml/calcs -the Makefile in the calcs folder that the patch deletes can be placed inside of dc/dml/calcs instead of adding it to the Makefiles in dc/dml Could you revise your patch based on these changes. Regards, Jasdeep From: isabba...@riseup.net Sent: December 20, 2021 6:23 PM To: Deucher, Alexander ; Koenig, Christian ; dan...@ffwll.ch ; Wentland, Harry ; Siqueira, Rodrigo ; Li, Sun peng (Leo) ; Pan, Xinhui ; Zhuo, Qingqing (Lillian) ; Dhillon, Jasdeep ; m...@igalia.com Cc: amd-gfx@lists.freedesktop.org ; ~lkcamp/patc...@lists.sr.ht <~lkcamp/patc...@lists.sr.ht> Subject: Re: [PATCH v2] drm/amd/display: move calcs folder into DML On 2021-12-20 20:20, Isabella Basso wrote: > The calcs folder has FPU code on it, which should be isolated inside the > DML folder as per > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F93042%2Fdata=04%7C01%7Cjasdeep.dhillon%40amd.com%7C01959e019f6e45e25a6208d9c40fc233%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637756394247493762%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=bB4zotGMArbsTzZNDr0u2uw3VBD4jxNornJMol9YJlA%3Dreserved=0. > > This commit aims single-handedly to correct the location of such FPU > code and does not refactor any functions. > > Signed-off-by: Isabella Basso > --- > drivers/gpu/drm/amd/display/dc/Makefile | 2 +- > drivers/gpu/drm/amd/display/dc/calcs/Makefile | 68 --- > drivers/gpu/drm/amd/display/dc/dml/Makefile | 13 +++- > .../amd/display/dc/{ => dml}/calcs/bw_fixed.c | 0 > .../display/dc/{ => dml}/calcs/calcs_logger.h | 0 > .../display/dc/{ => dml}/calcs/custom_float.c | 0 > .../display/dc/{ => dml}/calcs/dce_calcs.c| 0 > .../dc/{ => dml}/calcs/dcn_calc_auto.c| 0 > .../dc/{ => dml}/calcs/dcn_calc_auto.h| 0 > .../dc/{ => dml}/calcs/dcn_calc_math.c| 0 > .../display/dc/{ => dml}/calcs/dcn_calcs.c| 0 > 11 files changed, 13 insertions(+), 70 deletions(-) > delete mode 100644 drivers/gpu/drm/amd/display/dc/calcs/Makefile > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/bw_fixed.c (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/calcs_logger.h (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/custom_float.c (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/dce_calcs.c (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/dcn_calc_auto.c (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/dcn_calc_auto.h (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/dcn_calc_math.c (100%) > rename drivers/gpu/drm/amd/display/dc/{ => dml}/calcs/dcn_calcs.c (100%) > > diff --git a/drivers/gpu/drm/amd/display/dc/Makefile > b/drivers/gpu/drm/amd/display/dc/Makefile > index b1f0d6260226..1872adc96a00 100644 > --- a/drivers/gpu/drm/amd/display/dc/Makefile > +++ b/drivers/gpu/drm/amd/display/dc/Makefile > @@ -23,7 +23,7 @@ > # Makefile for Display Core (dc) component. > # > > -DC_LIBS = basics bios calcs clk_mgr dce gpio irq virtual > +DC_LIBS = basics bios clk_mgr dce gpio irq virtual > > ifdef CONFIG_DRM_AMD_DC_DCN > DC_LIBS += dcn20 > diff --git a/drivers/gpu/drm/amd/display/dc/calcs/Makefile > b/drivers/gpu/drm/amd/display/dc/calcs/Makefile > deleted file mode 100644 > index f3c00f479e1c.. > --- a/drivers/gpu/drm/amd/display/dc/calcs/Makefile > +++ /dev/null > @@ -1,68 +0,0 @@ > -# > -# Copyright 2017 Advanced Micro Devices, Inc. > -# Copyright 2019 Raptor Engineering, LLC > -# > -# Permission is hereby granted, free of charge, to any person obtaining a > -# copy of this software and associated documentation files (the "Software"), > -# to deal in the Software without restriction, including without limitation > -# the rights to use, copy, modify, merge, publish, distribute, sublicense, > -# and/or sell copies of the Software, and to permit persons to whom the > -# Software is furnished to do so, subject to the following conditions: > -# > -# The above copyright notice and this permission notice shall be included in > -# all copies or substantial portions of the Software. > -# > -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > -# THE COPYRIGHT HOLDER(S) OR
Re: [bisected][regression] drm: amdgpu: system hangs on tty login prompt
The attached patch should fix this. Alex On Thu, Dec 23, 2021 at 2:04 PM Tareque Md.Hanif wrote: > > Hi, > > I have been testing the 5.16-rc6 kernel and it hangs on tty login prompt on > my laptop. Sometimes it lets me to login but "startx" hangs the system. The > system is booting fine in 5.15.7. > > So I bisected the bug. > > first bad commit: [f7f12b25823c0dce1165b390522d29f99c4585b4] drm/amdgpu: > default to true in amdgpu_device_asic_has_dc_support > > Reverting this commit fixes the issue. > > Device Information: > Laptop model: Dell Inspiron 15 5567 > GPU 0: Intel HD Graphics 620 > GPU 1: AMD ATI Radeon R7 M445 > > Regards, > Tareque. > From 7b076f1ab562211d2f21d22f2ff9920503796f06 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 23 Dec 2021 14:13:02 -0500 Subject: [PATCH] drm/amdgpu: no DC support for headless chips Chips with no display hardware should return false for DC support. Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 9dc86c5a1cad..58e2034984de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3166,6 +3166,14 @@ static void amdgpu_device_detect_sriov_bios(struct amdgpu_device *adev) bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type) { switch (asic_type) { +#ifdef CONFIG_DRM_AMDGPU_SI + case CHIP_HAINAN: +#endif + case CHIP_TOPAZ: + case CHIP_ARCTURUS: + case CHIP_ALDEBARAN: + /* chips with no display hardware */ + return false; #if defined(CONFIG_DRM_AMD_DC) case CHIP_TAHITI: case CHIP_PITCAIRN: -- 2.33.1
[PATCH] drm/amdgpu: no DC support for headless chips
Chips with no display hardware should return false for DC support. Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 9dc86c5a1cad..58e2034984de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3166,6 +3166,14 @@ static void amdgpu_device_detect_sriov_bios(struct amdgpu_device *adev) bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type) { switch (asic_type) { +#ifdef CONFIG_DRM_AMDGPU_SI + case CHIP_HAINAN: +#endif + case CHIP_TOPAZ: + case CHIP_ARCTURUS: + case CHIP_ALDEBARAN: + /* chips with no display hardware */ + return false; #if defined(CONFIG_DRM_AMD_DC) case CHIP_TAHITI: case CHIP_PITCAIRN: -- 2.33.1
[PATCH] SWDEV-311259 - dc: move FPU associated DCN302 code to DML
[Why & How] As part of the FPU isolation work documented in https://patchwork.freedesktop.org/series/93042/, isolate code that uses FPU in DCN302 to DML, where all FPU code should locate. Signed-off-by: Jasdeep Dhillon --- CMakeLists.txt | 1 + dc/dcn302/Makefile | 43 ++-- dc/dcn302/dcn302_resource.c | 340 +--- dc/dcn302/dcn302_resource.h | 3 + dc/dml/Makefile | 2 + dc/dml/dcn302/dcn302_fpu.c | 381 dc/dml/dcn302/dcn302_fpu.h | 32 +++ 7 files changed, 448 insertions(+), 354 deletions(-) create mode 100644 dc/dml/dcn302/dcn302_fpu.c create mode 100644 dc/dml/dcn302/dcn302_fpu.h diff --git a/CMakeLists.txt b/CMakeLists.txt index 01c6724df..dfee2cd38 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -243,6 +243,7 @@ list (APPEND srcs "dc/dcn301/dcn301_hubbub.c") list (APPEND srcs "dc/dcn302/dcn302_init.c") list (APPEND srcs "dc/dcn302/dcn302_hwseq.c") list (APPEND srcs "dc/dcn302/dcn302_resource.c") +list (APPEND srcs "dc/dml/dcn302/dcn302_fpu.c") list (APPEND srcs "dc/dcn303/dcn303_init.c") list (APPEND srcs "dc/dcn303/dcn303_hwseq.c") list (APPEND srcs "dc/dcn303/dcn303_resource.c") diff --git a/dc/dcn302/Makefile b/dc/dcn302/Makefile index 101620a88..35a6ffbdd 100644 --- a/dc/dcn302/Makefile +++ b/dc/dcn302/Makefile @@ -1,42 +1,37 @@ # # (c) Copyright 2020 Advanced Micro Devices, Inc. All the rights reserved # -# All rights reserved. This notice is intended as a precaution against -# inadvertent publication and does not imply publication or any waiver -# of confidentiality. The year included in the foregoing notice is the -# year of creation of the work. -# # Authors: AMD # # Makefile for dcn302. DCN3_02 = dcn302_init.o dcn302_hwseq.o dcn302_resource.o -ifdef CONFIG_X86 -CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -msse -endif +#ifdef CONFIG_X86 +#CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -msse +#endif -ifdef CONFIG_PPC64 -CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -mhard-float -maltivec -endif +#ifdef CONFIG_PPC64 +#CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -mhard-float -maltivec +#endif -ifdef CONFIG_CC_IS_GCC -ifeq ($(call cc-ifversion, -lt, 0701, y), y) -IS_OLD_GCC = 1 -endif -CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -mhard-float -endif +#ifdef CONFIG_CC_IS_GCC +#ifeq ($(call cc-ifversion, -lt, 0701, y), y) +#IS_OLD_GCC = 1 +#endif +#CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -mhard-float +#endif -ifdef CONFIG_X86 -ifdef IS_OLD_GCC +#ifdef CONFIG_X86 +#ifdef IS_OLD_GCC # Stack alignment mismatch, proceed with caution. # GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 # (8B stack alignment). -CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -mpreferred-stack-boundary=4 -else -CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -msse2 -endif -endif +#CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -mpreferred-stack-boundary=4 +#else +#CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -msse2 +#endif +#endif AMD_DAL_DCN3_02 = $(addprefix $(AMDDALPATH)/dc/dcn302/,$(DCN3_02)) diff --git a/dc/dcn302/dcn302_resource.c b/dc/dcn302/dcn302_resource.c index a0f02aa9c..8fa290164 100644 --- a/dc/dcn302/dcn302_resource.c +++ b/dc/dcn302/dcn302_resource.c @@ -65,6 +65,8 @@ #include "resource.h" #include "vm_helper.h" +#include "dml/dcn302/dcn302_fpu.h" + #include "include_legacy/dcn3/sienna_cichlid_ip_offset.h" #include "include_legacy/dcn302/dcn_3_0_2_offset.h" #include "include_legacy/dcn302/dcn_3_0_2_sh_mask.h" @@ -81,164 +83,6 @@ #define DC_LOGGER_INIT(logger) #endif -struct _vcs_dpi_ip_params_st dcn3_02_ip = { - .use_min_dcfclk = 0, - .clamp_min_dcfclk = 0, - .odm_capable = 1, - .gpuvm_enable = 1, - .hostvm_enable = 0, - .gpuvm_max_page_table_levels = 4, - .hostvm_max_page_table_levels = 4, - .hostvm_cached_page_table_levels = 0, - .pte_group_size_bytes = 2048, - .num_dsc = 5, - .rob_buffer_size_kbytes = 184, - .det_buffer_size_kbytes = 184, - .dpte_buffer_size_in_pte_reqs_luma = 64, - .dpte_buffer_size_in_pte_reqs_chroma = 34, - .pde_proc_buffer_size_64k_reqs = 48, - .dpp_output_buffer_pixels = 2560, - .opp_output_buffer_lines = 1, - .pixel_chunk_size_kbytes = 8, - .pte_enable = 1, - .max_page_table_levels = 2, - .pte_chunk_size_kbytes = 2, // ? - .meta_chunk_size_kbytes = 2, - .writeback_chunk_size_kbytes = 8, - .line_buffer_size_bits = 789504, - .is_line_buffer_bpp_fixed = 0, // ? - .line_buffer_fixed_bpp = 0, // ? - .dcc_supported = true, -
[RFC v3 5/8] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.
No need to to trigger another work queue inside the work queue. v3: Problem: Extra reset caused by host side FLR notification following guest side triggered reset. Fix: Preven qeuing flr_work from mailbox irq if guest already executing a reset. Suggested-by: Liu Shaoyun Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 9 ++--- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 9 ++--- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 9 ++--- 3 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 23b066bcffb2..bdeb8e933bb4 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -276,7 +276,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_ai_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -301,8 +301,11 @@ static int xgpu_ai_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: - if (amdgpu_sriov_runtime(adev)) - schedule_work(>virt.flr_work); + if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) + WARN_ONCE(!queue_work(adev->reset_domain.wq, + >virt.flr_work), + "Failed to queue work! at %s", + __FUNCTION__ ); break; case IDH_QUERY_ALIVE: xgpu_ai_mailbox_send_ack(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index a35e6d87e537..dd8dc0f6028c 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -308,7 +308,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT || adev->compute_timeout == MAX_SCHEDULE_TIMEOUT || adev->video_timeout == MAX_SCHEDULE_TIMEOUT)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_nv_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -336,8 +336,11 @@ static int xgpu_nv_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: - if (amdgpu_sriov_runtime(adev)) - schedule_work(>virt.flr_work); + if (amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) + WARN_ONCE(!queue_work(adev->reset_domain.wq, + >virt.flr_work), + "Failed to queue work! at %s", + __FUNCTION__ ); break; /* READY_TO_ACCESS_GPU is fetched by kernel polling, IRQ can ignore * it byfar since that polling thread will handle it, diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c index aef9d059ae52..c2afb72f97ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c @@ -521,7 +521,7 @@ static void xgpu_vi_mailbox_flr_work(struct work_struct *work) /* Trigger recovery due to world switch failure */ if (amdgpu_device_should_recover_gpu(adev)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_vi_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -550,8 +550,11 @@ static int xgpu_vi_mailbox_rcv_irq(struct amdgpu_device *adev, r = xgpu_vi_mailbox_rcv_msg(adev, IDH_FLR_NOTIFICATION); /* only handle FLR_NOTIFY now */ - if (!r) - schedule_work(>virt.flr_work); + if (!r && !amdgpu_in_reset(adev)) + WARN_ONCE(!queue_work(adev->reset_domain.wq, + >virt.flr_work), + "Failed to queue work! at %s", + __FUNCTION__ ); } return 0; -- 2.25.1
RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
[AMD Official Use Only] I have a discussion with Andrey about this offline. It seems dangerous to remove the in_gpu_reset and reset_semm directly inside the flr_work. In the case when the reset is triggered from host side , gpu need to be locked while host perform reset after flr_work reply the host with READY_TO_RESET. The original comments seems need to be updated. Regards Shaoyun.liu -Original Message- From: amd-gfx On Behalf Of Andrey Grodzovsky Sent: Wednesday, December 22, 2021 5:14 PM To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org Cc: Liu, Monk ; Grodzovsky, Andrey ; Chen, Horace ; Koenig, Christian ; dan...@ffwll.ch Subject: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV Since now flr work is serialized against GPU resets there is no need for this. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 --- 2 files changed, 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 487cd654b69e..7d59a66e3988 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt); int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT; - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, -* otherwise the mailbox msg will be ruined/reseted by -* the VF FLR. -*/ - if (!down_write_trylock(>reset_sem)) - return; - amdgpu_virt_fini_data_exchange(adev); - atomic_set(>in_gpu_reset, 1); xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) } while (timeout > 1); flr_done: - atomic_set(>in_gpu_reset, 0); - up_write(>reset_sem); - /* Trigger recovery for world switch failure if no TDR */ if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index e3869067a31d..f82c066c8e8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt); int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT; - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, -* otherwise the mailbox msg will be ruined/reseted by -* the VF FLR. -*/ - if (!down_write_trylock(>reset_sem)) - return; - amdgpu_virt_fini_data_exchange(adev); - atomic_set(>in_gpu_reset, 1); xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) } while (timeout > 1); flr_done: - atomic_set(>in_gpu_reset, 0); - up_write(>reset_sem); - /* Trigger recovery for world switch failure if no TDR */ if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || -- 2.25.1
Re: [PATCH] drm/amd/display: fix dereference before NULL check
On 2021-12-16 13:14, José Expósito wrote: > The "plane_state" pointer was access before checking if it was NULL. > > Avoid a possible NULL pointer dereference by accessing the plane > address after the check. > > Addresses-Coverity-ID: 1474582 ("Dereference before null check") > Fixes: 3f68c01be9a22 ("drm/amd/display: add cyan_skillfish display support") > Signed-off-by: José Expósito Reviewed-by: Harry Wentland Harry > --- > drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c > b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c > index cfd09b3f705e..fe22530242d2 100644 > --- a/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c > +++ b/drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c > @@ -134,11 +134,12 @@ void dcn201_update_plane_addr(const struct dc *dc, > struct pipe_ctx *pipe_ctx) > PHYSICAL_ADDRESS_LOC addr; > struct dc_plane_state *plane_state = pipe_ctx->plane_state; > struct dce_hwseq *hws = dc->hwseq; > - struct dc_plane_address uma = plane_state->address; > + struct dc_plane_address uma; > > if (plane_state == NULL) > return; > > + uma = plane_state->address; > addr_patched = patch_address_for_sbs_tb_stereo(pipe_ctx, ); > > plane_address_in_gpu_space_to_uma(hws, );
Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure
Adding back public list and Leslie specifically. Lijo is right and it's not MTTR release only, also all the unmaps in amdgpu_device_unmap_mmio since this patch makes call to amdgpu_device_unmap_mmio conditioned on device unplugged we need then to still take care of unmapping even when device is NOT unplugged - for this i suggest to look at 'drm/amdgpu: Unmap all MMIO mappings' and just conditionally call all the deleted unmaps in the patch where the condition is 'if (drm_dev_enter(dev))'. Andrey On 2021-12-23 8:47 a.m., Lazar, Lijo wrote: [AMD Official Use Only] Actually, I was asking specifically about this - gmc_v9_0_sw_init -> amdgpu_bo_init->adev->gmc.vram_mtrr = arch_phys_wc_add(adev->gmc.aper_base, adev->gmc.aper_size); As per this patch, if the driver load failed due to some error which happens during hw_init(), this action is not undone. I was thinking why it's not considered important to skip this on driver failure to load. For ex: when some MTRR register is reserved for this mapping. Thanks, Lijo -Original Message- From: Grodzovsky, Andrey Sent: Friday, December 17, 2021 9:05 PM To: Lazar, Lijo Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure We do unmap on unload, this function is a collection of all MMIO unmapps we already do on unload, it's just does them early on in case of device hot removal. Before pci_driver.remove callback (amdgpu_pci_remove) finish execution. Andrey On 2021-12-17 10:23 a.m., Lazar, Lijo wrote: [AMD Official Use Only] As a side note, even if it's a failed driver load, why it is not important to undo the mappings created during driver load? I'm wondering what is the impact on a system like MI200 A+A. Thanks, Lijo -Original Message- From: amd-gfx On Behalf Of Andrey Grodzovsky Sent: Friday, December 17, 2021 8:32 PM To: Koenig, Christian ; Shi, Leslie ; Pan, Xinhui ; Deucher, Alexander ; amd-gfx@lists.freedesktop.org Cc: Chen, Guchun Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure Reviewed-by: Andrey Grodzovsky Andrey On 2021-12-17 3:49 a.m., Christian König wrote: Am 17.12.21 um 03:26 schrieb Leslie Shi: [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs. [How] call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in vcn_v3_0_sw_fini() when GPU initialization failure. Signed-off-by: Leslie Shi Looks sane to me, but Andrey should probably nood as well. Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f31caec669e7..f6a47b927cfd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3899,7 +3899,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_gart_dummy_page_fini(adev); - amdgpu_device_unmap_mmio(adev); + if (drm_dev_is_unplugged(adev_to_drm(adev))) + amdgpu_device_unmap_mmio(adev); + } void amdgpu_device_fini_sw(struct amdgpu_device *adev)
RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
[AMD Official Use Only] @Chen, Horace @Chen, JingWen @Deng, Emily Please take a review on Andrey's patch Thanks --- Monk Liu | Cloud GPU & Virtualization Solution | AMD --- we are hiring software manager for CVS core team --- -Original Message- From: Koenig, Christian Sent: Thursday, December 23, 2021 4:42 PM To: Grodzovsky, Andrey ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky: > Since now flr work is serialized against GPU resets there is no need > for this. > > Signed-off-by: Andrey Grodzovsky Acked-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 --- > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 --- > 2 files changed, 22 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c > b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c > index 487cd654b69e..7d59a66e3988 100644 > --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c > +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c > @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct > *work) > struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, > virt); > int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT; > > - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, > - * otherwise the mailbox msg will be ruined/reseted by > - * the VF FLR. > - */ > - if (!down_write_trylock(>reset_sem)) > - return; > - > amdgpu_virt_fini_data_exchange(adev); > - atomic_set(>in_gpu_reset, 1); > > xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); > > @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct > *work) > } while (timeout > 1); > > flr_done: > - atomic_set(>in_gpu_reset, 0); > - up_write(>reset_sem); > - > /* Trigger recovery for world switch failure if no TDR */ > if (amdgpu_device_should_recover_gpu(adev) > && (!amdgpu_device_has_job_running(adev) || diff --git > a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c > b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c > index e3869067a31d..f82c066c8e8d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c > +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c > @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct > *work) > struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, > virt); > int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT; > > - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, > - * otherwise the mailbox msg will be ruined/reseted by > - * the VF FLR. > - */ > - if (!down_write_trylock(>reset_sem)) > - return; > - > amdgpu_virt_fini_data_exchange(adev); > - atomic_set(>in_gpu_reset, 1); > > xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); > > @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct > *work) > } while (timeout > 1); > > flr_done: > - atomic_set(>in_gpu_reset, 0); > - up_write(>reset_sem); > - > /* Trigger recovery for world switch failure if no TDR */ > if (amdgpu_device_should_recover_gpu(adev) > && (!amdgpu_device_has_job_running(adev) ||
Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky: Since now flr work is serialized against GPU resets there is no need for this. Signed-off-by: Andrey Grodzovsky Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 --- 2 files changed, 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 487cd654b69e..7d59a66e3988 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt); int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT; - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, -* otherwise the mailbox msg will be ruined/reseted by -* the VF FLR. -*/ - if (!down_write_trylock(>reset_sem)) - return; - amdgpu_virt_fini_data_exchange(adev); - atomic_set(>in_gpu_reset, 1); xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) } while (timeout > 1); flr_done: - atomic_set(>in_gpu_reset, 0); - up_write(>reset_sem); - /* Trigger recovery for world switch failure if no TDR */ if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index e3869067a31d..f82c066c8e8d 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt); int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT; - /* block amdgpu_gpu_recover till msg FLR COMPLETE received, -* otherwise the mailbox msg will be ruined/reseted by -* the VF FLR. -*/ - if (!down_write_trylock(>reset_sem)) - return; - amdgpu_virt_fini_data_exchange(adev); - atomic_set(>in_gpu_reset, 1); xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0); @@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) } while (timeout > 1); flr_done: - atomic_set(>in_gpu_reset, 0); - up_write(>reset_sem); - /* Trigger recovery for world switch failure if no TDR */ if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) ||
Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
Am 22.12.21 um 23:05 schrieb Andrey Grodzovsky: Use reset domain wq also for non TDR gpu recovery trigers such as sysfs and RAS. We must serialize all possible GPU recoveries to gurantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to qeueue work and wait on it to finish. v2: Rename to amdgpu_recover_work_struct Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 +- 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7c063fd37389..258ec3c0b2af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. */ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5237,6 +5237,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +struct amdgpu_recover_work_struct { + struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +}; + +static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base); + + recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/* + * Serialize gpu recover into reset domain single threaded wq + */ +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct amdgpu_recover_work_struct work = {.adev = adev, .job = job}; + + INIT_WORK(, amdgpu_device_queue_gpu_recover_work); + + if (!queue_work(adev->reset_domain.wq, )) + return -EAGAIN; + + flush_work(); + + return work.ret; +} + /** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(>sched); if (amdgpu_sriov_vf(adev))
Re: [RFC v2 3/8] drm/amdgpu: Fix crash on modprobe
Am 22.12.21 um 23:05 schrieb Andrey Grodzovsky: Restrict jobs resubmission to suspend case only since schedulers not initialised yet on probe. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) { Please add a TODO comment, something like "restructure resume to make that unnecessary". With that done the patch is Reviewed-by: Christian König as well. Christian. drm_sched_resubmit_jobs(>sched); drm_sched_start(>sched, true); }
Re: [RFC v2 2/8] drm/amdgpu: Move scheduler init to after XGMI is ready
Am 22.12.21 um 23:05 schrieb Andrey Grodzovsky: Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 34 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 + 3 files changed, 51 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 0f3e6c078f88..7c063fd37389 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev) return r; } +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) +{ + long timeout; + int r, i; + + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = adev->rings[i]; + + /* No need to setup the GPU scheduler for rings that don't need it */ + if (!ring || ring->no_scheduler) + continue; + + switch (ring->funcs->type) { + case AMDGPU_RING_TYPE_GFX: + timeout = adev->gfx_timeout; + break; + case AMDGPU_RING_TYPE_COMPUTE: + timeout = adev->compute_timeout; + break; + case AMDGPU_RING_TYPE_SDMA: + timeout = adev->sdma_timeout; + break; + default: + timeout = adev->video_timeout; + break; + } + + r = drm_sched_init(>sched, _sched_ops, + ring->num_hw_submission, amdgpu_job_hang_limit, + timeout, adev->reset_domain.wq, ring->sched_score, ring->name); + if (r) { + DRM_ERROR("Failed to create scheduler on ring %s.\n", + ring->name); + return r; + } + } + + return 0; +} + + /** * amdgpu_device_ip_init - run init for hardware IPs * @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) } } + r = amdgpu_device_init_schedulers(adev); + if (r) + goto init_failed; + /* Don't init kfd if whole hive need to be reset during init */ if (!adev->gmc.xgmi.pending_reset) amdgpu_amdkfd_device_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 3b7e86ea7167..5527c68c51de 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, atomic_t *sched_score) { struct amdgpu_device *adev = ring->adev; - long timeout; - int r; if (!adev) return -EINVAL; @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring, spin_lock_init(>fence_drv.lock); ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *), GFP_KERNEL); - if (!ring->fence_drv.fences) - return -ENOMEM; - /* No need to setup the GPU scheduler for rings that don't need it */ - if (ring->no_scheduler) - return 0; + ring->num_hw_submission = num_hw_submission; + ring->sched_score = sched_score; - switch (ring->funcs->type) { - case AMDGPU_RING_TYPE_GFX: - timeout = adev->gfx_timeout; - break; - case AMDGPU_RING_TYPE_COMPUTE: - timeout = adev->compute_timeout; - break; - case AMDGPU_RING_TYPE_SDMA: - timeout = adev->sdma_timeout; - break; - default: - timeout = adev->video_timeout; - break; - } - - r = drm_sched_init(>sched, _sched_ops, - num_hw_submission, amdgpu_job_hang_limit, - timeout, NULL, sched_score, ring->name); - if (r) { - DRM_ERROR("Failed to create scheduler on ring %s.\n", - ring->name); - return r; - } + if (!ring->fence_drv.fences) + return -ENOMEM; return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 4d380e79752c..a4b8279e3011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++