RE: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function
[AMD Official Use Only] Hi Stanley, Thanks for your suggestion. I add a comment after your comment. -Original Message- From: Yang, Stanley Sent: Tuesday, March 1, 2022 9:50 PM To: Chai, Thomas ; amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao ; Zhang, Hawking ; Clements, John ; Chai, Thomas ; Chai, Thomas Subject: 回复: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function [AMD Official Use Only] Hi yipe, One suggestion for this patch, please check my comment. Regards, Stanley > -邮件原件- > 发件人: amd-gfx 代表 yipechai > 发送时间: Tuesday, March 1, 2022 5:46 PM > 收件人: amd-gfx@lists.freedesktop.org > 抄送: Zhou1, Tao ; Zhang, Hawking > ; Clements, John ; Chai, > Thomas ; Chai, Thomas > 主题: [PATCH] drm/amdgpu: Move common initialization operations of each > ras block to one function > > Define amdgpu_ras_sw_init function to initialize all ras blocks. > > Signed-off-by: yipechai > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 + > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 2 - > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143 > - > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 + > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 21 --- > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 16 --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 28 > drivers/gpu/drm/amd/amdgpu/mca_v3_0.c | 6 - > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 17 --- > 9 files changed, 148 insertions(+), 92 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 6113ddc765a7..72550e9f6058 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct > amdgpu_device *adev) > } > } > > + r = amdgpu_ras_sw_init(adev); > + if (r) { > + DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r); > + goto init_failed; > + } > [Yang, Stanley] : This is ras blocks early init, I think it's more > reasonable to move amdgpu_ras_sw_init before amdgpu_ras_init function. [Thomas] Sorry, I will change this error message print. I also agree with you to initialize all ras operations before amdgpu_ras_init, but the initialization place of each ras instance is different. The ras block instances of mmhub umc sdma and hdp are initialized in IP .early_init function, but gfx and mca ras block instances are initialized in the IP .sw_init function. Then, after calling all IP .sw_init initialization and then calling amdgpu_ras_sw_init may be the earliest place to initialize all ras blocks. > + > if (amdgpu_sriov_vf(adev)) > amdgpu_virt_init_data_exchange(adev); > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > index ab75e189bc0b..544241f357b2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > @@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct amdgpu_device > *adev) { > if (!adev->gmc.xgmi.connected_to_cpu) { > adev->gmc.xgmi.ras = _ras; > - amdgpu_ras_register_ras_block(adev, > >gmc.xgmi.ras->ras_block); > - adev->gmc.xgmi.ras_if = >gmc.xgmi.ras- > >ras_block.ras_comm; > } > > return 0; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index d3875618ebf5..89075ab9e82e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev) > case CHIP_ALDEBARAN: > if (!adev->gmc.xgmi.connected_to_cpu) { > adev->nbio.ras = _v7_4_ras; > - amdgpu_ras_register_ras_block(adev, > >nbio.ras->ras_block); > - adev->nbio.ras_if = >nbio.ras- > >ras_block.ras_comm; > } > break; > default: > @@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct amdgpu_device > *adev) > amdgpu_ras_disable_all_features(adev, 1); } > > +int amdgpu_ras_sw_init(struct amdgpu_device *adev) { > + int err = 0; > + > + if (!amdgpu_ras_asic_supported(adev)) > + return 0; > + > + if (adev->nbio.ras) { > + err = amdgpu_ras_register_ras_block(adev, > >nbio.ras->ras_block); > + if (err) { > +
撤回: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function
Yang, Stanley 将撤回邮件“[PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function”。
回复: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function
[AMD Official Use Only] Hi yipe, One suggestion for this patch, please check my comment. Regards, Stanley > -邮件原件- > 发件人: amd-gfx 代表 yipechai > 发送时间: Tuesday, March 1, 2022 5:46 PM > 收件人: amd-gfx@lists.freedesktop.org > 抄送: Zhou1, Tao ; Zhang, Hawking > ; Clements, John ; > Chai, Thomas ; Chai, Thomas > > 主题: [PATCH] drm/amdgpu: Move common initialization operations of each > ras block to one function > > Define amdgpu_ras_sw_init function to initialize all ras blocks. > > Signed-off-by: yipechai > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 + > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 2 - > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143 > - > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 + > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 21 --- > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 16 --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 28 > drivers/gpu/drm/amd/amdgpu/mca_v3_0.c | 6 - > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 17 --- > 9 files changed, 148 insertions(+), 92 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 6113ddc765a7..72550e9f6058 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct > amdgpu_device *adev) > } > } > > + r = amdgpu_ras_sw_init(adev); > + if (r) { > + DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r); > + goto init_failed; > + } [Yang, Stanley] : This is ras blocks early init, I think it's more reasonable to move amdgpu_ras_sw_init before amdgpu_ras_init function. > + > if (amdgpu_sriov_vf(adev)) > amdgpu_virt_init_data_exchange(adev); > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > index ab75e189bc0b..544241f357b2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > @@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct > amdgpu_device *adev) { > if (!adev->gmc.xgmi.connected_to_cpu) { > adev->gmc.xgmi.ras = _ras; > - amdgpu_ras_register_ras_block(adev, > >gmc.xgmi.ras->ras_block); > - adev->gmc.xgmi.ras_if = >gmc.xgmi.ras- > >ras_block.ras_comm; > } > > return 0; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index d3875618ebf5..89075ab9e82e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev) > case CHIP_ALDEBARAN: > if (!adev->gmc.xgmi.connected_to_cpu) { > adev->nbio.ras = _v7_4_ras; > - amdgpu_ras_register_ras_block(adev, > >nbio.ras->ras_block); > - adev->nbio.ras_if = >nbio.ras- > >ras_block.ras_comm; > } > break; > default: > @@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct > amdgpu_device *adev) > amdgpu_ras_disable_all_features(adev, 1); } > > +int amdgpu_ras_sw_init(struct amdgpu_device *adev) { > + int err = 0; > + > + if (!amdgpu_ras_asic_supported(adev)) > + return 0; > + > + if (adev->nbio.ras) { > + err = amdgpu_ras_register_ras_block(adev, > >nbio.ras->ras_block); > + if (err) { > + dev_err(adev->dev, "Failed to register nbio ras > block!\n"); > + return err; > + } > + adev->nbio.ras_if = >nbio.ras->ras_block.ras_comm; > + } > + > + if (adev->gmc.xgmi.ras) { > + err = amdgpu_ras_register_ras_block(adev, > >gmc.xgmi.ras->ras_block); > + if (err) { > + dev_err(adev->dev, "Failed to register xgmi ras > block!\n"); > + return err; > + } > + adev->gmc.xgmi.ras_if = >gmc.xgmi.ras- > >ras_block.ras_comm; > + } > + > + if (adev->gfx.ras) { > + err = amdgpu_ras_register_ras_block(adev, >gfx.ras- > >ras_block); > + if (err) { > + dev_err(adev->dev, "Failed to register gfx ras > block!\n"); > + return err; > + } > + > + strc
[PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function
Define amdgpu_ras_sw_init function to initialize all ras blocks. Signed-off-by: yipechai --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 2 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143 - drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 21 --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 16 --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 28 drivers/gpu/drm/amd/amdgpu/mca_v3_0.c | 6 - drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 17 --- 9 files changed, 148 insertions(+), 92 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 6113ddc765a7..72550e9f6058 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) } } + r = amdgpu_ras_sw_init(adev); + if (r) { + DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r); + goto init_failed; + } + if (amdgpu_sriov_vf(adev)) amdgpu_virt_init_data_exchange(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index ab75e189bc0b..544241f357b2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c @@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct amdgpu_device *adev) { if (!adev->gmc.xgmi.connected_to_cpu) { adev->gmc.xgmi.ras = _ras; - amdgpu_ras_register_ras_block(adev, >gmc.xgmi.ras->ras_block); - adev->gmc.xgmi.ras_if = >gmc.xgmi.ras->ras_block.ras_comm; } return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index d3875618ebf5..89075ab9e82e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev) case CHIP_ALDEBARAN: if (!adev->gmc.xgmi.connected_to_cpu) { adev->nbio.ras = _v7_4_ras; - amdgpu_ras_register_ras_block(adev, >nbio.ras->ras_block); - adev->nbio.ras_if = >nbio.ras->ras_block.ras_comm; } break; default: @@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct amdgpu_device *adev) amdgpu_ras_disable_all_features(adev, 1); } +int amdgpu_ras_sw_init(struct amdgpu_device *adev) +{ + int err = 0; + + if (!amdgpu_ras_asic_supported(adev)) + return 0; + + if (adev->nbio.ras) { + err = amdgpu_ras_register_ras_block(adev, >nbio.ras->ras_block); + if (err) { + dev_err(adev->dev, "Failed to register nbio ras block!\n"); + return err; + } + adev->nbio.ras_if = >nbio.ras->ras_block.ras_comm; + } + + if (adev->gmc.xgmi.ras) { + err = amdgpu_ras_register_ras_block(adev, >gmc.xgmi.ras->ras_block); + if (err) { + dev_err(adev->dev, "Failed to register xgmi ras block!\n"); + return err; + } + adev->gmc.xgmi.ras_if = >gmc.xgmi.ras->ras_block.ras_comm; + } + + if (adev->gfx.ras) { + err = amdgpu_ras_register_ras_block(adev, >gfx.ras->ras_block); + if (err) { + dev_err(adev->dev, "Failed to register gfx ras block!\n"); + return err; + } + + strcpy(adev->gfx.ras->ras_block.ras_comm.name, "gfx"); + adev->gfx.ras->ras_block.ras_comm.block = AMDGPU_RAS_BLOCK__GFX; + adev->gfx.ras->ras_block.ras_comm.type = AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE; + adev->gfx.ras_if = >gfx.ras->ras_block.ras_comm; + + /* If not define special ras_late_init function, use gfx default ras_late_init */ + if (!adev->gfx.ras->ras_block.ras_late_init) + adev->gfx.ras->ras_block.ras_late_init = amdgpu_gfx_ras_late_init; + + /* If not defined special ras_cb function, use default ras_cb */ + if (!adev->gfx.ras->ras_block.ras_cb) + adev->gfx.ras->ras_block.ras_cb = amdgpu_gfx_process_ras_data_cb; + } + + if (adev->umc.ras) { + err = amdgpu_ras_register_ras_block(adev, >umc.ras->ras_block); + if (err) { + dev_err(adev->dev, "Failed to register umc ras block!\n"); + return err; + } + + strcpy(adev->umc.ras->ras_block.ras_comm.name, "umc"); +