RE: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device
[AMD Official Use Only - General] Thanks, will update it. Regards, Lyndon > -Original Message- > From: Christian König > Sent: Tuesday, March 7, 2023 2:22 PM > To: Chen, Guchun ; Li, Lyndon > ; amd-gfx@lists.freedesktop.org > Cc: Xu, Feifei ; Ma, Jun ; > Prosyak, Vitaly ; Deucher, Alexander > ; Koenig, Christian > > Subject: Re: [PATCH] drm/amdgpu: Fix the warning info when removing > amdgpu device > > The commit message reads a bit bumpy. Generally best practice are: > > Short (72 chars or less) summary > > More detailed explanatory text. Wrap it to 72 characters. The blank line > separating the summary from the body is critical (unless you omit the body > entirely). > > Write your commit message in the imperative: "Fix bug" and not "Fixed bug" > or "Fixes bug." This convention matches up with commit messages > generated by commands like git merge and git revert. > > Further paragraphs come after blank lines. > > - Bullet points are okay, too. > - Typically a hyphen or asterisk is used for the bullet, followed by a > single space. Use a hanging indent. > > Apart from that the patch is Acked-by: Christian König > > > Regards, > Christian. > > Am 07.03.23 um 03:23 schrieb Chen, Guchun: > > Reviewed-by: Guchun Chen > > > > Regards, > > Guchun > > > > -Original Message- > > From: lyndonli > > Sent: Tuesday, March 7, 2023 10:12 AM > > To: amd-gfx@lists.freedesktop.org > > Cc: Prosyak, Vitaly ; Koenig, Christian > > ; Deucher, Alexander > > ; Chen, Guchun > ; Xu, > > Feifei ; Ma, Jun ; Li, Lyndon > > > > Subject: [PATCH] drm/amdgpu: Fix the warning info when removing > amdgpu > > device > > > > Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect > anything. > > And it is not used to prevent concurrent access. > > If DRM device is unplugged, it will always check the condition in WARN_ON. > > We'd better not keep adding commands to the list. > > Simply moving the drm_dev_enter/drm_dev_exit higher level will not > solve the issue. > > Because psp_cmd_submit_buf is called in many places, such as > psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, > amdgpu_device_gpu_recover-->amdgpu_ras_suspend. > > So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf. > > > > When removing amdgpu, the calling order as follows: > > amdgpu_pci_remove > > drm_dev_unplug > > amdgpu_driver_unload_kms > > amdgpu_device_fini_hw > > amdgpu_device_ip_fini_early > > psp_hw_fini > > psp_ras_terminate > > psp_ta_unloadye > > > psp_cmd_submit_buf > > > > [ 4507.740388] Call Trace: > > [ 4507.740389] > > [ 4507.740391] psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485] > > psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575] > > psp_hw_fini+0x28/0xa0 [amdgpu] [ 4507.740662] > > amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 4507.740791] > > amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875] > > amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962] ? > > _raw_spin_unlock_irqrestore+0x27/0x43 > > [ 4507.740965] ? __pm_runtime_resume+0x60/0x90 [ 4507.740968] > > pci_device_remove+0x39/0xb0 [ 4507.740971] device_remove+0x46/0x70 > [ > > 4507.740972] device_release_driver_internal+0xd1/0x160 > > [ 4507.740974] driver_detach+0x4a/0x90 [ 4507.740975] > > bus_remove_driver+0x6c/0xf0 [ 4507.740976] > > driver_unregister+0x31/0x50 [ 4507.740977] > > pci_unregister_driver+0x40/0x90 [ 4507.740978] amdgpu_exit+0x15/0x120 > > [amdgpu] > > > > Signed-off-by: lyndonli > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 + > > 1 file changed, 1 insertion(+), 16 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > index 4c617faaa7c9..02f948adae72 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > @@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp, > >struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr) > { > > int ret; > > - int index, idx; > > + int index; > > int timeout = 2; > > bool ras_intr = false; > > bool skip_unsupport = false; > > - bool dev_entered; > > > > if (psp->a
Re: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device
The commit message reads a bit bumpy. Generally best practice are: Short (72 chars or less) summary More detailed explanatory text. Wrap it to 72 characters. The blank line separating the summary from the body is critical (unless you omit the body entirely). Write your commit message in the imperative: "Fix bug" and not "Fixed bug" or "Fixes bug." This convention matches up with commit messages generated by commands like git merge and git revert. Further paragraphs come after blank lines. - Bullet points are okay, too. - Typically a hyphen or asterisk is used for the bullet, followed by a single space. Use a hanging indent. Apart from that the patch is Acked-by: Christian König Regards, Christian. Am 07.03.23 um 03:23 schrieb Chen, Guchun: Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: lyndonli Sent: Tuesday, March 7, 2023 10:12 AM To: amd-gfx@lists.freedesktop.org Cc: Prosyak, Vitaly ; Koenig, Christian ; Deucher, Alexander ; Chen, Guchun ; Xu, Feifei ; Ma, Jun ; Li, Lyndon Subject: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect anything. And it is not used to prevent concurrent access. If DRM device is unplugged, it will always check the condition in WARN_ON. We'd better not keep adding commands to the list. Simply moving the drm_dev_enter/drm_dev_exit higher level will not solve the issue. Because psp_cmd_submit_buf is called in many places, such as psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, amdgpu_device_gpu_recover-->amdgpu_ras_suspend. So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf. When removing amdgpu, the calling order as follows: amdgpu_pci_remove drm_dev_unplug amdgpu_driver_unload_kms amdgpu_device_fini_hw amdgpu_device_ip_fini_early psp_hw_fini psp_ras_terminate psp_ta_unloadye psp_cmd_submit_buf [ 4507.740388] Call Trace: [ 4507.740389] [ 4507.740391] psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485] psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575] psp_hw_fini+0x28/0xa0 [amdgpu] [ 4507.740662] amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 4507.740791] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962] ? _raw_spin_unlock_irqrestore+0x27/0x43 [ 4507.740965] ? __pm_runtime_resume+0x60/0x90 [ 4507.740968] pci_device_remove+0x39/0xb0 [ 4507.740971] device_remove+0x46/0x70 [ 4507.740972] device_release_driver_internal+0xd1/0x160 [ 4507.740974] driver_detach+0x4a/0x90 [ 4507.740975] bus_remove_driver+0x6c/0xf0 [ 4507.740976] driver_unregister+0x31/0x50 [ 4507.740977] pci_unregister_driver+0x40/0x90 [ 4507.740978] amdgpu_exit+0x15/0x120 [amdgpu] Signed-off-by: lyndonli --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 4c617faaa7c9..02f948adae72 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp, struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr) { int ret; - int index, idx; + int index; int timeout = 2; bool ras_intr = false; bool skip_unsupport = false; - bool dev_entered; if (psp->adev->no_hw_access) return 0; - dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx); - /* -* We allow sending PSP messages LOAD_ASD and UNLOAD_TA without acquiring -* a lock in drm_dev_enter during driver unload because we must call -* drm_dev_unplug as the beginning of unload driver sequence . It is very -* crucial that userspace can't access device instances anymore. -*/ - if (!dev_entered) - WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD && - psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA && - psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_INVOKE_CMD); - memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE); memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ -687,8 +674,6 @@ psp_cmd_submit_buf(struct psp_context *psp, } exit: - if (dev_entered) - drm_dev_exit(idx); return ret; } -- 2.34.1
RE: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device
Reviewed-by: Guchun Chen Regards, Guchun -Original Message- From: lyndonli Sent: Tuesday, March 7, 2023 10:12 AM To: amd-gfx@lists.freedesktop.org Cc: Prosyak, Vitaly ; Koenig, Christian ; Deucher, Alexander ; Chen, Guchun ; Xu, Feifei ; Ma, Jun ; Li, Lyndon Subject: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect anything. And it is not used to prevent concurrent access. If DRM device is unplugged, it will always check the condition in WARN_ON. We'd better not keep adding commands to the list. Simply moving the drm_dev_enter/drm_dev_exit higher level will not solve the issue. Because psp_cmd_submit_buf is called in many places, such as psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, amdgpu_device_gpu_recover-->amdgpu_ras_suspend. So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf. When removing amdgpu, the calling order as follows: amdgpu_pci_remove drm_dev_unplug amdgpu_driver_unload_kms amdgpu_device_fini_hw amdgpu_device_ip_fini_early psp_hw_fini psp_ras_terminate psp_ta_unloadye psp_cmd_submit_buf [ 4507.740388] Call Trace: [ 4507.740389] [ 4507.740391] psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485] psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575] psp_hw_fini+0x28/0xa0 [amdgpu] [ 4507.740662] amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 4507.740791] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962] ? _raw_spin_unlock_irqrestore+0x27/0x43 [ 4507.740965] ? __pm_runtime_resume+0x60/0x90 [ 4507.740968] pci_device_remove+0x39/0xb0 [ 4507.740971] device_remove+0x46/0x70 [ 4507.740972] device_release_driver_internal+0xd1/0x160 [ 4507.740974] driver_detach+0x4a/0x90 [ 4507.740975] bus_remove_driver+0x6c/0xf0 [ 4507.740976] driver_unregister+0x31/0x50 [ 4507.740977] pci_unregister_driver+0x40/0x90 [ 4507.740978] amdgpu_exit+0x15/0x120 [amdgpu] Signed-off-by: lyndonli --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 4c617faaa7c9..02f948adae72 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp, struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr) { int ret; - int index, idx; + int index; int timeout = 2; bool ras_intr = false; bool skip_unsupport = false; - bool dev_entered; if (psp->adev->no_hw_access) return 0; - dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx); - /* -* We allow sending PSP messages LOAD_ASD and UNLOAD_TA without acquiring -* a lock in drm_dev_enter during driver unload because we must call -* drm_dev_unplug as the beginning of unload driver sequence . It is very -* crucial that userspace can't access device instances anymore. -*/ - if (!dev_entered) - WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD && - psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA && - psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_INVOKE_CMD); - memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE); memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ -687,8 +674,6 @@ psp_cmd_submit_buf(struct psp_context *psp, } exit: - if (dev_entered) - drm_dev_exit(idx); return ret; } -- 2.34.1