RE: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

2023-03-06 Thread Li, Lyndon
[AMD Official Use Only - General]

Thanks, will update it.

Regards,
Lyndon

> -Original Message-
> From: Christian König 
> Sent: Tuesday, March 7, 2023 2:22 PM
> To: Chen, Guchun ; Li, Lyndon
> ; amd-gfx@lists.freedesktop.org
> Cc: Xu, Feifei ; Ma, Jun ;
> Prosyak, Vitaly ; Deucher, Alexander
> ; Koenig, Christian
> 
> Subject: Re: [PATCH] drm/amdgpu: Fix the warning info when removing
> amdgpu device
> 
> The commit message reads a bit bumpy. Generally best practice are:
> 
> Short (72 chars or less) summary
> 
> More detailed explanatory text. Wrap it to 72 characters. The blank line
> separating the summary from the body is critical (unless you omit the body
> entirely).
> 
> Write your commit message in the imperative: "Fix bug" and not "Fixed bug"
> or "Fixes bug." This convention matches up with commit messages
> generated by commands like git merge and git revert.
> 
> Further paragraphs come after blank lines.
> 
> - Bullet points are okay, too.
> - Typically a hyphen or asterisk is used for the bullet, followed by a
>    single space. Use a hanging indent.
> 
> Apart from that the patch is Acked-by: Christian König
> 
> 
> Regards,
> Christian.
> 
> Am 07.03.23 um 03:23 schrieb Chen, Guchun:
> > Reviewed-by: Guchun Chen 
> >
> > Regards,
> > Guchun
> >
> > -Original Message-
> > From: lyndonli 
> > Sent: Tuesday, March 7, 2023 10:12 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Prosyak, Vitaly ; Koenig, Christian
> > ; Deucher, Alexander
> > ; Chen, Guchun
> ; Xu,
> > Feifei ; Ma, Jun ; Li, Lyndon
> > 
> > Subject: [PATCH] drm/amdgpu: Fix the warning info when removing
> amdgpu
> > device
> >
> > Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect
> anything.
> > And it is not used to prevent concurrent access.
> > If DRM device is unplugged, it will always check the condition in WARN_ON.
> > We'd better not keep adding commands to the list.
> > Simply moving the drm_dev_enter/drm_dev_exit higher level will not
> solve the issue.
> > Because psp_cmd_submit_buf is called in many places, such as
> psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate,
> amdgpu_device_gpu_recover-->amdgpu_ras_suspend.
> > So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf.
> >
> > When removing amdgpu, the calling order as follows:
> > amdgpu_pci_remove
> > drm_dev_unplug
> > amdgpu_driver_unload_kms
> > amdgpu_device_fini_hw
> > amdgpu_device_ip_fini_early
> > psp_hw_fini
> > psp_ras_terminate
> > psp_ta_unloadye
> >
>   psp_cmd_submit_buf
> >
> > [ 4507.740388] Call Trace:
> > [ 4507.740389]  
> > [ 4507.740391]  psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485]
> > psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575]
> > psp_hw_fini+0x28/0xa0 [amdgpu] [ 4507.740662]
> > amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 4507.740791]
> > amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875]
> > amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962]  ?
> > _raw_spin_unlock_irqrestore+0x27/0x43
> > [ 4507.740965]  ? __pm_runtime_resume+0x60/0x90 [ 4507.740968]
> > pci_device_remove+0x39/0xb0 [ 4507.740971]  device_remove+0x46/0x70
> [
> > 4507.740972]  device_release_driver_internal+0xd1/0x160
> > [ 4507.740974]  driver_detach+0x4a/0x90 [ 4507.740975]
> > bus_remove_driver+0x6c/0xf0 [ 4507.740976]
> > driver_unregister+0x31/0x50 [ 4507.740977]
> > pci_unregister_driver+0x40/0x90 [ 4507.740978]  amdgpu_exit+0x15/0x120
> > [amdgpu]
> >
> > Signed-off-by: lyndonli 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
> >   1 file changed, 1 insertion(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 4c617faaa7c9..02f948adae72 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp,
> >struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr)
> {
> > int ret;
> > -   int index, idx;
> > +   int index;
> > int timeout = 2;
> > bool ras_intr = false;
> > bool skip_unsupport = false;
> > -   bool dev_entered;
> >
> > if (psp->a

Re: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

2023-03-06 Thread Christian König

The commit message reads a bit bumpy. Generally best practice are:

Short (72 chars or less) summary

More detailed explanatory text. Wrap it to 72 characters. The blank
line separating the summary from the body is critical (unless you omit
the body entirely).

Write your commit message in the imperative: "Fix bug" and not "Fixed
bug" or "Fixes bug." This convention matches up with commit messages
generated by commands like git merge and git revert.

Further paragraphs come after blank lines.

- Bullet points are okay, too.
- Typically a hyphen or asterisk is used for the bullet, followed by a
  single space. Use a hanging indent.

Apart from that the patch is Acked-by: Christian König 



Regards,
Christian.

Am 07.03.23 um 03:23 schrieb Chen, Guchun:

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: lyndonli 
Sent: Tuesday, March 7, 2023 10:12 AM
To: amd-gfx@lists.freedesktop.org
Cc: Prosyak, Vitaly ; Koenig, Christian ; Deucher, 
Alexander ; Chen, Guchun ; Xu, Feifei 
; Ma, Jun ; Li, Lyndon 
Subject: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect anything.
And it is not used to prevent concurrent access.
If DRM device is unplugged, it will always check the condition in WARN_ON.
We'd better not keep adding commands to the list.
Simply moving the drm_dev_enter/drm_dev_exit higher level will not solve the 
issue.
Because psp_cmd_submit_buf is called in many places, such as 
psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, 
amdgpu_device_gpu_recover-->amdgpu_ras_suspend.
So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf.

When removing amdgpu, the calling order as follows:
amdgpu_pci_remove
drm_dev_unplug
amdgpu_driver_unload_kms
amdgpu_device_fini_hw
amdgpu_device_ip_fini_early
psp_hw_fini
psp_ras_terminate
psp_ta_unloadye
psp_cmd_submit_buf

[ 4507.740388] Call Trace:
[ 4507.740389]  
[ 4507.740391]  psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485]  
psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575]  psp_hw_fini+0x28/0xa0 
[amdgpu] [ 4507.740662]  amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 
4507.740791]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875]  
amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962]  ? 
_raw_spin_unlock_irqrestore+0x27/0x43
[ 4507.740965]  ? __pm_runtime_resume+0x60/0x90 [ 4507.740968]  
pci_device_remove+0x39/0xb0 [ 4507.740971]  device_remove+0x46/0x70 [ 
4507.740972]  device_release_driver_internal+0xd1/0x160
[ 4507.740974]  driver_detach+0x4a/0x90
[ 4507.740975]  bus_remove_driver+0x6c/0xf0 [ 4507.740976]  
driver_unregister+0x31/0x50 [ 4507.740977]  pci_unregister_driver+0x40/0x90 [ 
4507.740978]  amdgpu_exit+0x15/0x120 [amdgpu]

Signed-off-by: lyndonli 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
  1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 4c617faaa7c9..02f948adae72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp,
   struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr)  {
int ret;
-   int index, idx;
+   int index;
int timeout = 2;
bool ras_intr = false;
bool skip_unsupport = false;
-   bool dev_entered;
  
  	if (psp->adev->no_hw_access)

return 0;
  
-	dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx);

-   /*
-* We allow sending PSP messages LOAD_ASD and UNLOAD_TA without 
acquiring
-* a lock in drm_dev_enter during driver unload because we must call
-* drm_dev_unplug as the beginning  of unload driver sequence . It is 
very
-* crucial that userspace can't access device instances anymore.
-*/
-   if (!dev_entered)
-   WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD &&
-   psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA &&
-   psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_INVOKE_CMD);
-
memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
  
  	memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ -687,8 +674,6 @@ psp_cmd_submit_buf(struct psp_context *psp,

}
  
  exit:

-   if (dev_entered)
-   drm_dev_exit(idx);
return ret;
  }
  
--

2.34.1





RE: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

2023-03-06 Thread Chen, Guchun
Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: lyndonli  
Sent: Tuesday, March 7, 2023 10:12 AM
To: amd-gfx@lists.freedesktop.org
Cc: Prosyak, Vitaly ; Koenig, Christian 
; Deucher, Alexander ; 
Chen, Guchun ; Xu, Feifei ; Ma, Jun 
; Li, Lyndon 
Subject: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect anything.
And it is not used to prevent concurrent access.
If DRM device is unplugged, it will always check the condition in WARN_ON.
We'd better not keep adding commands to the list.
Simply moving the drm_dev_enter/drm_dev_exit higher level will not solve the 
issue.
Because psp_cmd_submit_buf is called in many places, such as 
psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, 
amdgpu_device_gpu_recover-->amdgpu_ras_suspend.
So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf.

When removing amdgpu, the calling order as follows:
amdgpu_pci_remove
drm_dev_unplug
amdgpu_driver_unload_kms
amdgpu_device_fini_hw
amdgpu_device_ip_fini_early
psp_hw_fini
psp_ras_terminate
psp_ta_unloadye
psp_cmd_submit_buf

[ 4507.740388] Call Trace:
[ 4507.740389]  
[ 4507.740391]  psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485]  
psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575]  psp_hw_fini+0x28/0xa0 
[amdgpu] [ 4507.740662]  amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 
4507.740791]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875]  
amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962]  ? 
_raw_spin_unlock_irqrestore+0x27/0x43
[ 4507.740965]  ? __pm_runtime_resume+0x60/0x90 [ 4507.740968]  
pci_device_remove+0x39/0xb0 [ 4507.740971]  device_remove+0x46/0x70 [ 
4507.740972]  device_release_driver_internal+0xd1/0x160
[ 4507.740974]  driver_detach+0x4a/0x90
[ 4507.740975]  bus_remove_driver+0x6c/0xf0 [ 4507.740976]  
driver_unregister+0x31/0x50 [ 4507.740977]  pci_unregister_driver+0x40/0x90 [ 
4507.740978]  amdgpu_exit+0x15/0x120 [amdgpu]

Signed-off-by: lyndonli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 4c617faaa7c9..02f948adae72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp,
   struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr)  {
int ret;
-   int index, idx;
+   int index;
int timeout = 2;
bool ras_intr = false;
bool skip_unsupport = false;
-   bool dev_entered;
 
if (psp->adev->no_hw_access)
return 0;
 
-   dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx);
-   /*
-* We allow sending PSP messages LOAD_ASD and UNLOAD_TA without 
acquiring
-* a lock in drm_dev_enter during driver unload because we must call
-* drm_dev_unplug as the beginning  of unload driver sequence . It is 
very
-* crucial that userspace can't access device instances anymore.
-*/
-   if (!dev_entered)
-   WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD &&
-   psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA &&
-   psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_INVOKE_CMD);
-
memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
 
memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ 
-687,8 +674,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
}
 
 exit:
-   if (dev_entered)
-   drm_dev_exit(idx);
return ret;
 }
 
--
2.34.1