Re: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-25 Thread Zhang, Hawking
I thought I had reviewed this one together with the other patch from you that
fixed the hive refcount leak. You sent them as a series.

Anyway, go ahead and submit with my RB.

Thanks.

Regards,
Hawking

From: amd-gfx on behalf of Chai, Thomas
Date: Thursday, August 25, 2022 at 00:37
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin), Zhang, Hawking
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini
[AMD Official Use Only - General]

Ping on this series.

-Original Message-
From: Chai, Thomas 
Sent: Friday, August 12, 2022 5:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Wang, Yang(Kevin) ; Chai, Thomas 
Subject: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to 
be earlier than psp_hw_fini

The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate xgmi, but the PSP ring has already been
destroyed in psp_hw_fini by the time it is called.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)

 	amdgpu_amdkfd_suspend(adev, false);

+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);

@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);

 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
--
2.25.1
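
The ordering problem described in the commit message can be modelled in a few
lines of standalone C. This is only a sketch under stated assumptions: struct
toy_dev and every toy_* function are hypothetical stand-ins, not the amdgpu
driver's code, and the only behaviour modelled is that the unload command
cannot be delivered once the ring is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state; this is NOT the real struct amdgpu_device. */
struct toy_dev {
	bool psp_ring_alive;	/* true until toy_psp_hw_fini() runs */
	bool xgmi_ta_loaded;	/* true until the unload command is delivered */
	int num_physical_nodes;	/* mirrors the check used in the patch */
};

/* Stand-in for psp_hw_fini(): afterwards no command can reach the PSP. */
void toy_psp_hw_fini(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->psp_ring_alive = false;
}

/* Stand-in for amdgpu_xgmi_remove_device(): needs a live ring to unload. */
int toy_xgmi_remove_device(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!dev->psp_ring_alive)
		return -1;	/* the unload command cannot be submitted */
	dev->xgmi_ta_loaded = false;
	return 0;
}

/* Order before the patch: the ring is destroyed first. */
int toy_fini_old_order(struct toy_dev *dev)
{
	int ret = 0;

	toy_psp_hw_fini(dev);
	if (dev->num_physical_nodes > 1)
		ret = toy_xgmi_remove_device(dev);
	return ret;
}

/* Order after the patch: xgmi is removed while the ring still exists. */
int toy_fini_new_order(struct toy_dev *dev)
{
	int ret = 0;

	if (dev->num_physical_nodes > 1)
		ret = toy_xgmi_remove_device(dev);
	toy_psp_hw_fini(dev);
	return ret;
}
```

With num_physical_nodes set to 2, toy_fini_old_order() returns -1 and leaves
the TA marked as loaded, while toy_fini_new_order() returns 0 and unloads it.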


RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-24 Thread Chai, Thomas
[AMD Official Use Only - General]

Ping on this series.

-Original Message-
From: Chai, Thomas  
Sent: Friday, August 12, 2022 5:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Wang, Yang(Kevin) ; Chai, Thomas 
Subject: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to 
be earlier than psp_hw_fini

The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate xgmi, but the PSP ring has already been
destroyed in psp_hw_fini by the time it is called.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)

 	amdgpu_amdkfd_suspend(adev, false);

+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);

@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);

 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
--
2.25.1


RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-15 Thread Chai, Thomas
[AMD Official Use Only - General]

OK, I will update the patch.

From: Zhang, Hawking 
Sent: Tuesday, August 16, 2022 11:51 AM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin) 
Subject: Re: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device 
needs to be earlier than psp_hw_fini

Fixed typo

Regards,
Hawking

From: Zhang, Hawking
Date: Tuesday, August 16, 2022 at 11:49
To: Chai, Thomas, amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin)
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini
[AMD Official Use Only - General]

Alternatively, it might be better to split the xgmi TA terminate out of
xgmi_remove_device: in psp_hw_fini, check ta->fw and num_physical_nodes to
terminate the xgmi TA, and make amdgpu_xgmi_remove_device deal only with
software fini, like add_device.

Regards,
Hawking
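
One possible reading of the suggestion above, sketched as standalone C. All
names here are hypothetical: struct alt_dev and the alt_* functions are toy
stand-ins rather than driver code, the xgmi_ta_fw_present flag only stands in
for the ta->fw check mentioned in the mail, and the real signatures in the
driver may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state; this is NOT the real struct amdgpu_device. */
struct alt_dev {
	bool psp_ring_alive;		/* ring usable for PSP commands */
	bool xgmi_ta_fw_present;	/* stands in for the ta->fw check */
	bool xgmi_ta_loaded;		/* hardware-side xgmi TA state */
	bool in_hive;			/* software-side bookkeeping */
	int num_physical_nodes;
};

/* Hardware part, split out: unload the xgmi TA while the ring is alive. */
int alt_xgmi_ta_terminate(struct alt_dev *dev)
{
	assert(dev != NULL);
	if (!dev->psp_ring_alive)
		return -1;
	dev->xgmi_ta_loaded = false;
	return 0;
}

/* hw_fini terminates the TA itself, then tears the ring down. */
int alt_psp_hw_fini(struct alt_dev *dev)
{
	int ret = 0;

	assert(dev != NULL);
	if (dev->xgmi_ta_fw_present && dev->num_physical_nodes > 1)
		ret = alt_xgmi_ta_terminate(dev);
	dev->psp_ring_alive = false;
	return ret;
}

/* Software-only fini: no PSP command, so it is safe after hw_fini. */
void alt_xgmi_remove_device(struct alt_dev *dev)
{
	assert(dev != NULL);
	dev->in_hive = false;
}
```

In this arrangement the call to the remove function no longer depends on the
ring, so it would not need to be moved ahead of psp_hw_fini.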

-Original Message-
From: Chai, Thomas
Sent: Monday, August 15, 2022 15:03
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking; Wang, Yang(Kevin)
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

[AMD Official Use Only - General]

Ping on this series.

-Original Message-
From: Chai, Thomas
Sent: Friday, August 12, 2022 5:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas; Zhang, Hawking; Wang, Yang(Kevin); Chai, Thomas
Subject: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate xgmi, but the PSP ring has already been
destroyed in psp_hw_fini by the time it is called.

Signed-off-by: YiPeng Chai
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)

 	amdgpu_amdkfd_suspend(adev, false);

+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);

@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);

 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
--
2.25.1


Re: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-15 Thread Zhang, Hawking
Fixed typo

Regards,
Hawking

From: Zhang, Hawking
Date: Tuesday, August 16, 2022 at 11:49
To: Chai, Thomas, amd-gfx@lists.freedesktop.org
Cc: Wang, Yang(Kevin)
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini
[AMD Official Use Only - General]

Alternatively, it might be better to split the xgmi TA terminate out of
xgmi_remove_device: in psp_hw_fini, check ta->fw and num_physical_nodes to
terminate the xgmi TA, and make amdgpu_xgmi_remove_device deal only with
software fini, like add_device.

Regards,
Hawking

-Original Message-
From: Chai, Thomas
Sent: Monday, August 15, 2022 15:03
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking; Wang, Yang(Kevin)
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

[AMD Official Use Only - General]

Ping on this series.

-Original Message-
From: Chai, Thomas 
Sent: Friday, August 12, 2022 5:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Wang, Yang(Kevin) ; Chai, Thomas 
Subject: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to 
be earlier than psp_hw_fini

The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate xgmi, but the PSP ring has already been
destroyed in psp_hw_fini by the time it is called.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)

 	amdgpu_amdkfd_suspend(adev, false);

+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);

@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);

 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
--
2.25.1


RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

2022-08-15 Thread Zhang, Hawking
[AMD Official Use Only - General]

Alternatively, we might be better off splitting the xgmi TA terminate out of
xgmi_remove_device: in psp_hw_fini, check ta->fw and num_physical_nodes to
terminate the xgmi TA, and make amdgpu_xgmi_remove_device deal only with
software fini, like add_device.

Regards,
Hawking

-Original Message-
From: Chai, Thomas
Sent: Monday, August 15, 2022 15:03
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking; Wang, Yang(Kevin)
Subject: RE: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to be earlier than psp_hw_fini

[AMD Official Use Only - General]

Ping on this series.

-Original Message-
From: Chai, Thomas  
Sent: Friday, August 12, 2022 5:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chai, Thomas ; Zhang, Hawking ; 
Wang, Yang(Kevin) ; Chai, Thomas 
Subject: [PATCH 1/2] drm/amdgpu: The call to amdgpu_xgmi_remove_device needs to 
be earlier than psp_hw_fini

The amdgpu_xgmi_remove_device function sends an unload command to the PSP
through the PSP ring to terminate xgmi, but the PSP ring has already been
destroyed in psp_hw_fini by the time it is called.

Signed-off-by: YiPeng Chai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..2445255bbf01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2787,6 +2787,9 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)

 	amdgpu_amdkfd_suspend(adev, false);

+	if (adev->gmc.xgmi.num_physical_nodes > 1)
+		amdgpu_xgmi_remove_device(adev);
+
 	/* Workaroud for ASICs need to disable SMC first */
 	amdgpu_device_smu_fini_early(adev);

@@ -2830,9 +2833,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
-		amdgpu_xgmi_remove_device(adev);
-
 	amdgpu_amdkfd_device_fini_sw(adev);

 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
--
2.25.1