RE: [PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function

2021-11-30 Liu, Shaoyun
Thanks for the review. I changed the description as suggested and submitted it.

Shaoyun.liu

-----Original Message-----
From: Kuehling, Felix  
Sent: Tuesday, November 30, 2021 1:19 AM
To: amd-gfx@lists.freedesktop.org; Liu, Shaoyun 
Subject: Re: [PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function



Re: [PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function

2021-11-29 Felix Kuehling
On 2021-11-29 at 9:40 p.m., shaoyunl wrote:
> This change reverts the following commits:
> 7079e7d5c6bf: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
> cd547b93c62a: drm/amdgpu: move kfd post_reset out of reset_sriov function

It looks like this is not a straight revert. It moves the
amdgpu_amdkfd_pre_reset to an earlier place in
amdgpu_device_reset_sriov, presumably to address the sequence issue that
the first patch was originally meant to fix. The patch description
should mention that.

With that fixed, the patch is

Reviewed-by: Felix Kuehling 




[PATCH] drm/amdgpu: adjust the kfd reset sequence in reset sriov function

2021-11-29 shaoyunl
This change reverts the following commits:
7079e7d5c6bf: drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
cd547b93c62a: drm/amdgpu: move kfd post_reset out of reset_sriov function

Some register accesses (GRBM_GFX_CNTL) are only allowed in full access
mode. Move kfd_pre_reset and kfd_post_reset back inside the reset_sriov
function.

Signed-off-by: shaoyunl 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1989f9e9379e..3c5afa45173c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4285,6 +4285,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 {
 	int r;
 
+	amdgpu_amdkfd_pre_reset(adev);
+
 	if (from_hypervisor)
 		r = amdgpu_virt_request_full_gpu(adev, true);
 	else
@@ -4312,6 +4314,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 
 	amdgpu_irq_gpu_reset_resume_helper(adev);
 	r = amdgpu_ib_ring_tests(adev);
+	amdgpu_amdkfd_post_reset(adev);
 
 error:
 	if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
@@ -5026,7 +5029,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 		cancel_delayed_work_sync(&tmp_adev->delayed_init_work);
 
-		amdgpu_amdkfd_pre_reset(tmp_adev);
+		if (!amdgpu_sriov_vf(tmp_adev))
+			amdgpu_amdkfd_pre_reset(tmp_adev);
 
 		/*
 		 * Mark these ASICs to be reseted as untracked first
@@ -5144,9 +5148,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 skip_sched_resume:
 	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
-		/* unlock kfd */
-		if (!need_emergency_restart)
-			amdgpu_amdkfd_post_reset(tmp_adev);
+		/* unlock kfd: SRIOV would do it separately */
+		if (!need_emergency_restart && !amdgpu_sriov_vf(tmp_adev))
+			amdgpu_amdkfd_post_reset(tmp_adev);
 
 		/* kfd_post_reset will do nothing if kfd device is not initialized,
 		 * need to bring up kfd here if it's not be initialized before
-- 
2.17.1