RE: [PATCH v2] drm/amdgpu: fix the memory corruption on S3

2017-06-29 Thread Deucher, Alexander
> -Original Message-
> From: Christian König [mailto:deathsim...@vodafone.de]
> Sent: Thursday, June 29, 2017 4:17 AM
> To: Huang, Ray; amd-gfx@lists.freedesktop.org; Deucher, Alexander; Koenig,
> Christian
> Cc: Huan, Alvin; Qiao, Joe(Markham); Jiang, Sonny; Wang, Ken; Yuan, Xiaojie
> Subject: Re: [PATCH v2] drm/amdgpu: fix the memory corruption on S3
> 
> Am 29.06.2017 um 10:09 schrieb Huang Rui:
> > psp->cmd will be used on resume phase, so we can not free it on hw_init.
> > Otherwise, a memory corruption will be triggered.
> >
> > Signed-off-by: Huang Rui 
> > ---
> >
> > V1 -> V2:
> > - remove "cmd" variable.
> > - fix typo of check.
> >
> > Alex, Christian,
> >
> > This is the final fix for vega10 S3. The random memory corruption issue is
> root
> > caused.
> >
> > Thanks,
> > Ray
> >
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
> >   1 file changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 5bed483..711476792 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -330,14 +330,11 @@ static int psp_load_fw(struct amdgpu_device
> *adev)
> >   {
> > int ret;
> > struct psp_context *psp = &adev->psp;
> > -   struct psp_gfx_cmd_resp *cmd;
> >
> > -   cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> > -   if (!cmd)
> > +   psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> > +   if (!psp->cmd)
> > return -ENOMEM;
> >
> > -   psp->cmd = cmd;
> > -
> > ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> >   AMDGPU_GEM_DOMAIN_GTT,
> >   &psp->fw_pri_bo,
> > @@ -376,8 +373,6 @@ static int psp_load_fw(struct amdgpu_device
> *adev)
> > if (ret)
> > goto failed_mem;
> >
> > -   kfree(cmd);
> > -
> > return 0;
> >
> >   failed_mem:
> > @@ -387,7 +382,8 @@ static int psp_load_fw(struct amdgpu_device
> *adev)
> > amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> >   &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> >   failed:
> > -   kfree(cmd);
> > +   kfree(psp->cmd);
> > +   psp->cmd = NULL;
> > return ret;
> >   }
> >
> > @@ -447,6 +443,11 @@ static int psp_hw_fini(void *handle)
> > amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> >   &psp->fence_buf_mc_addr, &psp-
> >fence_buf);
> >
> > +   if (psp->cmd) {
> 
> As Michel noted as well please drop this extra check, kfree(NULL) is
> perfectly save.
> 
> With that fixed the patch is Reviewed-by: Christian König
>  for now, but I still think we could do better
> by only allocating the temporary command buffer when it is needed.

Yes, nice find Ray!  Glad to finally have this one solved!  With the extra 
check fixed:
Reviewed-by: Alex Deucher 

> 
> Regards,
> Christian.
> 
> > +   kfree(psp->cmd);
> > +   psp->cmd = NULL;
> > +   }
> > +
> > return 0;
> >   }
> >
> 

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: fix the memory corruption on S3

2017-06-29 Thread Yuan, Xiaojie
Tested-by: Xiaojie Yuan 


Regards,

Xiaojie


From: Huang Rui 
Sent: Thursday, June 29, 2017 4:09:21 PM
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander; Koenig, Christian
Cc: Wang, Ken; Qiao, Joe(Markham); Jiang, Sonny; Huan, Alvin; Yuan, Xiaojie; 
Huang, Ray
Subject: [PATCH v2] drm/amdgpu: fix the memory corruption on S3

psp->cmd will be used on resume phase, so we can not free it on hw_init.
Otherwise, a memory corruption will be triggered.

Signed-off-by: Huang Rui 
---

V1 -> V2:
- remove "cmd" variable.
- fix typo of check.

Alex, Christian,

This is the final fix for vega10 S3. The random memory corruption issue is root
caused.

Thanks,
Ray

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 5bed483..711476792 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -330,14 +330,11 @@ static int psp_load_fw(struct amdgpu_device *adev)
 {
 int ret;
 struct psp_context *psp = &adev->psp;
-   struct psp_gfx_cmd_resp *cmd;

-   cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
-   if (!cmd)
+   psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+   if (!psp->cmd)
 return -ENOMEM;

-   psp->cmd = cmd;
-
 ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
   AMDGPU_GEM_DOMAIN_GTT,
   &psp->fw_pri_bo,
@@ -376,8 +373,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
 if (ret)
 goto failed_mem;

-   kfree(cmd);
-
 return 0;

 failed_mem:
@@ -387,7 +382,8 @@ static int psp_load_fw(struct amdgpu_device *adev)
 amdgpu_bo_free_kernel(&psp->fw_pri_bo,
   &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
 failed:
-   kfree(cmd);
+   kfree(psp->cmd);
+   psp->cmd = NULL;
 return ret;
 }

@@ -447,6 +443,11 @@ static int psp_hw_fini(void *handle)
 amdgpu_bo_free_kernel(&psp->fence_buf_bo,
   &psp->fence_buf_mc_addr, 
&psp->fence_buf);

+   if (psp->cmd) {
+   kfree(psp->cmd);
+   psp->cmd = NULL;
+   }
+
 return 0;
 }

--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: fix the memory corruption on S3

2017-06-29 Thread Huang Rui
On Thu, Jun 29, 2017 at 04:16:53PM +0800, Christian König wrote:
> Am 29.06.2017 um 10:09 schrieb Huang Rui:
> > psp->cmd will be used on resume phase, so we can not free it on hw_init.
> > Otherwise, a memory corruption will be triggered.
> >
> > Signed-off-by: Huang Rui 
> > ---
> >
> > V1 -> V2:
> > - remove "cmd" variable.
> > - fix typo of check.
> >
> > Alex, Christian,
> >
> > This is the final fix for vega10 S3. The random memory corruption issue is
> root
> > caused.
> >
> > Thanks,
> > Ray
> >
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
> >   1 file changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/
> amdgpu/amdgpu_psp.c
> > index 5bed483..711476792 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -330,14 +330,11 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >   {
> >int ret;
> >struct psp_context *psp = &adev->psp;
> > - struct psp_gfx_cmd_resp *cmd;
> >  
> > - cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> > - if (!cmd)
> > + psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> > + if (!psp->cmd)
> >return -ENOMEM;
> >  
> > - psp->cmd = cmd;
> > -
> >ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
> >  AMDGPU_GEM_DOMAIN_GTT,
> >  &psp->fw_pri_bo,
> > @@ -376,8 +373,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >if (ret)
> >goto failed_mem;
> >  
> > - kfree(cmd);
> > -
> >return 0;
> >  
> >   failed_mem:
> > @@ -387,7 +382,8 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> >  &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> >   failed:
> > - kfree(cmd);
> > + kfree(psp->cmd);
> > + psp->cmd = NULL;
> >return ret;
> >   }
> >  
> > @@ -447,6 +443,11 @@ static int psp_hw_fini(void *handle)
> >amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> >  &psp->fence_buf_mc_addr, &psp->
> fence_buf);
> >  
> > + if (psp->cmd) {
> 
> As Michel noted as well please drop this extra check, kfree(NULL) is
> perfectly save.
> 
> With that fixed the patch is Reviewed-by: Christian König
>  for now, but I still think we could do better
> by only allocating the temporary command buffer when it is needed.
> 

Thanks. This is the quick fix for release. You know, it was a tragedy till
I found the root cause for S3 suspend/resume and make it stable, now it's
able to enter S3 more than 30+ cycles and never crash. 

I am planning to refine the psp codes, any suggestions are warm for me. I
will refer the comments such as fence and "temporary command buffter" to
modify it in following days. :-)

Thanks,
Ray
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: fix the memory corruption on S3

2017-06-29 Thread Christian König

Am 29.06.2017 um 10:09 schrieb Huang Rui:

psp->cmd will be used on resume phase, so we can not free it on hw_init.
Otherwise, a memory corruption will be triggered.

Signed-off-by: Huang Rui 
---

V1 -> V2:
- remove "cmd" variable.
- fix typo of check.

Alex, Christian,

This is the final fix for vega10 S3. The random memory corruption issue is root
caused.

Thanks,
Ray

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 5bed483..711476792 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -330,14 +330,11 @@ static int psp_load_fw(struct amdgpu_device *adev)
  {
int ret;
struct psp_context *psp = &adev->psp;
-   struct psp_gfx_cmd_resp *cmd;
  
-	cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);

-   if (!cmd)
+   psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+   if (!psp->cmd)
return -ENOMEM;
  
-	psp->cmd = cmd;

-
ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
  AMDGPU_GEM_DOMAIN_GTT,
  &psp->fw_pri_bo,
@@ -376,8 +373,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
if (ret)
goto failed_mem;
  
-	kfree(cmd);

-
return 0;
  
  failed_mem:

@@ -387,7 +382,8 @@ static int psp_load_fw(struct amdgpu_device *adev)
amdgpu_bo_free_kernel(&psp->fw_pri_bo,
  &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
  failed:
-   kfree(cmd);
+   kfree(psp->cmd);
+   psp->cmd = NULL;
return ret;
  }
  
@@ -447,6 +443,11 @@ static int psp_hw_fini(void *handle)

amdgpu_bo_free_kernel(&psp->fence_buf_bo,
  &psp->fence_buf_mc_addr, &psp->fence_buf);
  
+	if (psp->cmd) {


As Michel noted as well please drop this extra check, kfree(NULL) is 
perfectly save.


With that fixed the patch is Reviewed-by: Christian König 
 for now, but I still think we could do better 
by only allocating the temporary command buffer when it is needed.


Regards,
Christian.


+   kfree(psp->cmd);
+   psp->cmd = NULL;
+   }
+
return 0;
  }
  



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/amdgpu: fix the memory corruption on S3

2017-06-29 Thread Huang Rui
psp->cmd will be used on resume phase, so we can not free it on hw_init.
Otherwise, a memory corruption will be triggered.

Signed-off-by: Huang Rui 
---

V1 -> V2:
- remove "cmd" variable.
- fix typo of check.

Alex, Christian,

This is the final fix for vega10 S3. The random memory corruption issue is root
caused.

Thanks,
Ray

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 5bed483..711476792 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -330,14 +330,11 @@ static int psp_load_fw(struct amdgpu_device *adev)
 {
int ret;
struct psp_context *psp = &adev->psp;
-   struct psp_gfx_cmd_resp *cmd;
 
-   cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
-   if (!cmd)
+   psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+   if (!psp->cmd)
return -ENOMEM;
 
-   psp->cmd = cmd;
-
ret = amdgpu_bo_create_kernel(adev, PSP_1_MEG, PSP_1_MEG,
  AMDGPU_GEM_DOMAIN_GTT,
  &psp->fw_pri_bo,
@@ -376,8 +373,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
if (ret)
goto failed_mem;
 
-   kfree(cmd);
-
return 0;
 
 failed_mem:
@@ -387,7 +382,8 @@ static int psp_load_fw(struct amdgpu_device *adev)
amdgpu_bo_free_kernel(&psp->fw_pri_bo,
  &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
 failed:
-   kfree(cmd);
+   kfree(psp->cmd);
+   psp->cmd = NULL;
return ret;
 }
 
@@ -447,6 +443,11 @@ static int psp_hw_fini(void *handle)
amdgpu_bo_free_kernel(&psp->fence_buf_bo,
  &psp->fence_buf_mc_addr, &psp->fence_buf);
 
+   if (psp->cmd) {
+   kfree(psp->cmd);
+   psp->cmd = NULL;
+   }
+
return 0;
 }
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx