[PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

2019-09-12 Thread Zhou1, Tao
There are two cases in which a reserve error should be ignored:
1) a RAS bad page has already been allocated (it is in use by someone);
2) a RAS bad page has already been reserved (duplicate error injection for one page).

DRM_ERROR is unnecessary when reserving a bad page fails; use DRM_WARN instead.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index dc5f94e6118b..0425b74e1a8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1478,9 +1478,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/*
+		 * There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 * for one page);
+		 */
 		if (amdgpu_ras_reserve_vram(adev, bp << PAGE_SHIFT,
 					PAGE_SIZE, &bo))
-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
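
The two ignorable failure cases named in the patch above can be modelled in
plain userspace C. This is only a sketch under stated assumptions, not the
driver code: the tiny `vram` table, `NUM_PAGES`, `reserve_page()` and
`reserve_bad_pages()` are hypothetical stand-ins for the real VRAM manager and
reserve call. It shows a loop that logs a warning and keeps going when a bad
page is already allocated or already reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_PAGES 8	/* hypothetical tiny VRAM, counted in pages */

enum page_state { PAGE_FREE, PAGE_ALLOCATED, PAGE_RESERVED };

/* Hypothetical page table; static storage starts out as PAGE_FREE (0). */
static enum page_state vram[NUM_PAGES];

/* Hypothetical stand-in for the reserve call: 0 on success, -1 on failure. */
static int reserve_page(unsigned long long bp)
{
	/* Fails if the page is out of range, in use, or already reserved. */
	if (bp >= NUM_PAGES || vram[bp] != PAGE_FREE)
		return -1;
	vram[bp] = PAGE_RESERVED;
	return 0;
}

/* Walk the bad-page list; warn on failure and continue. Returns failures. */
static int reserve_bad_pages(const unsigned long long *bps, int count)
{
	int i, failed = 0;

	assert(bps != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (reserve_page(bps[i])) {
			/* A warning, not an error: both cases are expected. */
			fprintf(stderr,
				"WARN: reserve vram for retired page %llx fail\n",
				bps[i]);
			failed++;
		}
	}
	return failed;
}
```

Calling `reserve_bad_pages()` with a list that names the same page twice, or a
page already marked `PAGE_ALLOCATED`, returns a nonzero failure count while the
remaining pages are still reserved, which is the behaviour the lower log level
is meant to reflect.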

[PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

2019-09-16 Thread Zhou1, Tao
There are two cases in which a reserve error should be ignored:
1) a RAS bad page has already been allocated (it is in use by someone);
2) a RAS bad page has already been reserved (duplicate error injection for one page).

DRM_ERROR is unnecessary when reserving a bad page fails; use DRM_WARN instead.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 79e5e5be8b34..5f623daf5078 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/* There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 *    for one page);
+		 */
 		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
 					       AMDGPU_GEM_DOMAIN_VRAM,
 					       &bo, NULL))
-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1


RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

2019-09-16 Thread Chen, Guchun


-----Original Message-----
From: Zhou1, Tao
Sent: Tuesday, September 17, 2019 2:25 PM
To: amd-gfx@lists.freedesktop.org; Chen, Guchun; Zhang, Hawking
Cc: Zhou1, Tao
Subject: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

There are two cases in which a reserve error should be ignored:
1) a RAS bad page has already been allocated (it is in use by someone);
2) a RAS bad page has already been reserved (duplicate error injection for one page).

DRM_ERROR is unnecessary when reserving a bad page fails; use DRM_WARN instead.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 79e5e5be8b34..5f623daf5078 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/* There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 *    for one page);
+		 */
 		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
 					       AMDGPU_GEM_DOMAIN_VRAM,
 					       &bo, NULL))
[Guchun] Do we need to change PAGE_SHIFT to AMDGPU_GPU_PAGE_SHIFT here?

-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1


RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

2019-09-17 Thread Zhou1, Tao


> -----Original Message-----
> From: Chen, Guchun
> Sent: September 17, 2019 14:52
> To: Zhou1, Tao; amd-gfx@lists.freedesktop.org; Zhang, Hawking
> Subject: RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
>
>
>
> -----Original Message-----
> From: Zhou1, Tao
> Sent: Tuesday, September 17, 2019 2:25 PM
> To: amd-gfx@lists.freedesktop.org; Chen, Guchun; Zhang, Hawking
> Cc: Zhou1, Tao
> Subject: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
> 
> There are two cases in which a reserve error should be ignored:
> 1) a RAS bad page has already been allocated (it is in use by someone);
> 2) a RAS bad page has already been reserved (duplicate error injection for one page).
>
> DRM_ERROR is unnecessary when reserving a bad page fails; use DRM_WARN instead.
> 
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 79e5e5be8b34..5f623daf5078 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
>  	for (i = data->last_reserved; i < data->count; i++) {
>  		bp = data->bps[i].retired_page;
>
> +		/* There are two cases of reserve error should be ignored:
> +		 * 1) a ras bad page has been allocated (used by someone);
> +		 * 2) a ras bad page has been reserved (duplicate error injection
> +		 *    for one page);
> +		 */
>  		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
>  					       AMDGPU_GEM_DOMAIN_VRAM,
>  					       &bo, NULL))
> [Guchun] Do we need to change PAGE_SHIFT to AMDGPU_GPU_PAGE_SHIFT here?
[Tao] Alex has another patch that fixes it; you can find it on the mailing list.

>
> -			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
> +			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
>
>  		data->bps_bo[i] = bo;
>  		data->last_reserved = i + 1;
> --
> 2.17.1
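
Why the choice of shift in the review question matters can be shown with a
small userspace sketch. It assumes, as the question implies but this thread
does not confirm, that the retired page index is counted in 4 KiB GPU pages
(amdgpu defines AMDGPU_GPU_PAGE_SHIFT as 12), and that the kernel may be built
with a larger CPU page size, such as 64 KiB. `GPU_PAGE_SHIFT`,
`page_to_offset()` and `shifts_agree()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* 4 KiB GPU page; mirrors the value of amdgpu's AMDGPU_GPU_PAGE_SHIFT. */
#define GPU_PAGE_SHIFT 12u

/* Convert a page index to a byte offset for a given page shift. */
static uint64_t page_to_offset(uint64_t page, unsigned int shift)
{
	assert(shift < 64);
	return page << shift;
}

/* Nonzero when a CPU page shift gives the same offset as the GPU shift. */
static int shifts_agree(uint64_t page, unsigned int cpu_shift)
{
	return page_to_offset(page, cpu_shift) ==
	       page_to_offset(page, GPU_PAGE_SHIFT);
}
```

With a CPU page shift of 12 the two offsets match; with a shift of 16 (64 KiB
CPU pages) page index 0x10 maps to byte offset 0x100000 instead of 0x10000, so
under the stated assumption the wrong VRAM range would be reserved.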


RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

2019-09-17 Thread Chen, Guchun
Yeah, that's fine.

Reviewed-by: Guchun Chen 


