[PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
There are two cases of reserve error should be ignored:
1) a ras bad page has been allocated (used by someone);
2) a ras bad page has been reserved (duplicate error injection
   for one page);

DRM_ERROR is unnecessary for the failure of bad page reserve

Signed-off-by: Tao Zhou
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index dc5f94e6118b..0425b74e1a8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1478,9 +1478,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/*
+		 * There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 *    for one page);
+		 */
 		if (amdgpu_ras_reserve_vram(adev, bp << PAGE_SHIFT,
 					PAGE_SIZE, &bo))
-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
There are two cases of reserve error should be ignored:
1) a ras bad page has been allocated (used by someone);
2) a ras bad page has been reserved (duplicate error injection
   for one page);

DRM_ERROR is unnecessary for the failure of bad page reserve

Signed-off-by: Tao Zhou
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 79e5e5be8b34..5f623daf5078 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/* There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 *    for one page);
+		 */
 		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
 					       AMDGPU_GEM_DOMAIN_VRAM,
 					       &bo, NULL))
-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1
RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
-----Original Message-----
From: Zhou1, Tao
Sent: Tuesday, September 17, 2019 2:25 PM
To: amd-gfx@lists.freedesktop.org; Chen, Guchun; Zhang, Hawking
Cc: Zhou1, Tao
Subject: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

There are two cases of reserve error should be ignored:
1) a ras bad page has been allocated (used by someone);
2) a ras bad page has been reserved (duplicate error injection
   for one page);

DRM_ERROR is unnecessary for the failure of bad page reserve

Signed-off-by: Tao Zhou
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 79e5e5be8b34..5f623daf5078 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
 	for (i = data->last_reserved; i < data->count; i++) {
 		bp = data->bps[i].retired_page;
 
+		/* There are two cases of reserve error should be ignored:
+		 * 1) a ras bad page has been allocated (used by someone);
+		 * 2) a ras bad page has been reserved (duplicate error injection
+		 *    for one page);
+		 */
 		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
 					       AMDGPU_GEM_DOMAIN_VRAM,
 					       &bo, NULL))
[Guchun] Do we need to change PAGE_SHIFT to AMDGPU_GPU_PAGE_SHIFT here?

-			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
+			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
 
 		data->bps_bo[i] = bo;
 		data->last_reserved = i + 1;
-- 
2.17.1
RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
> -----Original Message-----
> From: Chen, Guchun
> Sent: September 17, 2019 14:52
> To: Zhou1, Tao; amd-gfx@lists.freedesktop.org; Zhang, Hawking
> Subject: RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
>
> -----Original Message-----
> From: Zhou1, Tao
> Sent: Tuesday, September 17, 2019 2:25 PM
> To: amd-gfx@lists.freedesktop.org; Chen, Guchun; Zhang, Hawking
> Cc: Zhou1, Tao
> Subject: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
>
> There are two cases of reserve error should be ignored:
> 1) a ras bad page has been allocated (used by someone);
> 2) a ras bad page has been reserved (duplicate error injection for one page);
>
> DRM_ERROR is unnecessary for the failure of bad page reserve
>
> Signed-off-by: Tao Zhou
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 79e5e5be8b34..5f623daf5078 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
>  	for (i = data->last_reserved; i < data->count; i++) {
>  		bp = data->bps[i].retired_page;
>
> +		/* There are two cases of reserve error should be ignored:
> +		 * 1) a ras bad page has been allocated (used by someone);
> +		 * 2) a ras bad page has been reserved (duplicate error injection
> +		 *    for one page);
> +		 */
>  		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
>  					       AMDGPU_GEM_DOMAIN_VRAM,
>  					       &bo, NULL))
> [Guchun] Do we need to change PAGE_SHIFT to AMDGPU_GPU_PAGE_SHIFT here?

[Tao] Alex has another patch to fix it; you can find it on the mailing list.

>
> -			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
> +			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
>
>  		data->bps_bo[i] = bo;
>  		data->last_reserved = i + 1;
> --
> 2.17.1
RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
Yeah, that's fine.

Reviewed-by: Guchun Chen

-----Original Message-----
From: Zhou1, Tao
Sent: Tuesday, September 17, 2019 3:01 PM
To: Chen, Guchun; amd-gfx@lists.freedesktop.org; Zhang, Hawking
Subject: RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages

> -----Original Message-----
> From: Chen, Guchun
> Sent: September 17, 2019 14:52
> To: Zhou1, Tao; amd-gfx@lists.freedesktop.org; Zhang, Hawking
> Subject: RE: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
>
> -----Original Message-----
> From: Zhou1, Tao
> Sent: Tuesday, September 17, 2019 2:25 PM
> To: amd-gfx@lists.freedesktop.org; Chen, Guchun; Zhang, Hawking
> Cc: Zhou1, Tao
> Subject: [PATCH] drm/amdgpu: replace DRM_ERROR with DRM_WARN in ras_reserve_bad_pages
>
> There are two cases of reserve error should be ignored:
> 1) a ras bad page has been allocated (used by someone);
> 2) a ras bad page has been reserved (duplicate error injection for one page);
>
> DRM_ERROR is unnecessary for the failure of bad page reserve
>
> Signed-off-by: Tao Zhou
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 79e5e5be8b34..5f623daf5078 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1409,10 +1409,15 @@ int amdgpu_ras_reserve_bad_pages(struct amdgpu_device *adev)
>  	for (i = data->last_reserved; i < data->count; i++) {
>  		bp = data->bps[i].retired_page;
>
> +		/* There are two cases of reserve error should be ignored:
> +		 * 1) a ras bad page has been allocated (used by someone);
> +		 * 2) a ras bad page has been reserved (duplicate error injection
> +		 *    for one page);
> +		 */
>  		if (amdgpu_bo_create_kernel_at(adev, bp << PAGE_SHIFT, PAGE_SIZE,
>  					       AMDGPU_GEM_DOMAIN_VRAM,
>  					       &bo, NULL))
> [Guchun] Do we need to change PAGE_SHIFT to AMDGPU_GPU_PAGE_SHIFT here?

[Tao] Alex has another patch to fix it; you can find it on the mailing list.

>
> -			DRM_ERROR("RAS ERROR: reserve vram %llx fail\n", bp);
> +			DRM_WARN("RAS WARN: reserve vram for retired page %llx fail\n", bp);
>
>  		data->bps_bo[i] = bo;
>  		data->last_reserved = i + 1;
> --
> 2.17.1