Re: [patch] problems with "fix visible VRAM handling during faults"

2024-05-09 Jeremy Day
Christian König wrote:
> On 08.05.24 at 12:17, Michel Dänzer wrote:
> > Does this instead of your patch help by any chance?
> >
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 109fe557a02b..29c197c00018 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -427,7 +427,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
> >
> >  	amdgpu_res_first(res, 0, res->size, &cursor);
> >  	while (cursor.remaining) {
> > -		if ((cursor.start + cursor.size) >= adev->gmc.visible_vram_size)
> > +		if ((cursor.start + cursor.size) > adev->gmc.visible_vram_size)
> 
> Oh, good catch. Yes that might be it.

Yes, that does it.  Thanks!


Re: [patch] problems with "fix visible VRAM handling during faults"

2024-05-08 Christian König

On 08.05.24 at 12:17, Michel Dänzer wrote:
> On 2024-05-07 18:39, Jeremy Day wrote:
> > This is just to report that I've had usually well-behaved applications
> > sometimes having problems with memory access violations since kernel
> > version 6.9-rc5.  This past weekend I stumbled across a way to reliably
> > reproduce the problem in the form of a Skyrim save file which causes a
> > crash shortly after loading the game on affected kernels.
> >
> > Things go back to running smoothly only if I revert one of the changes
> > in 5th April's "[PATCH] drm/amdgpu: fix visible VRAM handling during
> > faults" as follows.
> >
> > Patch is against v6.9-rc7.  It restores the check for partially
> > visible-to-CPU memory in amdgpu_bo_fault_reserve_notify.  Things
> > seem stable again with this change.
>
> Does this instead of your patch help by any chance?
>
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 109fe557a02b..29c197c00018 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -427,7 +427,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
>
>  	amdgpu_res_first(res, 0, res->size, &cursor);
>  	while (cursor.remaining) {
> -		if ((cursor.start + cursor.size) >= adev->gmc.visible_vram_size)
> +		if ((cursor.start + cursor.size) > adev->gmc.visible_vram_size)

Oh, good catch. Yes that might be it.

Thanks a lot,
Christian.

>  			return false;
>  		amdgpu_res_next(&cursor, cursor.size);
>  	}

Re: [patch] problems with "fix visible VRAM handling during faults"

2024-05-08 Michel Dänzer
On 2024-05-07 18:39, Jeremy Day wrote:
> This is just to report that I've had usually well-behaved applications
> sometimes having problems with memory access violations since kernel
> version 6.9-rc5.  This past weekend I stumbled across a way to reliably
> reproduce the problem in the form of a Skyrim save file which causes a
> crash shortly after loading the game on affected kernels.
> 
> Things go back to running smoothly only if I revert one of the changes
> in 5th April's "[PATCH] drm/amdgpu: fix visible VRAM handling during
> faults" as follows.
> 
> Patch is against v6.9-rc7.  It restores the check for partially
> visible-to-CPU memory in amdgpu_bo_fault_reserve_notify.  Things
> seem stable again with this change.

Does this instead of your patch help by any chance?


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 109fe557a02b..29c197c00018 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -427,7 +427,7 @@ bool amdgpu_res_cpu_visible(struct amdgpu_device *adev,
 
 	amdgpu_res_first(res, 0, res->size, &cursor);
 	while (cursor.remaining) {
-		if ((cursor.start + cursor.size) >= adev->gmc.visible_vram_size)
+		if ((cursor.start + cursor.size) > adev->gmc.visible_vram_size)
 			return false;
 		amdgpu_res_next(&cursor, cursor.size);
 	}


-- 
Earthling Michel Dänzer    |   https://redhat.com
Libre software enthusiast  |   Mesa and Xwayland developer