Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization

2021-03-15 Thread Christian König



Am 15.03.21 um 14:08 schrieb Felix Kuehling:

Am 2021-03-15 um 6:22 a.m. schrieb Christian König:

Am 13.03.21 um 03:43 schrieb Felix Kuehling:

Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
first call to amdgpu_res_next will trigger the BUG_ON in that function.

Mhm the BUG_ON is correct since the function complains that we want to
move the cursor forward by more than originally expected.

The problem is rather that somebody is using cur->size which is the
size of the current segment as parameter for amdgpu_res_next().

Do you have a backtrace of this?

I didn't save the backtrace. The problem was in
amdgpu_vm_bo_update_mapping. num_entries is based on cursor.size and
later used in the amdpu_res_next call.


Yeah, found that in the meantime as well.


But I think that should be OK. If cursor.size is the current segment
size, it should not exceed cursor.remaining. Otherwise every user of the
cursor would have to check both cursor.size and cursor.remaining all the
time, which would be inconvenient. amdgpu_res_next ensures that with
cur->size = min(node->size << PAGE_SHIFT, cur->remaining). I think
amdgpu_res_first should do the same.


Ok, good point. Patch is Reviewed-by: Christian König 
 then.


Regards,
Christian.



Regards,
   Felix



Thanks,
Christian.


Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
CC: Christian König 
Signed-off-by: Felix Kuehling 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 1335e098510f..b49a61d07d60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct
ttm_resource *res,
   start -= node++->size << PAGE_SHIFT;
     cur->start = (node->start << PAGE_SHIFT) + start;
-    cur->size = (node->size << PAGE_SHIFT) - start;
+    cur->size = min((node->size << PAGE_SHIFT) - start, size);
   cur->remaining = size;
   cur->node = node;
   }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization

2021-03-15 Thread Felix Kuehling

Am 2021-03-15 um 6:22 a.m. schrieb Christian König:
> Am 13.03.21 um 03:43 schrieb Felix Kuehling:
>> Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
>> first call to amdgpu_res_next will trigger the BUG_ON in that function.
>
> Mhm the BUG_ON is correct since the function complains that we want to
> move the cursor forward by more than originally expected.
>
> The problem is rather that somebody is using cur->size which is the
> size of the current segment as parameter for amdgpu_res_next().
>
> Do you have a backtrace of this?
I didn't save the backtrace. The problem was in
amdgpu_vm_bo_update_mapping. num_entries is based on cursor.size and
later used in the amdpu_res_next call.

But I think that should be OK. If cursor.size is the current segment
size, it should not exceed cursor.remaining. Otherwise every user of the
cursor would have to check both cursor.size and cursor.remaining all the
time, which would be inconvenient. amdgpu_res_next ensures that with
cur->size = min(node->size << PAGE_SHIFT, cur->remaining). I think
amdgpu_res_first should do the same.

Regards,
  Felix


>
> Thanks,
> Christian.
>
>>
>> Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
>> CC: Christian König 
>> Signed-off-by: Felix Kuehling 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> index 1335e098510f..b49a61d07d60 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> @@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct
>> ttm_resource *res,
>>   start -= node++->size << PAGE_SHIFT;
>>     cur->start = (node->start << PAGE_SHIFT) + start;
>> -    cur->size = (node->size << PAGE_SHIFT) - start;
>> +    cur->size = min((node->size << PAGE_SHIFT) - start, size);
>>   cur->remaining = size;
>>   cur->node = node;
>>   }
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization

2021-03-15 Thread Christian König

Am 13.03.21 um 03:43 schrieb Felix Kuehling:

Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
first call to amdgpu_res_next will trigger the BUG_ON in that function.


Mhm the BUG_ON is correct since the function complains that we want to 
move the cursor forward by more than originally expected.


The problem is rather that somebody is using cur->size which is the size 
of the current segment as parameter for amdgpu_res_next().


Do you have a backtrace of this?

Thanks,
Christian.



Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
CC: Christian König 
Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 1335e098510f..b49a61d07d60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
start -= node++->size << PAGE_SHIFT;
  
  	cur->start = (node->start << PAGE_SHIFT) + start;

-   cur->size = (node->size << PAGE_SHIFT) - start;
+   cur->size = min((node->size << PAGE_SHIFT) - start, size);
cur->remaining = size;
cur->node = node;
  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amdkfd: Fix resource cursor initialization

2021-03-12 Thread Felix Kuehling
Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
first call to amdgpu_res_next will trigger the BUG_ON in that function.

Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
CC: Christian König 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 1335e098510f..b49a61d07d60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
start -= node++->size << PAGE_SHIFT;
 
cur->start = (node->start << PAGE_SHIFT) + start;
-   cur->size = (node->size << PAGE_SHIFT) - start;
+   cur->size = min((node->size << PAGE_SHIFT) - start, size);
cur->remaining = size;
cur->node = node;
 }
-- 
2.30.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx