Re: [PATCH] drm/amdgpu: fix sdma v4 ring is disabled accidently

2018-10-22 Thread Koenig, Christian
Mhm, good catch.

And yes, using the paging queue when it is available sounds like a good
idea to me as well.

So far I've only used it for VM updates to actually test if it works as 
expected.

Regards,
Christian.

On 19.10.18 at 21:53, Kuehling, Felix wrote:
> [+Christian]
>
> Should the buffer funcs also use the paging ring? I think that would be
> important for being able to clear page tables or migrating a BO while
> handling a page fault.
>
> Regards,
>    Felix
>
> On 2018-10-19 3:13 p.m., Yang, Philip wrote:
>> For sdma v4, there is a bug caused by
>> commit d4e869b6b5d6 ("drm/amdgpu: add ring test for page queue").
>>
>> The local variable ring is reused and changed, so
>> amdgpu_ttm_set_buffer_funcs_status(adev, true)
>> is skipped accidentally. As a result, amdgpu_fill_buffer() will fail with
>> the kernel message:
>>
>> [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.260444] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.260627] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.290119] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.290370] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.319971] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>> [   25.320486] amdgpu :19:00.0: [mmhub] VMC page fault (src_id:0 ring:154 vmid:8 pasid:32768, for process  pid 0 thread  pid 0)
>> [   25.320533] amdgpu :19:00.0:   in page starting at address 0x from 18
>> [   25.320563] amdgpu :19:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00800134
>>
>> Change-Id: Idacdf8e60557edb0a4a499aa4051b75d87ce4091
>> Signed-off-by: Philip Yang 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index ede149a..cd368ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -1151,10 +1151,11 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
>>  		}
>>  
>>  		if (adev->sdma.has_page_queue) {
>> -			ring = &adev->sdma.instance[i].page;
>> -			r = amdgpu_ring_test_ring(ring);
>> +			struct amdgpu_ring *page = &adev->sdma.instance[i].page;
>> +
>> +			r = amdgpu_ring_test_ring(page);
>>  			if (r) {
>> -				ring->ready = false;
>> +				page->ready = false;
>>  				return r;
>>  			}
>>  		}

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix sdma v4 ring is disabled accidently

2018-10-19 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 




Re: [PATCH] drm/amdgpu: fix sdma v4 ring is disabled accidently

2018-10-19 Thread Kuehling, Felix
[+Christian]

Should the buffer funcs also use the paging ring? I think that would be
important for being able to clear page tables or migrating a BO while
handling a page fault.

Regards,
  Felix
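
The suggestion above amounts to preferring the paging queue whenever the hardware exposes one. A minimal sketch of that selection, using hypothetical stand-in types (`sketch_ring`, `sketch_sdma`) and a hypothetical helper (`pick_buffer_funcs_ring`) rather than the real amdgpu structures or any code from this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real amdgpu types. */
struct sketch_ring {
	const char *name;
};

struct sketch_sdma {
	struct sketch_ring ring;	/* main SDMA queue */
	struct sketch_ring page;	/* paging queue */
	bool has_page_queue;
};

/*
 * Pick the ring that buffer moves and clears would be submitted to:
 * the paging queue if it exists, otherwise the main queue.
 */
static struct sketch_ring *pick_buffer_funcs_ring(struct sketch_sdma *sdma)
{
	assert(sdma != NULL);
	return sdma->has_page_queue ? &sdma->page : &sdma->ring;
}
```

Whether the driver should actually do this is exactly the open question in this message; the sketch only shows the shape of the choice.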



[PATCH] drm/amdgpu: fix sdma v4 ring is disabled accidently

2018-10-19 Thread Yang, Philip
For sdma v4, there is a bug caused by
commit d4e869b6b5d6 ("drm/amdgpu: add ring test for page queue").

The local variable ring is reused and changed, so
amdgpu_ttm_set_buffer_funcs_status(adev, true)
is skipped accidentally. As a result, amdgpu_fill_buffer() will fail with the
kernel message:

[drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.260444] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.260627] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.290119] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.290370] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.319971] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
[   25.320486] amdgpu :19:00.0: [mmhub] VMC page fault (src_id:0 ring:154 vmid:8 pasid:32768, for process  pid 0 thread  pid 0)
[   25.320533] amdgpu :19:00.0:   in page starting at address 0x from 18
[   25.320563] amdgpu :19:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00800134

Change-Id: Idacdf8e60557edb0a4a499aa4051b75d87ce4091
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index ede149a..cd368ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1151,10 +1151,11 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
 		}
 
 		if (adev->sdma.has_page_queue) {
-			ring = &adev->sdma.instance[i].page;
-			r = amdgpu_ring_test_ring(ring);
+			struct amdgpu_ring *page = &adev->sdma.instance[i].page;
+
+			r = amdgpu_ring_test_ring(page);
 			if (r) {
-				ring->ready = false;
+				page->ready = false;
 				return r;
 			}
 		}
-- 
2.7.4
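
The failure mode described in the patch can be reproduced outside the kernel. The sketch below is a standalone illustration, not driver code: every `mock_*` type and function is hypothetical, and it assumes that the buffer-funcs status is enabled only when the loop's `ring` pointer equals the ring chosen for buffer functions (the diff context above does not show that comparison). `start_buggy()` mirrors the pre-patch shape, where the page queue test overwrites `ring`; `start_fixed()` mirrors the patch by using a separate `page` local.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures, not the real amdgpu types. */
struct mock_ring {
	bool ready;
};

struct mock_instance {
	struct mock_ring ring;	/* main SDMA queue */
	struct mock_ring page;	/* paging queue */
};

struct mock_device {
	struct mock_instance instance[2];
	int num_instances;
	bool has_page_queue;
	struct mock_ring *buffer_funcs_ring;
	bool buffer_funcs_enabled;
};

/* Always passes in this sketch; the real ring test submits work to hardware. */
static int mock_ring_test(struct mock_ring *ring)
{
	assert(ring != NULL);
	ring->ready = true;
	return 0;
}

/* Sets up a device whose buffer functions use instance 0's main queue. */
static void mock_device_init(struct mock_device *dev, bool has_page_queue)
{
	*dev = (struct mock_device){0};
	dev->num_instances = 2;
	dev->has_page_queue = has_page_queue;
	dev->buffer_funcs_ring = &dev->instance[0].ring;
}

/* Pre-patch shape: the page queue test overwrites 'ring'. */
static int start_buggy(struct mock_device *dev)
{
	struct mock_ring *ring;
	int i, r;

	for (i = 0; i < dev->num_instances; i++) {
		ring = &dev->instance[i].ring;
		r = mock_ring_test(ring);
		if (r) {
			ring->ready = false;
			return r;
		}

		if (dev->has_page_queue) {
			ring = &dev->instance[i].page;	/* clobbers 'ring' */
			r = mock_ring_test(ring);
			if (r) {
				ring->ready = false;
				return r;
			}
		}

		/* Assumed guard: with a page queue, 'ring' is the page ring here. */
		if (dev->buffer_funcs_ring == ring)
			dev->buffer_funcs_enabled = true;
	}
	return 0;
}

/* Post-patch shape: a separate 'page' local leaves 'ring' untouched. */
static int start_fixed(struct mock_device *dev)
{
	struct mock_ring *ring;
	int i, r;

	for (i = 0; i < dev->num_instances; i++) {
		ring = &dev->instance[i].ring;
		r = mock_ring_test(ring);
		if (r) {
			ring->ready = false;
			return r;
		}

		if (dev->has_page_queue) {
			struct mock_ring *page = &dev->instance[i].page;

			r = mock_ring_test(page);
			if (r) {
				page->ready = false;
				return r;
			}
		}

		if (dev->buffer_funcs_ring == ring)
			dev->buffer_funcs_enabled = true;
	}
	return 0;
}
```

With `has_page_queue` set, `start_buggy()` returns success but leaves `buffer_funcs_enabled` false, which corresponds to the "ring turned off" errors in the log; `start_fixed()` enables it. Without a page queue both variants behave the same, which would be consistent with the bug only appearing on hardware that has one.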
