RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

Liu, Monk Wed, 07 Nov 2018 01:16:32 -0800

Yeah, we allow at most 500 ms for RLCV to finish the IDLE command for CP/GFX 
and SDMA together, and even that already gives a very poor user experience ...


Looks like this feature isn't applicable to the world switch case.

/Monk
-----Original Message-----
From: Koenig, Christian 
Sent: Wednesday, November 7, 2018 4:48 PM
To: Liu, Monk <monk....@amd.com>; Zhang, Jerry <jerry.zh...@amd.com>; Huang, 
Trigger <trigger.hu...@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander <alexander.deuc...@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

> Is it in preparation for PRT (or something like kernel page fault 
> handling on the CPU/MMU side)?
That is for providing a shared virtual address space (e.g. when the CPU and 
GPU have the same VA view) as well as for changing our memory management in 
general.

> For SR-IOV, in theory any feature *not* related to hardware 
> scheduling (MES) or OS preemption (buggy with world switch preemption) 
> is welcome; as far as I know there is no reason not to support it, 
> unless it is not mature enough to enable.
The problem is that recoverable page faults in Vega10 are incompatible with 
SRIOV because a page fault can block the GPU for an undefined amount of time 
and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV that 
would mean that we just get killed by the hypervisor rather soon.

Christian.
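
A rough sketch of that constraint, not driver code: it combines the 500 ms idle
budget mentioned earlier in this thread with the point that a faulting job
cannot be scheduled away on Vega10. Every name here (WORLD_SWITCH_BUDGET_MS,
FAULT_UNBOUNDED_MS, vf_idles_in_time) is hypothetical and exists only for
illustration.

```c
#include <stdbool.h>

/* Hypothetical constant: the 500 ms idle budget quoted in this thread. */
#define WORLD_SWITCH_BUDGET_MS 500u

/* Hypothetical sentinel for a page fault with no bound on resolution time. */
#define FAULT_UNBOUNDED_MS 0xFFFFFFFFu

/*
 * Returns true if an engine stalled for stall_ms can still go idle within
 * the world-switch budget. If the faulting work cannot be scheduled away,
 * the stall time is effectively the time needed to reach idle.
 */
static bool vf_idles_in_time(unsigned int stall_ms)
{
	return stall_ms <= WORLD_SWITCH_BUDGET_MS;
}
```

A bounded stall such as vf_idles_in_time(20) passes, while
vf_idles_in_time(FAULT_UNBOUNDED_MS) does not, which is the situation where the
hypervisor would give up on the VF.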

Am 07.11.18 um 09:40 schrieb Liu, Monk:
> Hi Christian
>
> Thanks for sharing,
> Do you know more about why we need recoverable page faults? Is it in 
> preparation for PRT (or something like kernel page fault handling on the 
> CPU/MMU side)?
>
> For SR-IOV, in theory any feature *not* related to hardware 
> scheduling (MES) or OS preemption (buggy with world switch preemption) 
> is welcome; as far as I know there is no reason not to support it, 
> unless it is not mature enough to enable.
>
> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Wednesday, November 7, 2018 3:30 PM
> To: Liu, Monk <monk....@amd.com>; Zhang, Jerry <jerry.zh...@amd.com>; 
> Huang, Trigger <trigger.hu...@amd.com>; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander <alexander.deuc...@amd.com>; Kuehling, Felix 
> <felix.kuehl...@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV 
> VF
>
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA 
> firmware will switch to the paging queue so that we are able to handle the 
> fault.
>
> In general it should work on all Vega (but not Raven) components and we are 
> going to need it when we enable recoverable page faults.
>
> The only case I can see where we don't immediately need it is SRIOV, because 
> the current plan is not to support recoverable page faults there.
>
> Christian.
>
> Am 07.11.18 um 08:21 schrieb Liu, Monk:
>> Hi team
>>
>> Why do we need this page_queue in amdgpu? Can anyone share some background 
>> on its introduction to the KMD?
>> As I understand it, the gpu-scheduler already has a couple of priority 
>> levels for contexts/entities, so the jobs the page_queue is supposed to do 
>> (presumably mapping/unmapping/moving) are already well taken care of by 
>> "KERNEL" priority entities, and all other context/entity SDMA jobs will be 
>> handled after the "KERNEL" jobs ...
>>
>> So there is no real benefit in introducing page_queue (or rlc_queue) to 
>> amdgpu given the priority-aware gpu-scheduler ... unless we are 
>> going to remove the "KERNEL" priority and always do the mapping/unmapping in 
>> page_queue ...
>>
>> /Monk
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of 
>> Zhang, Jerry(Junwei)
>> Sent: Wednesday, November 7, 2018 1:26 PM
>> To: Huang, Trigger <trigger.hu...@amd.com>; 
>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>> <alexander.deuc...@amd.com>; Koenig, Christian 
>> <christian.koe...@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV 
>> VF
>>
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, the SDMA page queue is not used under an SR-IOV VF, and this 
>>> queue causes a ring test failure in the amdgpu module reload case. So 
>>> just disable it.
>>>
>>> Signed-off-by: Trigger Huang <trigger.hu...@amd.com>
>> Looks like we ran into several issues with it on Vega.
>> kfd also disabled vega10 for development (but I'm not sure of the 
>> detailed issue for them).
>>
>> Thus, may we disable it for vega10 as well?
>> Any comments? Alex, Christian, Felix.
>>
>> Regards,
>> Jerry
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index e39a09eb0f..4edc848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>>                     adev->sdma.has_page_queue = false;
>>>             } else {
>>>                     adev->sdma.num_instances = 2;
>>> -           if (adev->asic_type != CHIP_VEGA20 &&
>>> +           if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
>>> +                   adev->sdma.has_page_queue = false;
>>> +           else if (adev->asic_type != CHIP_VEGA20 &&
>>>                                     adev->asic_type != CHIP_VEGA12)
>>>                             adev->sdma.has_page_queue = true;
>>>             }
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
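
For reference, the branch structure of the quoted hunk can be sketched as a
standalone function. This is an illustrative reduction, not the driver's code:
enum chip and sdma_wants_page_queue() are hypothetical stand-ins, the first
branch is assumed to be the Raven case (the hunk does not show its condition),
and has_page_queue is assumed to default to false when no branch sets it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's asic_type values. */
enum chip { CHIP_RAVEN, CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20 };

/*
 * Mirrors the decision made in the patched sdma_v4_0_early_init():
 * returns whether the SDMA page queue would end up enabled.
 */
static bool sdma_wants_page_queue(enum chip asic, bool is_sriov_vf)
{
	/* Assumption: this is the branch shown just above the "else". */
	if (asic == CHIP_RAVEN)
		return false;

	/* Case added by the patch: Vega10 running as an SR-IOV VF. */
	if (asic == CHIP_VEGA10 && is_sriov_vf)
		return false;

	/* Unchanged rule: enabled unless the ASIC is Vega20 or Vega12. */
	return asic != CHIP_VEGA20 && asic != CHIP_VEGA12;
}
```

With this reduction, bare-metal Vega10 still gets the page queue, and only the
Vega10 VF case changes behaviour.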

