amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

Liu, Monk Wed, 20 Nov 2019 07:00:02 -0800

>>the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.


I know,

> +     /* a read return value of 1 means semaphore acuqire */
> +     if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> +         ring->funcs->vmhub == AMDGPU_MMHUB_1)
> +     amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

But ring->funcs->vmhub wil always be AMDGPU_GFXHUB, right ? since this ring is 
from "&kiq->ring" ? 


>> Yes, agree. But since we now knew that we won't need that we can just drop 
>> this patch altogether.

Yeah, the semaphore wrapping is in PATCH 2/2, agree that this PATCH 1/2 could 
be dropped 


-----邮件原件-----
发件人: Christian König <ckoenig.leichtzumer...@gmail.com> 
发送时间: 2019年11月20日 22:39
收件人: Liu, Monk <monk....@amd.com>; Zhu, Changfeng <changfeng....@amd.com>; 
Koenig, Christian <christian.koe...@amd.com>; Xiao, Jack <jack.x...@amd.com>; 
Zhou1, Tao <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer 
<xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org
主题: Re: 答复: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.

> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
Yes, agree. But since we now knew that we won't need that we can just drop this 
patch altogether.

Regards,
Christian.

Am 20.11.19 um 15:30 schrieb Liu, Monk:
> Thanks for sharing this JIR
>
> now I got the picture of this issue from you and Christian.
>
> So the semaphore grabbing can prevent RTL to power off the MMHUB, I 
> see
>
> The practice is that SRIOV won't enable PG at all (even our GIM driver 
> won't enable PG, maybe in future we would enable it )
>
> I think I don't have too many concern about your patches,
>
> But I have comments on your patch 1:
>
> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>                                       uint32_t reg0, uint32_t reg1,
> -                                     uint32_t ref, uint32_t mask)
> +                                     uint32_t ref, uint32_t mask,
> +                                     uint32_t sem)
>   {
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void 
> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>       uint32_t seq;
>   
>       spin_lock_irqsave(&kiq->ring_lock, flags);
> -     amdgpu_ring_alloc(ring, 32);
> +     amdgpu_ring_alloc(ring, 60);
> +
> +     /*
> +      * It may lose gpuvm invalidate acknowldege state across power-gating
> +      * off cycle, add semaphore acquire before invalidation and semaphore
> +      * release after invalidation to avoid entering power gated state
> +      * to WA the Issue
> +      */
> +
> +     /* a read return value of 1 means semaphore acuqire */
> +     if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> +         ring->funcs->vmhub == AMDGPU_MMHUB_1)
> +     amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
>
> See that in this routine, the ring is always KIQ, so below code looks 
> redundant :
>
> +     /* a read return value of 1 means semaphore acuqire */
> +     if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> +         ring->funcs->vmhub == AMDGPU_MMHUB_1)
> +     amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
>
> Thanks
>
> -----邮件原件-----
> 发件人: Zhu, Changfeng <changfeng....@amd.com>
> 发送时间: 2019年11月20日 22:17
> 收件人: Koenig, Christian <christian.koe...@amd.com>; Liu, Monk 
> <monk....@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao 
> <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer 
> <xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org
> 主题: RE: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
>>>> Did Changfeng already hit this issue under SRIOV ???
> I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb .
> The problem is also seen by Zhou,Tao.
>
> And this is ticket:
> http://ontrack-internal.amd.com/browse/SWDEV-201459
>
> After the semaphore patch, the problem can be fixed.
>
> If SROV has concern about this problem,  it should not add semaphore in SROV.
>
> However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/ 
> gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ 
> gmc_v10_0_emit_flush_gpu_tlb
>
> Or how can we handle the ticket above?
>
> BR,
> Changfeng.
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: Wednesday, November 20, 2019 10:00 PM
> To: Liu, Monk <monk....@amd.com>; Koenig, Christian 
> <christian.koe...@amd.com>; Zhu, Changfeng <changfeng....@amd.com>; 
> Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Huang, 
> Ray <ray.hu...@amd.com>; Huang, Shimmer <xinmei.hu...@amd.com>; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
>> Did Changfeng already hit this issue under SRIOV ?
> I don't think so, but Changfeng needs to answer this.
>
> Question is does the extra semaphore acquire has some negative effect on 
> SRIOV?
>
> I would like to avoid having even more SRIOV specific handling in here which 
> we can't really test on bare metal.
>
> Christian.
>
> Am 20.11.19 um 14:54 schrieb Liu, Monk:
>> Hah, but in SRIOV case, our guest KMD driver is not allowed to do 
>> such things .... (and even there is a bug that KMD try to power gate, 
>> the SMU firmware would not really do the jobs since We have PSP L1 
>> policy to prevent those danger operations )
>>
>> Did Changfeng already hit this issue under SRIOV ???
>>
>> -----邮件原件-----
>> 发件人: Koenig, Christian <christian.koe...@amd.com>
>> 发送时间: 2019年11月20日 21:21
>> 收件人: Liu, Monk <monk....@amd.com>; Zhu, Changfeng 
>> <changfeng....@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao 
>> <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer 
>> <xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org
>> 主题: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
>> workaround in amdgpu_virt
>>
>> Hi Monk,
>>
>> this is a fix for power gating the MMHUB.
>>
>> Basic problem is that the MMHUB can power gate while an invalidation is in 
>> progress which looses all bits in the ACK register and so deadlocks the 
>> engine waiting for the invalidation to finish.
>>
>> This bug is hit immediately when we enable power gating of the MMHUB.
>>
>> Regards,
>> Christian.
>>
>> Am 20.11.19 um 14:18 schrieb Liu, Monk:
>>> Hi Changfeng
>>>
>>> Firs of all, there is no power-gating off circle involved in AMDGPU 
>>> SRIOV, since we don't allow VF/VM do such things so I do feel 
>>> strange why you post something like this Especially on VEGA10 
>>> serials which looks doesn't have any issue on those gpu_flush part
>>>
>>> Here is my questions for you:
>>> 1) Can you point me what issue had you been experienced ? and how to 
>>> repro the bug
>>> 2) if you do hit some issues, did you verified that your patch can fix it ?
>>>
>>> besides
>>>
>>> /Monk
>>>
>>> -----邮件原件-----
>>> 发件人: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> 代表 
>>> Changfeng.Zhu
>>> 发送时间: 2019年11月20日 17:14
>>> 收件人: Koenig, Christian <christian.koe...@amd.com>; Xiao, Jack 
>>> <jack.x...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Huang, Ray 
>>> <ray.hu...@amd.com>; Huang, Shimmer <xinmei.hu...@amd.com>; 
>>> amd-gfx@lists.freedesktop.org
>>> 抄送: Zhu, Changfeng <changfeng....@amd.com>
>>> 主题: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
>>> amdgpu_virt
>>>
>>> From: changzhu <changfeng....@amd.com>
>>>
>>> It may lose gpuvm invalidate acknowldege state across power-gating off 
>>> cycle. To avoid this issue in virt invalidation, add semaphore acquire 
>>> before invalidation and semaphore release after invalidation.
>>>
>>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>>> Signed-off-by: changzhu <changfeng....@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++--  
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
>>>     3 files changed, 28 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> index f04eb1a64271..70ffaf91cd12 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
>>> *adev, uint32_t reg, uint32_t v)
>>>     
>>>     void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>>                                             uint32_t reg0, uint32_t reg1,
>>> -                                   uint32_t ref, uint32_t mask)
>>> +                                   uint32_t ref, uint32_t mask,
>>> +                                   uint32_t sem)
>>>     {
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ 
>>> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>>             uint32_t seq;
>>>     
>>>             spin_lock_irqsave(&kiq->ring_lock, flags);
>>> -   amdgpu_ring_alloc(ring, 32);
>>> +   amdgpu_ring_alloc(ring, 60);
>>> +
>>> +   /*
>>> +    * It may lose gpuvm invalidate acknowldege state across power-gating
>>> +    * off cycle, add semaphore acquire before invalidation and semaphore
>>> +    * release after invalidation to avoid entering power gated state
>>> +    * to WA the Issue
>>> +    */
>>> +
>>> +   /* a read return value of 1 means semaphore acuqire */
>>> +   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>>> +       ring->funcs->vmhub == AMDGPU_MMHUB_1)
>>> +   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>>> +
>>>             amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>>>                                                 ref, mask);
>>> +   /*
>>> +    * add semaphore release after invalidation,
>>> +    * write with 0 means semaphore release
>>> +    */
>>> +   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>>> +       ring->funcs->vmhub == AMDGPU_MMHUB_1)
>>> +   amdgpu_ring_emit_wreg(ring, sem, 0);
>>> +
>>>             amdgpu_fence_emit_polling(ring, &seq);
>>>             amdgpu_ring_commit(ring);
>>>             spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git 
>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>> index b0b2bdc750df..bda6a2f37dc0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device 
>>> *adev, uint32_t reg);  void amdgpu_virt_kiq_wreg(struct amdgpu_device 
>>> *adev, uint32_t reg, uint32_t v);  void 
>>> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>>                                             uint32_t reg0, uint32_t rreg1,
>>> -                                   uint32_t ref, uint32_t mask);
>>> +                                   uint32_t ref, uint32_t mask,
>>> +                                   uint32_t sem);
>>>     int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, 
>>> bool init);  int amdgpu_virt_release_full_gpu(struct amdgpu_device 
>>> *adev, bool init);  int amdgpu_virt_reset_gpu(struct amdgpu_device 
>>> *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index f25cd97ba5f2..1ae59af7836a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct 
>>> amdgpu_device *adev, uint32_t vmid,
>>>                             !adev->in_gpu_reset) {
>>>                     uint32_t req = hub->vm_inv_eng0_req + eng;
>>>                     uint32_t ack = hub->vm_inv_eng0_ack + eng;
>>> +           uint32_t sem = hub->vm_inv_eng0_sem + eng;
>>>     
>>>                     amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
>>> -                           1 << vmid);
>>> +                                              1 << vmid, sem);
>>>                     return;
>>>             }
>>>     
>>> --
>>> 2.17.1
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

答复: 答复: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

Reply via email to